Decommissioning a datacenter

Steps to properly remove a datacenter so no information is lost.

Procedure

  1. Make sure no clients are still writing to any nodes in the datacenter.
    The following JMX MBeans provide details on client connections and pending requests:
    • Active connections: org.apache.cassandra.metrics/Client/connectedNativeClients and org.apache.cassandra.metrics/Client/connectedThriftClients
    • Pending requests: org.apache.cassandra.metrics/ClientRequests/viewPendingMutations, or run nodetool tpstats.
  2. Run a full repair with nodetool repair --full from installation_location/bin/.

    This ensures that all data is propagated from the datacenter being decommissioned.

  3. Change all keyspaces so they no longer reference the datacenter being removed.
  4. Shut down all nodes in the datacenter.
  5. Run nodetool assassinate on every node in the datacenter being removed:
    installation_location/bin/nodetool assassinate remote_IP_address
    If the replication factor (RF) of any keyspace has not been properly updated:
    1. Note the name of the keyspace that needs to be updated.
    2. Remove the datacenter from the keyspace replication settings (using ALTER KEYSPACE).
    3. If the keyspace used SimpleStrategy replication, also run a full repair on the keyspace:
      installation_location/bin/nodetool repair --full keyspace_name
  6. Run nodetool status to ensure that the nodes in the datacenter were removed.
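
The keyspace change in steps 3 and 5 can be sketched in Python as a helper that drops one datacenter from a NetworkTopologyStrategy replication map and renders the matching ALTER KEYSPACE statement. This is a minimal illustration, not part of the product: the function names are hypothetical, and the starting RF of 1 for DC3 is an assumed value. The script only builds the statement text; running it in cqlsh remains a manual step.

```python
def drop_datacenter(replication, datacenter):
    """Return a copy of the replication map without the given datacenter."""
    remaining = {k: v for k, v in replication.items() if k != datacenter}
    # Besides the 'class' entry, at least one datacenter must remain.
    if not any(key != "class" for key in remaining):
        raise ValueError("no datacenter left in the replication map")
    return remaining


def alter_keyspace_cql(keyspace, replication):
    """Render an ALTER KEYSPACE statement for the given replication map."""
    parts = []
    for key, value in replication.items():
        # The strategy class is a quoted string; replication factors are integers.
        rendered = "'" + str(value) + "'" if key == "class" else str(int(value))
        parts.append("'" + key + "': " + rendered)
    return ("ALTER KEYSPACE " + keyspace
            + " WITH REPLICATION = {" + ", ".join(parts) + "};")


# Assumed starting point: the cycling keyspace replicates to all three datacenters.
current = {"class": "NetworkTopologyStrategy", "DC1": 1, "DC2": 2, "DC3": 1}
statement = alter_keyspace_cql("cycling", drop_datacenter(current, "DC3"))
print(statement)
```

Generating one statement per keyspace this way makes it easier to review the new replication settings before applying them.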

Example

Removing DC3 from the cluster:
  1. Check the status of the cluster:
    installation_location/bin/nodetool status
    The status shows three datacenters with one node in each:
    Datacenter: DC1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Owns    Host ID                               Token                     Rack
    UN  10.200.175.11   474.23 KiB  ?       7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808     rack1
    Datacenter: DC2
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Owns    Host ID                               Token                     Rack
    UN  10.200.175.113  518.36 KiB  ?       2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798     rack1
    Datacenter: DC3
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Owns    Host ID                               Token                     Rack
    UN  10.200.175.111  461.56 KiB  ?       ac43e602-ef09-4d0d-a455-3311f444198c  -9223372036854775788     rack1
  2. Run a full repair:
    installation_location/bin/nodetool repair --full
  3. Using JConsole, check the following JMX MBeans to make sure there are no active connections:
    • org.apache.cassandra.metrics/Client/connectedNativeClients
    • org.apache.cassandra.metrics/Client/connectedThriftClients
  4. Verify that there are no pending write requests on each node that is being removed (the Pending column should read 0 or N/A):
    installation_location/bin/nodetool tpstats
    Pool Name                                     Active      Pending (w/Backpressure)   Delayed      Completed...
    BackgroundIoStage                                  0                       0 (N/A)       N/A            640...
    CompactionExecutor                                 0                       0 (N/A)       N/A           1039...
    GossipStage                                        0                       0 (N/A)       N/A           4580...
    HintsDispatcher                                    0                       0 (N/A)       N/A              2...
    
  5. Start cqlsh and remove DC3 from all keyspace configurations. Repeat for each keyspace that has an RF set for DC3:
    ALTER KEYSPACE cycling WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DC1':1,'DC2':2};
  6. Shut down all nodes in the datacenter.
  7. Run nodetool assassinate on each node in DC3 (the datacenter that is being removed):
    installation_location/bin/nodetool assassinate remote_IP_address
  8. From a node in a remaining datacenter, verify that DC3 has been removed:
    installation_location/bin/nodetool status
    Datacenter: DC1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Owns    Host ID                               Token                     Rack
    UN  10.200.175.11   503.54 KiB  ?       7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808     rack1
    Datacenter: DC2
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address         Load       Owns    Host ID                               Token                     Rack
    UN  10.200.175.113  522.47 KiB  ?       2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798     rack1
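
The final check can also be scripted. The following Python sketch parses nodetool status output, such as the listing above, and reports which datacenters and node addresses are still present. The function name is hypothetical, and the parser assumes the output layout shown in this example; other versions may format the output differently.

```python
import re


def datacenters_in_status(status_output):
    """Map each datacenter name to the node addresses listed under it."""
    datacenters = {}
    current = None
    for raw_line in status_output.splitlines():
        line = raw_line.strip()
        header = re.match(r"Datacenter:\s*(\S+)", line)
        if header:
            current = header.group(1)
            datacenters[current] = []
            continue
        # Node rows start with a two-letter status/state code such as UN or DN.
        node = re.match(r"[UD][NLJM]\s+(\S+)", line)
        if node and current is not None:
            datacenters[current].append(node.group(1))
    return datacenters


# Output copied from step 8 of the example.
sample = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns    Host ID                               Token                     Rack
UN  10.200.175.11   503.54 KiB  ?       7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808     rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns    Host ID                               Token                     Rack
UN  10.200.175.113  522.47 KiB  ?       2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798     rack1
"""

remaining = datacenters_in_status(sample)
print(remaining)
print("DC3 removed:", "DC3" not in remaining)
```

In practice the sample string would be replaced by the captured output of nodetool status from a node in a remaining datacenter.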