Decommissioning a datacenter
Steps to properly remove a datacenter so no information is lost.
DSE OpsCenter provides connection and activity monitoring and allows you to run full repairs.
Procedure
-
Make sure no clients are still writing to any nodes in the datacenter.
When not using OpsCenter, the following JMX MBeans provide details on client connections and pending requests:
-
Active connections:
org.apache.cassandra.metrics/Client/connectedNativeClients
and org.apache.cassandra.metrics/Client/connectedThriftClients
-
Pending requests:
org.apache.cassandra.metrics/ClientRequests/viewPendingMutations
or use nodetool tpstats.
-
Run a full repair with
nodetool repair --full
or use the OpsCenter Repair Service (see Starting a repair service). This ensures that all data is propagated from the datacenter being decommissioned.
If using OpsCenter, ensure that the repair has completed (see Checking the repair progress).
-
Shut down the OpsCenter Repair Service if in use.
-
Change all keyspaces so they no longer reference the datacenter being removed.
-
Shut down all nodes in the datacenter.
-
Stop the DataStax Agent on each node if in use.
-
Run
nodetool assassinate
on every node in the datacenter being removed:
nodetool assassinate remote_IP_address
If the RF (replication factor) on any keyspace has not been properly updated:
-
Note the name of the keyspace that needs to be updated.
-
Remove the datacenter from the keyspace RF (using
ALTER KEYSPACE
).
-
If the keyspace was using SimpleStrategy replication, also run a full repair on the keyspace:
nodetool repair --full keyspace_name
-
Run
nodetool status
to ensure that the nodes in the datacenter were removed.
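The two checks in this procedure that are easiest to get wrong by eye (no pending requests, and no keyspace still referencing the datacenter) can be sketched as small shell filters. This is a minimal sketch, not a DataStax-provided tool: the function names pending_total and keyspaces_referencing_dc are hypothetical, the first assumes the tpstats row layout of pool name, active count, pending count, and the second assumes the default cqlsh table layout for a query against system_schema.keyspaces.

```shell
# Sketch only: helper names are hypothetical, not part of nodetool or cqlsh.

# Sum the Pending column of thread-pool rows from `nodetool tpstats` (stdin).
# Assumes rows look like: <pool name> <active> <pending> ...
# The header and any row without numeric counts are skipped.
pending_total() {
  awk '$2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ { sum += $3 } END { print sum + 0 }'
}

# Print the keyspaces whose replication map still names the given datacenter.
# Reads, on stdin, cqlsh output for:
#   SELECT keyspace_name, replication FROM system_schema.keyspaces;
# Assumes the default cqlsh layout "keyspace_name | {'DC1': '1', ...}".
keyspaces_referencing_dc() {
  awk -F'|' -v dc="$1" '
    # \047 is a single quote; match the quoted datacenter name as a map key.
    index($2, "\047" dc "\047:") { gsub(/ /, "", $1); print $1 }
  '
}
```

For example, run nodetool tpstats | pending_total on each node being removed and expect 0, and run cqlsh -e "SELECT keyspace_name, replication FROM system_schema.keyspaces;" | keyspaces_referencing_dc DC3 and expect no output before shutting the nodes down. Matching on the quoted name keeps DC3 from also matching a datacenter such as DC31.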
Example
Removing DC3 from the cluster:
-
Check the status of the cluster:
nodetool status
Status shows that there are three datacenters with 1 node in each:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   474.23 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1

Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  518.36 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1

Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.111  461.56 KiB  ?     ac43e602-ef09-4d0d-a455-3311f444198c  -9223372036854775788  rack1
-
Run a full repair:
nodetool repair --full
-
Using JConsole, check the following JMX MBeans to make sure there are no active connections:
-
org.apache.cassandra.metrics/Client/connectedNativeClients
-
org.apache.cassandra.metrics/Client/connectedThriftClients
-
Verify that there are no pending write requests on each node that is being removed (the Pending column should read 0 or N/A):
nodetool tpstats
Pool Name           Active  Pending (w/Backpressure)  Delayed  Completed...
BackgroundIoStage   0       0 (N/A)                   N/A      640...
CompactionExecutor  0       0 (N/A)                   N/A      1039...
GossipStage         0       0 (N/A)                   N/A      4580...
HintsDispatcher     0       0 (N/A)                   N/A      2...
-
Start
cqlsh
and remove DC3 from all keyspace configurations. Repeat for each keyspace that has an RF set for DC3:
ALTER KEYSPACE cycling WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 2};
-
Shut down the OpsCenter Repair Service if in use.
-
Shut down all nodes in the datacenter.
-
Stop the DataStax Agent on each node if in use.
-
Run
nodetool assassinate
on each node in DC3 (the datacenter that is being removed):
nodetool assassinate remote_IP_address
-
From a node in a remaining datacenter, verify that DC3 has been removed:
nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   503.54 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1

Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  522.47 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
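The final verification can also be scripted rather than read by eye. The following is a minimal sketch that assumes nodetool status prints one "Datacenter: name" header line per datacenter, as in the output above; the function names list_datacenters and dc_present are hypothetical helpers, not nodetool features.

```shell
# Sketch only: helper names are hypothetical, not part of nodetool.

# Print the datacenter names found in `nodetool status` output read from stdin.
# Assumes each datacenter section starts with a line "Datacenter: <name>".
list_datacenters() {
  awk '$1 == "Datacenter:" { print $2 }'
}

# Succeed (exit status 0) if the named datacenter appears in the status output
# on stdin. -x and -F make grep match the whole name as a fixed string.
dc_present() {
  list_datacenters | grep -qxF -- "$1"
}
```

For example, after the assassinate step: if nodetool status | dc_present DC3; then echo "DC3 is still listed"; fi. An exit status is used instead of printed text so the check can gate later automation.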