Decommissioning a datacenter
Steps to properly remove a datacenter so no information is lost.
Procedure
- Make sure no clients are still writing to any nodes in the datacenter.
  When not using OpsCenter, the following JMX MBeans provide details on client connections and pending requests:
  - Active connections: org.apache.cassandra.metrics/Client/connectedNativeClients and org.apache.cassandra.metrics/Client/connectedThriftClients
  - Pending requests: org.apache.cassandra.metrics/ClientRequests/viewPendingMutations, or use nodetool tpstats.
- Run a full repair with nodetool repair --full to ensure that all data is propagated from the datacenter being decommissioned.
- Shut down the OpsCenter Repair Service if in use.
- Change all keyspaces so they no longer reference the datacenter being removed.
- Shut down all nodes in the datacenter.
- Stop the DataStax Agent on each node if in use.
- Run nodetool assassinate on every node in the datacenter being removed:
  nodetool assassinate remote_IP_address
  If the RF (replication factor) on any keyspace has not been properly updated:
  - Note the name of the keyspace that needs to be updated.
  - Remove the datacenter from the keyspace RF (using ALTER KEYSPACE).
  - If the keyspace used SimpleStrategy replication, also run a full repair on the keyspace:
    nodetool repair --full keyspace_name
- Run nodetool status to ensure that the nodes in the datacenter were removed.
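The pending-request check in the first step can be scripted when many nodes have to be inspected. The following is a minimal sketch, not part of nodetool: `pending_pools` is a hypothetical helper name, and it assumes the thread pool table in the `nodetool tpstats` output starts with a "Pool Name" header, keeps the pool name in the first column and the Pending count in the third, and ends at the first blank line. Column layout can differ between versions, so check it against real output before relying on it.

```shell
# Hypothetical helper (not part of nodetool): reads `nodetool tpstats` output
# on stdin and prints every thread pool whose Pending count is above zero.
# Assumed layout: a "Pool Name" header row, pool name in column 1,
# Pending in column 3, and a blank line after the pool table.
pending_pools() {
    awk '
        $1 == "Pool" && $2 == "Name" { in_pools = 1; next }  # header row
        in_pools && NF == 0          { exit }                # end of pool table
        in_pools && $3 ~ /^[0-9]+$/ && $3 + 0 > 0 {
            print $1, $3                                     # pool name, pending count
            found = 1
        }
        END { exit (found ? 1 : 0) }                         # non-zero status if any pending
    '
}

# Usage on a node in the datacenter being removed:
#   nodetool tpstats | pending_pools && echo "no pending requests"
```

The function only reports what it finds; deciding whether a non-zero count is safe to ignore remains a manual judgment.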
Example
Removing DC3 from the cluster:
- Check the status of the cluster:
nodetool status
Status shows that there are three datacenters with 1 node in each:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   474.23 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  518.36 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.111  461.56 KiB  ?     ac43e602-ef09-4d0d-a455-3311f444198c  -9223372036854775788  rack1
- Run a full repair:
nodetool repair --full
- Using JConsole, check the following JMX MBeans to make sure there are no active connections:
org.apache.cassandra.metrics/Client/connectedNativeClients
org.apache.cassandra.metrics/Client/connectedThriftClients
- Verify that there are no pending write requests on each node that is being removed (the Pending column should read 0 or N/A):
nodetool tpstats
Pool Name           Active  Pending (w/Backpressure)  Delayed  Completed...
BackgroundIoStage   0       0 (N/A)                   N/A      640...
CompactionExecutor  0       0 (N/A)                   N/A      1039...
GossipStage         0       0 (N/A)                   N/A      4580...
HintsDispatcher     0       0 (N/A)                   N/A      2...
- Start cqlsh and remove DC3 from all keyspace configurations. Repeat for each keyspace that has an RF set for DC3:
ALTER KEYSPACE cycling WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1':1,'DC2':2};
- Shut down the OpsCenter Repair Service if in use.
- Shut down all nodes in the datacenter.
- Stop the DataStax Agent on each node if in use.
- Run nodetool assassinate on each node in DC3 (the datacenter that is being removed):
nodetool assassinate remote_IP_address
- In a remaining datacenter, verify that DC3 has been removed:
nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   503.54 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  522.47 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
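The final verification can also be done mechanically instead of by reading the output. The sketch below assumes only that `nodetool status` prints a "Datacenter: <name>" header for each datacenter, as in the output above; `datacenter_absent` is a hypothetical helper name, not a nodetool feature.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and succeeds
# (exit status 0) only if no "Datacenter: <name>" header names the given
# datacenter. The name is compared exactly, not as a pattern.
datacenter_absent() {
    awk -v dc="$1" '
        $1 == "Datacenter:" && $2 == dc { found = 1 }  # header for the removed DC
        END { exit (found ? 1 : 0) }                   # fail if it is still listed
    '
}

# Usage from a node in a remaining datacenter:
#   nodetool status | datacenter_absent DC3 && echo "DC3 is no longer listed"
```

Because the check keys on the header line alone, it does not confirm anything about the state of the remaining nodes; those still need to show UN in the status output.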