Decommissioning a datacenter
Steps to properly remove a datacenter so no information is lost.
To decommission a DSE datacenter:
Procedure
-
Make sure no clients are still writing to any nodes in the datacenter.
When using OpsCenter, you can verify this using the Write Requests chart, which displays the write-ops metrics. Check this metric for each node to verify that no clients are writing data to the datacenter.
When not using OpsCenter, the following JMX MBeans provide details on client connections and pending requests:
-
Active connections: org.apache.cassandra.metrics/Client/connectedNativeClients and org.apache.cassandra.metrics/Client/connectedThriftClients
-
Pending requests: org.apache.cassandra.metrics/ClientRequests/viewPendingMutations, or use nodetool tpstats.
-
Run a full repair with nodetool repair --full to ensure that all data is propagated from the datacenter being decommissioned.
You can also use the OpsCenter Repair Service.
If using OpsCenter, ensure that the repair has completed; see Checking the repair progress.
-
Shut down the OpsCenter Repair Service, if in use.
-
Change all keyspaces so they no longer reference the datacenter being removed.
-
Shut down all nodes in the datacenter.
-
Stop the DataStax Agent on each node if in use.
-
Run nodetool assassinate on every node in the datacenter being removed:
nodetool assassinate <remote_IP_address>
If the RF (replication factor) on any keyspace has not been properly updated:
-
Note the name of the keyspace that needs to be updated.
-
Remove the datacenter from the keyspace RF (using ALTER KEYSPACE).
-
If the keyspace used SimpleStrategy, also run a full repair on the keyspace:
nodetool repair --full <keyspace_name>
-
Run nodetool status to ensure that the nodes in the datacenter were removed.
-
If the OpsCenter Repair Service was disabled, re-enable it now.
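The keyspace change in the procedure above can be prepared programmatically. The following is a minimal Python sketch, not part of DSE or its drivers. The helper names drop_datacenter and alter_keyspace_cql are hypothetical, the replication settings are assumed to be available as a plain dictionary, and the DC3 replication factor of 1 in the sample input is an assumption. The sketch only builds the statement text; review it and run it in cqlsh yourself.

```python
def drop_datacenter(replication, datacenter):
    """Return a copy of a NetworkTopologyStrategy map without `datacenter`."""
    if not str(replication.get("class", "")).endswith("NetworkTopologyStrategy"):
        raise ValueError("only NetworkTopologyStrategy maps name datacenters")
    remaining = {k: v for k, v in replication.items() if k != datacenter}
    if len(remaining) == 1:
        # Only the 'class' entry is left, so no datacenter would hold replicas.
        raise ValueError("keyspace would have no replicas left")
    return remaining


def alter_keyspace_cql(keyspace, replication):
    """Render an ALTER KEYSPACE statement for the given replication map."""
    parts = []
    for key, value in replication.items():
        # The strategy class is a quoted string; replication factors are integers.
        rendered = "'" + str(value) + "'" if key == "class" else str(int(value))
        parts.append("'" + key + "': " + rendered)
    body = ", ".join(parts)
    return "ALTER KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


# Hypothetical current settings for the cycling keyspace (DC3 value assumed).
current = {"class": "NetworkTopologyStrategy", "DC1": 1, "DC2": 2, "DC3": 1}
print(alter_keyspace_cql("cycling", drop_datacenter(current, "DC3")))
```

Building the statement from the existing map, rather than typing it by hand, keeps the replication factors of the remaining datacenters unchanged. Repeat the call for every keyspace that references the datacenter being removed.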
Example
Removing DC3 from the cluster:
-
Check the status of the cluster:
nodetool status
The status shows three datacenters with one node in each:
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   474.23 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  518.36 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
Datacenter: DC3
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.111  461.56 KiB  ?     ac43e602-ef09-4d0d-a455-3311f444198c  -9223372036854775788  rack1
-
Run a full repair:
nodetool repair --full
-
Using JConsole, check the following JMX MBeans to make sure there are no active connections:
-
org.apache.cassandra.metrics/Client/connectedNativeClients
-
org.apache.cassandra.metrics/Client/connectedThriftClients
-
Verify that there are no pending write requests on each node that is being removed (the Pending column should read 0 or N/A):
nodetool tpstats
Pool Name           Active  Pending (w/Backpressure)  Delayed  Completed...
BackgroundIoStage   0       0 (N/A)                   N/A      640...
CompactionExecutor  0       0 (N/A)                   N/A      1039...
GossipStage         0       0 (N/A)                   N/A      4580...
HintsDispatcher     0       0 (N/A)                   N/A      2...
-
Start cqlsh and remove DC3 from all keyspace configurations. Repeat for each keyspace that has an RF set for DC3:
ALTER KEYSPACE cycling WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 1, 'DC2': 2};
-
Shut down the OpsCenter Repair Service, if in use.
-
Shut down all nodes in the datacenter.
-
Stop the DataStax Agent on each node if in use.
-
Run nodetool assassinate on each node in DC3 (the datacenter that is being removed):
nodetool assassinate <remote_IP_address>
-
From a node in a remaining datacenter, verify that DC3 has been removed:
nodetool status
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   503.54 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  522.47 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
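The final check in the example can also be scripted. The following is a minimal Python sketch that reads nodetool status text and lists the node addresses under each datacenter. The function name nodes_by_datacenter is hypothetical, and the parser assumes the output layout shown in this example (a "Datacenter:" header line followed by rows that begin with a two-letter status and state code); other versions may format the output differently.

```python
import re


def nodes_by_datacenter(status_output):
    """Map each datacenter name to the node addresses listed under it."""
    datacenters = {}
    current = None
    for raw_line in status_output.splitlines():
        line = raw_line.strip()
        header = re.match(r"Datacenter:\s*(\S+)", line)
        if header:
            current = header.group(1)
            datacenters[current] = []
            continue
        # Node rows start with a status/state code such as UN or DL,
        # followed by whitespace and the node address.
        node = re.match(r"[UD][NLJM]\s+(\S+)", line)
        if node and current is not None:
            datacenters[current].append(node.group(1))
    return datacenters


# Output copied from the example above, after DC3 was removed.
SAMPLE = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.11   503.54 KiB  ?     7297d21e-a04e-4bb1-91d9-8149b03fb60a  -9223372036854775808  rack1
Datacenter: DC2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load        Owns  Host ID                               Token                 Rack
UN  10.200.175.113  522.47 KiB  ?     2ff7d46c-f084-477e-aa53-0f4791c71dbc  -9223372036854775798  rack1
"""

remaining = nodes_by_datacenter(SAMPLE)
print("DC3 removed:", "DC3" not in remaining)
```

To use this against a live cluster, the text could be captured from the nodetool status command, for example with Python's subprocess module, and passed to the function in place of SAMPLE.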