Terminate a datacenter within an existing DSE cluster
Use Mission Control to terminate one or more datacenters in an existing cluster, one datacenter at a time. When terminating multiple datacenters, choose the order based on DataStax Enterprise (DSE) rules. Refer to the DSE upgrade planning guide, which details the priorities. For instance, terminate DSE Analytics datacenters first, taking into account whether the nodes use DSE Hadoop or Spark. Next, terminate DSE Graph or transactional datacenters, followed by datacenters running DSE Search nodes.
Mission Control manages the replication of system keyspaces during the termination process without altering user-defined keyspaces. For instance, if a datacenter selected for termination still has user keyspaces replicated to it, then Mission Control blocks the termination until the keyspaces are manually altered. This is a safety measure to prevent unintended removal of data.
Prerequisites
-
A prepared environment on either bare-metal/VM or an existing Kubernetes cluster.
Example
Within an existing multi-datacenter Kubernetes cluster are two datacenters (DC1 in west region and DC2 in east region), each with 3 nodes. The decision is made to reduce costs and run from a single west region datacenter.
Workflow of user and operators
-
User updates the replication strategy on user-defined keyspaces to remove references to the east region datacenter being terminated. If an east region keyspace is still in use, the user must wait until that keyspace is dormant before updating it.
Mission Control operators automatically, and by default, update all the system keyspaces to terminate the east region datacenter and then terminate its nodes (pods).
Mission Control issues an error if there are keyspaces actively using the datacenter in the east region that is targeted for termination.
-
User submits the updated MissionControlCluster to the Control Plane Kubernetes cluster.
-
Cluster-level operator picks up the modification and automatically updates keyspace replication settings on system keyspaces.
-
Cluster-level operator deletes datacenter-level resources in the Kubernetes cluster where the nodes are to be terminated.
-
DC-level operator picks up datacenter-level resource changes and deletes native Kubernetes objects representing the DSE nodes.
If a user-defined keyspace is still replicating to the DC that is targeted for termination then the operation FAILS. By design all user-defined keyspaces MUST NOT reference the DC to be terminated.
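Before submitting the change, it can help to list which keyspaces still replicate to the datacenter being terminated. The following is a minimal sketch, not part of Mission Control: the function name keyspaces_on_dc is hypothetical, and it assumes the cqlsh table output of a query against system_schema.keyspaces is piped to it. The pod name, container name, and credentials in the usage comment are placeholders. The result also includes system keyspaces, which Mission Control updates itself.

```shell
# Hypothetical helper: print the names of keyspaces whose replication map
# still mentions a given datacenter. Reads cqlsh table output on stdin.
keyspaces_on_dc() {
  dc="$1"
  # Keep only rows whose replication map has a key equal to the DC name,
  # then print the first column (keyspace_name) with whitespace removed.
  grep -F "'${dc}':" | awk -F'|' '{ gsub(/[ \t]/, "", $1); print $1 }'
}

# Assumed usage (pod, container, and authentication are placeholders):
#   kubectl exec <pod> -c cassandra -- cqlsh -e \
#     "SELECT keyspace_name, replication FROM system_schema.keyspaces;" \
#     | keyspaces_on_dc dc1
```

Any user-defined keyspace that this prints needs an ALTER KEYSPACE before the termination can proceed.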
Terminate an existing cluster’s datacenter
-
Modify the existing MissionControlCluster YAML (demo-dse.yaml) in the Control Plane Kubernetes cluster, updating the datacenters list under spec.k8ssandra.cassandra so that it no longer references the datacenter dc1 targeted for termination. In this example, the following lines are deleted:
spec:
  k8ssandra:
    cassandra:
      datacenters:
        - metadata:
            name: dc1
          k8sContext: east
          size: 3
          racks:
            - name: rack1
              nodeAffinityLabels:
                topology.kubernetes.io/zone: us-east1-c
            - name: rack2
              nodeAffinityLabels:
                topology.kubernetes.io/zone: us-east1-b
            - name: rack3
              nodeAffinityLabels:
                topology.kubernetes.io/zone: us-east1-d
-
Submit this modification to the Kubernetes Control Plane cluster with the following command:
kubectl apply -f demo-dse.yaml
The following keyspaces are updated:
-
system_traces
-
system_distributed
-
system_auth
-
dse_leases
-
dse_perf
-
dse_security
-
Monitor the termination operation progress in the Control Plane cluster with this command:
kubectl get k8ssandracluster demo
If any user-defined keyspaces are still replicating to the datacenter targeted for termination, the kubectl command returns an error such as:
NAME   ERROR
demo   cannot decommission DC dc1: keyspace ks1 still has replicas on it
To rectify this error, the replication of all user-defined keyspaces must be manually updated to remove references to the datacenter being terminated.
ALTER KEYSPACE ks1 WITH replication = {'class': 'NetworkTopologyStrategy', 'west': 3};
-
Monitor the termination progress by checking the status of the datacenter to be terminated in the east cluster. This example uses the following command:
kubectl get cassandradatacenter dc1 -o yaml
Sample results
status:
  cassandraOperatorProgress: Updating
  conditions:
  - lastTransitionTime: "2022-10-24T02:43:20Z"
    message: ""
    reason: ""
    status: "True"
    type: Healthy
  - lastTransitionTime: "2022-10-24T02:43:20Z"
    message: ""
    reason: ""
    status: "False"
    type: Stopped
  ...
The sample output indicates that one DSE node is online and one is not at this point in the monitoring. The CassandraDatacenter dc1 is terminated when DC-level operators set all of the Decommission conditions:status to "False". The nodeStatuses map is also updated.
The DC-level operators must terminate each node (pod) in the datacenter before the datacenter itself is terminated.
-
Monitor the DSE logs with this command:
kubectl logs demo-dc1-rack3-sts-0 -c server-system-logger
where demo-dc1-rack3-sts-0 is the pod name; the trailing 0 is the StatefulSet ordinal index of the node in its rack.
Sample results
INFO [pool-17-thread-1] 2022-10-27 17:13:09,717 StorageService.java:2143 - LEAVING: sleeping 30000 ms for batch processing and pending range setup
...
INFO [pool-17-thread-1] 2022-10-27 17:13:39,770 Gossiper.java:1301 - InetAddress /10.100.5.15 is now DOWN
INFO [pool-17-thread-1] 2022-10-27 17:13:39,788 StorageService.java:4968 - Announcing that I have left the ring for 30000ms
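The check on the ERROR column of kubectl get k8ssandracluster demo can be scripted. The following is a rough sketch under stated assumptions: the function names cluster_error and check_termination are hypothetical, and the parsing assumes the two-column NAME/ERROR table layout shown in the sample error, which may differ in a given release.

```shell
# Hypothetical helper: read `kubectl get k8ssandracluster <name>` table
# output on stdin and print the text that follows the NAME field of the
# first data row (assumed to be the ERROR column).
cluster_error() {
  awk 'NR == 2 { $1 = ""; sub(/^ +/, ""); print }'
}

# Hypothetical helper: return non-zero when an error is reported.
check_termination() {
  err="$(cluster_error)"
  if [ -n "$err" ]; then
    echo "termination blocked: $err" >&2
    return 1
  fi
  echo "no error reported"
}

# Assumed usage against the Control Plane cluster:
#   kubectl get k8ssandracluster demo | check_termination
```

A non-zero return indicates that a user-defined keyspace still has to be altered before the operators can continue.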
Upon successful completion of the east datacenter termination operation, users now run only in the west region datacenter.
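Because the cluster-level operator deletes the datacenter-level resources, one way to confirm completion is to block until the CassandraDatacenter resource is gone. This is a sketch only: the function name wait_for_dc_removal is hypothetical, the context east is taken from the k8sContext value in the example, and the namespace demo-ns is a placeholder. Depending on the kubectl version, waiting on a resource that has already been deleted may return immediately or report that no matching resource was found.

```shell
# Hypothetical helper: block until the CassandraDatacenter resource is deleted.
# Arguments: kubectl context, namespace, datacenter name, optional timeout.
wait_for_dc_removal() {
  ctx="$1"; ns="$2"; dc="$3"; timeout="${4:-30m}"
  kubectl --context "$ctx" --namespace "$ns" \
    wait --for=delete "cassandradatacenter/$dc" --timeout="$timeout"
}

# Assumed usage ("demo-ns" is a placeholder namespace):
#   wait_for_dc_removal east demo-ns dc1 30m
```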