Remove nodes from a cluster
To control excessive cloud costs, Mission Control can remove nodes from existing datacenters. Customer queries continue to be served without interruption because Mission Control coordinates and balances the decommissioning of nodes in each datacenter.
Note: This task focuses on removing nodes from a single datacenter. To scale down the number of datacenters, follow the Terminate a datacenter task.
Workflow of user and operators
1. The user submits the modified `MissionControlCluster` object to the control plane Kubernetes cluster with a reduced `size` parameter.
2. The cluster-level operator detects the datacenter-level change in the cluster object and modifies the datacenter-level resources.
3. The DC-level operator detects the size change in the datacenter-level resource and decommissions nodes one by one. You can watch this reconciliation with the command below.
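To observe the operators react to a size change, you can watch the datacenter-level resource as it reconciles. This is a minimal sketch; it assumes your kubectl context points at the data plane cluster and that the resource lives in a `mission-control` namespace (adjust both to your environment):

```bash
# Watch the datacenter-level resource as the operators reconcile the new size.
# Assumes the data plane context and the mission-control namespace; adjust as needed.
kubectl get cassandradatacenter --namespace mission-control --watch
```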
When decommissioning nodes, Mission Control considers the following. (See the sketch after this list for inspecting the current rack balance.)

- Any target datacenter.
- A rejection of any desired cluster size when it is incompatible with the number of defined racks.
- Targeting the rack with the highest number of active nodes.
- Choosing the first rack name in ascending sort order in the case of a tie between racks on their number of nodes.
- Decommissioning multiple nodes in a single rack only after adjusting the remaining racks in the datacenter to reflect the desired node count.
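To see how active nodes are currently distributed across racks, and therefore which rack the decommission logic is likely to target, you can group the database pods by their rack label. This is an illustrative sketch assuming `cass-operator`'s standard pod labels (`cassandra.datastax.com/cluster` and `cassandra.datastax.com/rack`) and a cluster named `demo`; adjust the names to your deployment:

```bash
# List database pods with their rack label to check rack balance before scaling down.
# Assumes cass-operator's standard pod labels and a cluster named "demo".
kubectl get pods \
  --selector cassandra.datastax.com/cluster=demo \
  --label-columns cassandra.datastax.com/rack
```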
Mission Control enlists `cass-operator` to check that the remaining nodes have enough capacity to handle the increased storage requirements. If `cass-operator` determines that there is insufficient capacity, it logs a message that reports the free and required space in bytes. Otherwise, `cass-operator` automatically runs `nodetool decommission` on the node to be removed. As a final step, the pod is terminated.

Limitations

- You must decrease the datacenter size by a multiple of the number of racks in the target datacenter. For example, with three racks you can scale down by three, six, or nine nodes, and so on. Invalid `size` parameters are ignored.
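Before you scale down, you can check the current data load on each node to gauge whether the remaining nodes can absorb the streamed data. A minimal sketch, assuming the database container in each pod is named `cassandra` (the companion of the `server-system-logger` container used later in this task):

```bash
# Check per-node data load and ownership before decommissioning.
# Assumes the database container is named "cassandra"; adjust if yours differs.
kubectl exec demo-dc1-rack3-sts-1 -c cassandra -- nodetool status
```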
Prerequisites
- A prepared environment on either bare-metal/VMs or an existing Kubernetes cluster.
- A Kubernetes data plane cluster.
- An existing `MissionControlCluster` manifest specifying one datacenter with six database nodes distributed equally across three racks.
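To confirm that the starting cluster matches these prerequisites, you can list the existing cluster objects and the database pods. A sketch under two assumptions: the `MissionControlCluster` resource is queryable by its plural name `missioncontrolclusters`, and the pods carry `cass-operator`'s standard cluster label:

```bash
# In the control plane cluster: confirm the MissionControlCluster object exists.
kubectl get missioncontrolclusters

# In the data plane cluster: confirm six database pods, two per rack.
kubectl get pods --selector cassandra.datastax.com/cluster=demo
```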
Remove nodes from a datacenter in a cluster
The goal is to modify the `MissionControlCluster` manifest specification and submit that change with the `kubectl` command to remove one or more nodes from a datacenter in a Kubernetes cluster.
1. Here is a sample `MissionControlCluster` manifest named `example.missioncontrolcluster.yaml` that was used to create the datacenter. Notice that the datacenter's `size` field is set to `6`, specifying six nodes equally distributed across three racks:

   ```yaml
   ...
   datacenters:
     - metadata:
         name: dc1
       k8sContext: east
       size: 6
       racks:
         - name: rack1
           nodeAffinityLabels:
             topology.kubernetes.io/zone: us-east1-c
         - name: rack2
           nodeAffinityLabels:
             topology.kubernetes.io/zone: us-east1-b
         - name: rack3
           nodeAffinityLabels:
             topology.kubernetes.io/zone: us-east1-d
   ...
   ```
2. Modify this `MissionControlCluster` manifest to remove one node. Edit the datacenter's `size` field, changing `6` to `5`:

   ```yaml
   ...
   datacenters:
     - metadata:
         name: dc1
       k8sContext: east
       size: 5
       racks:
   ...
   ```
3. Submit this change in the control plane cluster to initiate the termination operation:

   ```bash
   kubectl apply -f example.missioncontrolcluster.yaml
   ```

   Result:

   ```
   demo-dc1-rack3-sts-0
   demo-dc1-rack3-sts-1
   ```

   The node is terminated from the last rack that was specified. The `StatefulSet` per rack is also a factor and must be `1` before a node is terminated. For example, the operation's algorithm chooses to terminate `demo-dc1-rack3-sts-1` given the nodes in the results.

   At any time, the number of active database nodes in a rack must not differ from the number of active nodes in all other racks by more than one.
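   Because the per-rack `StatefulSet` drives which pod is removed (the highest ordinal in the targeted rack), it can help to inspect the StatefulSets directly during the scale-down. A sketch, assuming `cass-operator`'s standard cluster label and a cluster named `demo`:

   ```bash
   # Show the per-rack StatefulSets and their replica counts during the scale-down.
   # Assumes cass-operator's standard labels and a cluster named "demo".
   kubectl get statefulsets --selector cassandra.datastax.com/cluster=demo
   ```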
4. Monitor the status of the `CassandraDatacenter` with this command:

   ```bash
   kubectl get cassandradatacenter dc1 -o yaml
   ```

   Result:

   ```yaml
   status:
     cassandraOperatorProgress: Updating
     conditions:
     - lastTransitionTime: "2025-08-28T02:43:33Z"
       message: ""
       reason: ""
       status: "True"
       type: Healthy
     - lastTransitionTime: "2025-08-28T02:49:05Z"
       message: ""
       reason: ""
       status: "False"
       type: Stopped
     - lastTransitionTime: "2025-08-28T02:49:05Z"
       message: ""
       reason: ""
       status: "False"
       type: ReplacingNodes
     - lastTransitionTime: "2025-08-28T03:19:22Z"
       message: ""
       reason: ""
       status: "False"
       type: Updating
     - lastTransitionTime: "2025-08-28T02:49:05Z"
       message: ""
       reason: ""
       status: "False"
       type: RollingRestart
     - lastTransitionTime: "2025-08-28T02:49:05Z"
       message: ""
       reason: ""
       status: "False"
       type: Resuming
     - lastTransitionTime: "2025-08-28T14:57:59Z"
       message: ""
       reason: ""
       status: "True"
       type: ScalingDown
     - lastTransitionTime: "2025-08-28T02:49:05Z"
       message: ""
       reason: ""
       status: "True"
       type: Valid
     - lastTransitionTime: "2025-08-28T02:49:05Z"
       message: ""
       reason: ""
       status: "True"
       type: Initialized
     - lastTransitionTime: "2025-08-28T02:49:05Z"
       message: ""
       reason: ""
       status: "True"
       type: Ready
     - lastTransitionTime: "2025-08-28T03:19:22Z"
       message: ""
       reason: ""
       status: "False"
       type: ScalingUp
     lastServerNodeStarted: "2025-08-28T03:16:15Z"
     nodeStatuses:
       demo-dc1-rack1-sts-0:
         hostID: fedbd183-6075-4251-a37d-76102845919a
       demo-dc1-rack1-sts-1:
         hostID: b77157d1-829b-4965-9f64-4590104e7b9f
       demo-dc1-rack2-sts-0:
         hostID: 5ed4767f-b827-469e-b78e-2ceaab71943e
       demo-dc1-rack2-sts-1:
         hostID: 73813553-e081-42da-8f45-6fb67faf309b
       demo-dc1-rack3-sts-0:
         hostID: 3b456f44-556d-4f6f-9f25-5aae048f8aa8
       demo-dc1-rack3-sts-1:
         hostID: 5f316bf0-3e39-40c0-8a66-4998337336bf
     observedGeneration: 3
     quietPeriod: "2025-08-28T14:36:38Z"
     superUserUpserted: "2025-08-28T14:36:33Z"
     usersUpserted: "2025-08-28T14:36:33Z"
   ```
   The `ScalingDown` condition has a status of `True`, indicating that the scale-down operation is in progress. Mission Control updates it to `False` when the operation is complete.
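   Rather than polling the full YAML, you can block until the condition clears. A sketch using `kubectl wait`, assuming the default namespace and a generous timeout:

   ```bash
   # Block until the ScalingDown condition returns to False (operation complete).
   kubectl wait cassandradatacenter/dc1 \
     --for=condition=ScalingDown=false \
     --timeout=30m
   ```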
5. Monitor the node logs to verify that the node is terminated with this command:

   ```bash
   kubectl logs -f demo-dc1-rack3-sts-1 -c server-system-logger
   ```

   Result:

   ```
   ...
   INFO [pool-20-thread-1] 2025-08-28 14:58:30,229 StreamResultFuture.java:108 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Initiated streaming session for Unbootstrap
   INFO [pool-20-thread-1] 2025-08-28 14:58:30,230 StorageService.java:2143 - LEAVING: streaming hints to other nodes
   INFO [Stream-Connection-Establisher:2] 2025-08-28 14:58:30,233 StreamSession.java:385 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Starting streaming with /10.100.5.12
   INFO [pool-20-thread-1] 2025-08-28 14:58:30,234 HintsService.java:230 - Resumed hints dispatch
   INFO [HintsDispatcher:3] 2025-08-28 14:58:30,238 HintsDispatchExecutor.java:163 - Transferring all hints to /10.100.5.12: 3b456f44-556d-4f6f-9f25-5aae048f8aa8
   INFO [Stream-Connection-Establisher:2] 2025-08-28 14:58:30,241 StreamCoordinator.java:291 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Beginning stream session with /10.100.5.12
   INFO [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2025-08-28 14:58:30,249 StreamResultFuture.java:198 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Prepare completed. Receiving 0 files(0.000KiB), sending 2 files(121.051KiB)
   INFO [Stream-Serializer-/10.100.5.12:1] 2025-08-28 14:58:30,291 StreamLimiter.java:64 - Configured stream limiter with local DC throughput at 25MB/s and buffer size of 25MB, inter DC throughput at 25MB/s and buffer size of 25MB
   INFO [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2025-08-28 14:58:31,605 StreamResultFuture.java:212 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Session with /10.100.5.12 is complete
   INFO [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2025-08-28 14:58:31,615 StreamResultFuture.java:249 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b] All sessions completed org.apache.cassandra.streaming.StreamState@51ae56f3
   INFO [pool-20-thread-1] 2025-08-28 14:58:31,622 StorageService.java:3474 - Removing tokens [7174889864644096175, -5340669455939452840, 6589226831831228631, -3361306996063616279, -2877281716039476745, -6917033737490471623, -4440678482902958568, -5921294927776755314, 6562241473441405040, -8205472916598434127, -1406860119728070875, 4559174963203125054, -3911541421963556594, 7899698028842487330, -3141618503503578628, -2561210462966576697] for /10.100.3.7
   INFO [pool-20-thread-1] 2025-08-28 14:58:31,631 Gossiper.java:1301 - InetAddress /10.100.3.7 is now DOWN
   INFO [pool-20-thread-1] 2025-08-28 14:58:31,638 StorageService.java:4968 - Announcing that I have left the ring for 30000ms
   ```

   Alternate Result:

   ```yaml
   ...
   status:
     conditions:
     - lastTransitionTime: "2021-03-30T22:01:48Z"
       message: "Not enough free space available to decommission. my-k8ssandra-dc1-default-sts-3 has 12345 free space, but 67891 is needed."
       reason: "NotEnoughSpaceToScaleDown"
       status: "False"
       type: Valid
   ...
   ```
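   If you see the `NotEnoughSpaceToScaleDown` condition, the decommission does not proceed. One recovery path, sketched here under the assumption that the original size is still valid for your workload, is to restore the previous `size` value and re-apply the manifest:

   ```bash
   # Revert the size field in example.missioncontrolcluster.yaml (for example, 5 back to 6),
   # then re-apply the manifest in the control plane cluster.
   kubectl apply -f example.missioncontrolcluster.yaml
   ```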
The node (pod) is terminated after the termination operation completes. Mission Control removes the `demo-dc1-rack3-sts-1` node from the `nodeStatuses` map.
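To confirm the removal, you can query just the `nodeStatuses` map instead of the full status. A sketch using JSONPath output, assuming the default namespace:

```bash
# Print the nodeStatuses map; the decommissioned node should no longer appear.
kubectl get cassandradatacenter dc1 \
  --output jsonpath='{.status.nodeStatuses}'
```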