Remove nodes from a cluster
When cloud costs are excessive, Mission Control lets you remove nodes from existing datacenters to bring those costs under control. Customer queries continue uninterrupted in every datacenter because Mission Control coordinates and balances the decommissioning of nodes.
This task focuses on removing nodes from a single datacenter. To scale down the number of datacenters, follow the remove a DSE datacenter task.
Workflow of user and operators
- User submits the modified MissionControlCluster to the Control Plane Kubernetes cluster with a reduced size parameter.
- The cluster-level operator detects the DC-level change in the cluster object and modifies the DC-level resources.
- The DC-level operator detects the size change in the DC-level resource and decommissions nodes one by one (see the watch command after this list).
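You can optionally watch this hand-off as it happens. The following is a sketch that assumes the datacenter name dc1 from this topic's example; the sample manifest's k8sContext is a Mission Control context name, so the kubectl context name ("east" here) may differ in your environment:

# Watch the DC-level CassandraDatacenter resource while the operators
# reconcile the reduced size (Ctrl+C to stop watching).
kubectl --context east get cassandradatacenter dc1 -w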
When decommissioning nodes, Mission Control considers:
- any target datacenter.
- rejecting any desired cluster size that is incompatible with the number of defined racks.
- targeting the rack with the highest number of active nodes (the listing command after this list shows the current distribution).
- choosing the first rack name in ascending sort order when racks are tied on active node count.
- decommissioning multiple nodes in a single rack only after the remaining racks in the datacenter are adjusted to the desired node count.
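To see how many active nodes each rack currently has, you can list the DSE pods with their rack label. This sketch assumes the cass-operator label keys cassandra.datastax.com/datacenter and cassandra.datastax.com/rack:

# List the datacenter's pods with a RACK column so you can predict which
# rack the decommission algorithm targets next.
kubectl get pods -l cassandra.datastax.com/datacenter=dc1 -L cassandra.datastax.com/rack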
Mission Control enlists cass-operator to check that the remaining nodes have enough capacity to handle the increased storage requirements. If cass-operator determines that there is insufficient capacity, it logs a message that reports the shortfall in bytes, and the scale-down does not proceed. Otherwise, cass-operator automatically runs nodetool decommission on the node to be removed and, as a final step, terminates the pod.

Limitations

- You must decrease the datacenter size by a multiple of the number of racks in the target datacenter. For example, with 3 racks you may scale down by 3, 6, or 9 nodes, and so on. Invalid size parameters are ignored.
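Mission Control performs the capacity check for you, but you can estimate the outcome beforehand by comparing each node's data load against the free space on the remaining nodes. A sketch, assuming the cass-operator default container name "cassandra":

# Report each node's load and ownership; the Load column indicates how much
# data the remaining nodes must absorb from the decommissioned node.
kubectl exec demo-dc1-rack3-sts-1 -c cassandra -- nodetool status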
Prerequisites
- A prepared environment on either bare-metal/VMs or an existing Kubernetes cluster.
- A Kubernetes Data Plane cluster (the context check after this list confirms connectivity).
- An existing MissionControlCluster manifest specifying one datacenter with six DSE nodes distributed equally across three racks.
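Before you begin, you can confirm that kubectl can reach both clusters:

# Verify that the Control Plane and Data Plane contexts are configured;
# the current context is marked with an asterisk.
kubectl config get-contexts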
Remove nodes from a datacenter in a cluster
The goal is to modify the MissionControlCluster manifest (object) specification and submit that change with the kubectl command to remove one or more nodes from a datacenter in a Kubernetes cluster.
- Here is a sample MissionControlCluster manifest named example.missioncontrolcluster.yaml that was used to create the datacenter. Notice that the datacenters size field is set at 6, specifying six (6) nodes equally distributed across three (3) racks:

...
datacenters:
  - metadata:
      name: dc1
    k8sContext: east
    size: 6
    racks:
      - name: rack1
        nodeAffinityLabels:
          topology.kubernetes.io/zone: us-east1-c
      - name: rack2
        nodeAffinityLabels:
          topology.kubernetes.io/zone: us-east1-b
      - name: rack3
        nodeAffinityLabels:
          topology.kubernetes.io/zone: us-east1-d
...
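If you no longer have the original file, you can recover the live specification from the Control Plane cluster. This is a sketch; the resource name missioncontrolcluster assumes the CRD is registered under that name, and the object name demo is inferred from the demo-dc1-* pod names in this example:

# Dump the live MissionControlCluster object, including its datacenters list.
kubectl get missioncontrolcluster demo -o yaml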
- Modify this MissionControlCluster manifest to remove one (1) node. Edit the datacenters size field, changing 6 to 5:

...
datacenters:
  - metadata:
      name: dc1
    k8sContext: east
    size: 5
    racks:
...
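As a hedged alternative to editing a local file, you can change the object in place, which makes the apply step below unnecessary; the resource and object names are the same assumptions as above:

# Open the live object in an editor; locate the datacenter's size field
# and change 6 to 5, then save to trigger reconciliation.
kubectl edit missioncontrolcluster demo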
- Submit this change in the Control Plane cluster to initiate the termination operation:

kubectl apply -f example.missioncontrolcluster.yaml
Sample results
demo-dc1-rack3-sts-0
demo-dc1-rack3-sts-1
The node is terminated from the last rack that was specified. The pod's ordinal within the rack's StatefulSet is also a factor: the pod with the highest ordinal is terminated first. For example, given the nodes in the results, the operation's algorithm chooses to terminate demo-dc1-rack3-sts-1 rather than demo-dc1-rack3-sts-0.

At any given time, the number of started DSE nodes in a rack cannot differ from the number of started nodes in any other rack by more than one.
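You can watch the pods while the operator decommissions and terminates the node. A sketch, assuming the cass-operator label key cassandra.datastax.com/datacenter:

# Watch the datacenter's pods; the highest-ordinal pod in the targeted rack
# moves to Terminating once nodetool decommission finishes.
kubectl get pods -l cassandra.datastax.com/datacenter=dc1 -w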
- Monitor the status of the CassandraDatacenter with this command:

kubectl get cassandradatacenter dc1 -o yaml
Sample results
status:
  cassandraOperatorProgress: Updating
  conditions:
  - lastTransitionTime: "2022-10-20T02:43:33Z"
    message: ""
    reason: ""
    status: "True"
    type: Healthy
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "False"
    type: Stopped
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "False"
    type: ReplacingNodes
  - lastTransitionTime: "2022-10-20T03:19:22Z"
    message: ""
    reason: ""
    status: "False"
    type: Updating
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "False"
    type: RollingRestart
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "False"
    type: Resuming
  - lastTransitionTime: "2022-10-20T14:57:59Z"
    message: ""
    reason: ""
    status: "True"
    type: ScalingDown
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "True"
    type: Valid
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "True"
    type: Initialized
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-10-20T03:19:22Z"
    message: ""
    reason: ""
    status: "False"
    type: ScalingUp
  lastServerNodeStarted: "2022-10-20T03:16:15Z"
  nodeStatuses:
    demo-dc1-rack1-sts-0:
      hostID: fedbd183-6075-4251-a37d-76102845919a
    demo-dc1-rack1-sts-1:
      hostID: b77157d1-829b-4965-9f64-4590104e7b9f
    demo-dc1-rack2-sts-0:
      hostID: 5ed4767f-b827-469e-b78e-2ceaab71943e
    demo-dc1-rack2-sts-1:
      hostID: 73813553-e081-42da-8f45-6fb67faf309b
    demo-dc1-rack3-sts-0:
      hostID: 3b456f44-556d-4f6f-9f25-5aae048f8aa8
    demo-dc1-rack3-sts-1:
      hostID: 5f316bf0-3e39-40c0-8a66-4998337336bf
  observedGeneration: 3
  quietPeriod: "2022-10-20T14:36:38Z"
  superUserUpserted: "2022-10-20T14:36:33Z"
  usersUpserted: "2022-10-20T14:36:33Z"
The ScalingDown condition has a status of True, indicating that the scale-down operation is in progress. Mission Control updates it to False when the operation is complete.
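To check just the scale-down progress without scanning the full status, you can filter the conditions with a JSONPath expression:

# Print only the ScalingDown condition's status ("True" while in progress).
kubectl get cassandradatacenter dc1 -o jsonpath='{.status.conditions[?(@.type=="ScalingDown")].status}'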
- Monitor the node logs to verify that the node is terminated with this command:

kubectl logs -f demo-dc1-rack3-sts-1 -c server-system-logger
Sample results
...
INFO  [pool-20-thread-1] 2022-10-20 14:58:30,229  StreamResultFuture.java:108 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Initiated streaming session for Unbootstrap
INFO  [pool-20-thread-1] 2022-10-20 14:58:30,230  StorageService.java:2143 - LEAVING: streaming hints to other nodes
INFO  [Stream-Connection-Establisher:2] 2022-10-20 14:58:30,233  StreamSession.java:385 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Starting streaming with /10.100.5.12
INFO  [pool-20-thread-1] 2022-10-20 14:58:30,234  HintsService.java:230 - Resumed hints dispatch
INFO  [HintsDispatcher:3] 2022-10-20 14:58:30,238  HintsDispatchExecutor.java:163 - Transferring all hints to /10.100.5.12: 3b456f44-556d-4f6f-9f25-5aae048f8aa8
INFO  [Stream-Connection-Establisher:2] 2022-10-20 14:58:30,241  StreamCoordinator.java:291 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Beginning stream session with /10.100.5.12
INFO  [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2022-10-20 14:58:30,249  StreamResultFuture.java:198 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Prepare completed. Receiving 0 files(0.000KiB), sending 2 files(121.051KiB)
INFO  [Stream-Serializer-/10.100.5.12:1] 2022-10-20 14:58:30,291  StreamLimiter.java:64 - Configured stream limiter with local DC throughput at 25MB/s and buffer size of 25MB, inter DC throughput at 25MB/s and buffer size of 25MB
INFO  [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2022-10-20 14:58:31,605  StreamResultFuture.java:212 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Session with /10.100.5.12 is complete
INFO  [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2022-10-20 14:58:31,615  StreamResultFuture.java:249 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b] All sessions completed org.apache.cassandra.streaming.StreamState@51ae56f3
INFO  [pool-20-thread-1] 2022-10-20 14:58:31,622  StorageService.java:3474 - Removing tokens [7174889864644096175, -5340669455939452840, 6589226831831228631, -3361306996063616279, -2877281716039476745, -6917033737490471623, -4440678482902958568, -5921294927776755314, 6562241473441405040, -8205472916598434127, -1406860119728070875, 4559174963203125054, -3911541421963556594, 7899698028842487330, -3141618503503578628, -2561210462966576697] for /10.100.3.7
INFO  [pool-20-thread-1] 2022-10-20 14:58:31,631  Gossiper.java:1301 - InetAddress /10.100.3.7 is now DOWN
INFO  [pool-20-thread-1] 2022-10-20 14:58:31,638  StorageService.java:4968 - Announcing that I have left the ring for 30000ms
Alternate results
...
status:
  conditions:
  - lastTransitionTime: "2021-03-30T22:01:48Z"
    message: "Not enough free space available to decommission. my-k8ssandra-dc1-default-sts-3 has 12345 free space, but 67891 is needed."
    reason: "NotEnoughSpaceToScaleDown"
    status: "False"
    type: Valid
...
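The same JSONPath technique surfaces a rejected scale-down quickly:

# Print the Valid condition; its reason field carries
# NotEnoughSpaceToScaleDown when capacity is insufficient.
kubectl get cassandradatacenter dc1 -o jsonpath='{.status.conditions[?(@.type=="Valid")]}'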
The node (pod) is terminated after the decommission operation completes, and Mission Control removes the demo-dc1-rack3-sts-1 node from the nodeStatuses map.
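As a final verification, you can confirm from a surviving node that the ring no longer includes the removed node. A sketch, assuming the cass-operator default container name "cassandra":

# The decommissioned node should no longer appear in the ring output.
kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- nodetool status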