Remove nodes from a cluster
When cloud costs are excessive, Mission Control lets you remove nodes from existing datacenters to bring those costs under control. Customer queries continue uninterrupted in every datacenter because Mission Control coordinates and balances the decommissioning of nodes.
This task focuses on removing nodes from a single datacenter. To scale down the number of datacenters, follow the remove a DSE datacenter task.
Workflow of user and operators
- User submits the modified MissionControlCluster to the Control Plane Kubernetes cluster with a reduced size parameter.
- The cluster-level operator detects the DC-level change in the cluster object and modifies the DC-level resources.
- The DC-level operator detects the size change in the DC-level resource and decommissions nodes one by one (see the watch command after this list).
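You can optionally watch this hand-off as it happens. The following is a sketch that assumes the datacenter name dc1 from this topic's example; the sample manifest's k8sContext is a Mission Control context name, so the kubectl context name ("east" here) may differ in your environment:

# Watch the DC-level CassandraDatacenter resource while the operators
# reconcile the reduced size (Ctrl+C to stop watching).
kubectl --context east get cassandradatacenter dc1 -w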
When decommissioning nodes, Mission Control considers:
- any target datacenter.
- rejecting any desired cluster size that is incompatible with the number of defined racks.
- targeting the rack with the highest number of active nodes (the listing command after this list shows the current distribution).
- choosing the first rack name in ascending sort order when racks are tied on active node count.
- decommissioning multiple nodes in a single rack only after the remaining racks in the datacenter are adjusted to the desired node count.
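To see how many active nodes each rack currently has, you can list the DSE pods with their rack label. This sketch assumes the cass-operator label keys cassandra.datastax.com/datacenter and cassandra.datastax.com/rack:

# List the datacenter's pods with a RACK column so you can predict which
# rack the decommission algorithm targets next.
kubectl get pods -l cassandra.datastax.com/datacenter=dc1 -L cassandra.datastax.com/rack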
Mission Control enlists cass-operator to check that the remaining nodes have enough capacity to handle the increased storage requirements. If cass-operator determines that there is insufficient capacity, it logs a message that reports the shortfall in bytes, and the scale-down does not proceed. Otherwise, cass-operator automatically runs nodetool decommission on the node to be removed and, as a final step, terminates the pod.

Limitations

- You must decrease the datacenter size by a multiple of the number of racks in the target datacenter. For example, with 3 racks you may scale down by 3, 6, or 9 nodes, and so on. Invalid size parameters are ignored.
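Mission Control performs the capacity check for you, but you can estimate the outcome beforehand by comparing each node's data load against the free space on the remaining nodes. A sketch, assuming the cass-operator default container name "cassandra":

# Report each node's load and ownership; the Load column indicates how much
# data the remaining nodes must absorb from the decommissioned node.
kubectl exec demo-dc1-rack3-sts-1 -c cassandra -- nodetool status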
Prerequisites
- A prepared environment on either bare-metal/VMs or an existing Kubernetes cluster.
- A Kubernetes Data Plane cluster (the context check after this list confirms connectivity).
- An existing MissionControlCluster manifest specifying one datacenter with six DSE nodes distributed equally across three racks.
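Before you begin, you can confirm that kubectl can reach both clusters:

# Verify that the Control Plane and Data Plane contexts are configured;
# the current context is marked with an asterisk.
kubectl config get-contexts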
Remove nodes from a datacenter in a cluster
The goal is to modify the MissionControlCluster manifest (object) specification and submit that change with the kubectl command to remove one or more nodes from a datacenter in a Kubernetes cluster.
- Here is a sample MissionControlCluster manifest named example.missioncontrolcluster.yaml that was used to create the datacenter. Notice that the datacenters size field is set at 6, specifying six (6) nodes equally distributed across three (3) racks:

...
datacenters:
  - metadata:
      name: dc1
    k8sContext: east
    size: 6
    racks:
      - name: rack1
        nodeAffinityLabels:
          topology.kubernetes.io/zone: us-east1-c
      - name: rack2
        nodeAffinityLabels:
          topology.kubernetes.io/zone: us-east1-b
      - name: rack3
        nodeAffinityLabels:
          topology.kubernetes.io/zone: us-east1-d
...
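If you no longer have the original file, you can recover the live specification from the Control Plane cluster. This is a sketch; the resource name missioncontrolcluster assumes the CRD is registered under that name, and the object name demo is inferred from the demo-dc1-* pod names in this example:

# Dump the live MissionControlCluster object, including its datacenters list.
kubectl get missioncontrolcluster demo -o yaml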
- Modify this MissionControlCluster manifest to remove one (1) node. Edit the datacenters size field, changing 6 to 5:

...
datacenters:
  - metadata:
      name: dc1
    k8sContext: east
    size: 5
    racks:
...
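As a hedged alternative to editing a local file, you can change the object in place, which makes the apply step below unnecessary; the resource and object names are the same assumptions as above:

# Open the live object in an editor; locate the datacenter's size field
# and change 6 to 5, then save to trigger reconciliation.
kubectl edit missioncontrolcluster demo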
- Submit this change in the Control Plane cluster to initiate the termination operation:

kubectl apply -f example.missioncontrolcluster.yaml
Sample results
demo-dc1-rack3-sts-0
demo-dc1-rack3-sts-1
The node is terminated from the last rack that was specified. The pod's ordinal within the rack's StatefulSet is also a factor: the pod with the highest ordinal is terminated first. For example, given the nodes in the results, the operation's algorithm chooses to terminate demo-dc1-rack3-sts-1 rather than demo-dc1-rack3-sts-0.

At any given time, the number of started DSE nodes in a rack cannot differ from the number of started nodes in any other rack by more than one.
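You can watch the pods while the operator decommissions and terminates the node. A sketch, assuming the cass-operator label key cassandra.datastax.com/datacenter:

# Watch the datacenter's pods; the highest-ordinal pod in the targeted rack
# moves to Terminating once nodetool decommission finishes.
kubectl get pods -l cassandra.datastax.com/datacenter=dc1 -w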
- Monitor the status of the CassandraDatacenter with this command:

kubectl get cassandradatacenter dc1 -o yaml
Sample results
status:
  cassandraOperatorProgress: Updating
  conditions:
  - lastTransitionTime: "2022-10-20T02:43:33Z"
    message: ""
    reason: ""
    status: "True"
    type: Healthy
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "False"
    type: Stopped
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "False"
    type: ReplacingNodes
  - lastTransitionTime: "2022-10-20T03:19:22Z"
    message: ""
    reason: ""
    status: "False"
    type: Updating
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "False"
    type: RollingRestart
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "False"
    type: Resuming
  - lastTransitionTime: "2022-10-20T14:57:59Z"
    message: ""
    reason: ""
    status: "True"
    type: ScalingDown
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "True"
    type: Valid
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "True"
    type: Initialized
  - lastTransitionTime: "2022-10-20T02:49:05Z"
    message: ""
    reason: ""
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-10-20T03:19:22Z"
    message: ""
    reason: ""
    status: "False"
    type: ScalingUp
  lastServerNodeStarted: "2022-10-20T03:16:15Z"
  nodeStatuses:
    demo-dc1-rack1-sts-0:
      hostID: fedbd183-6075-4251-a37d-76102845919a
    demo-dc1-rack1-sts-1:
      hostID: b77157d1-829b-4965-9f64-4590104e7b9f
    demo-dc1-rack2-sts-0:
      hostID: 5ed4767f-b827-469e-b78e-2ceaab71943e
    demo-dc1-rack2-sts-1:
      hostID: 73813553-e081-42da-8f45-6fb67faf309b
    demo-dc1-rack3-sts-0:
      hostID: 3b456f44-556d-4f6f-9f25-5aae048f8aa8
    demo-dc1-rack3-sts-1:
      hostID: 5f316bf0-3e39-40c0-8a66-4998337336bf
  observedGeneration: 3
  quietPeriod: "2022-10-20T14:36:38Z"
  superUserUpserted: "2022-10-20T14:36:33Z"
  usersUpserted: "2022-10-20T14:36:33Z"
The ScalingDown condition has a status of True, indicating that the scale-down operation is in progress. Mission Control updates it to False when the operation is complete.
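To check just the scale-down progress without scanning the full status, you can filter the conditions with a JSONPath expression:

# Print only the ScalingDown condition's status ("True" while in progress).
kubectl get cassandradatacenter dc1 -o jsonpath='{.status.conditions[?(@.type=="ScalingDown")].status}'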
- Monitor the node logs to verify that the node is terminated with this command:

kubectl logs -f demo-dc1-rack3-sts-1 -c server-system-logger
Sample results
...
INFO  [pool-20-thread-1] 2022-10-20 14:58:30,229  StreamResultFuture.java:108 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Initiated streaming session for Unbootstrap
INFO  [pool-20-thread-1] 2022-10-20 14:58:30,230  StorageService.java:2143 - LEAVING: streaming hints to other nodes
INFO  [Stream-Connection-Establisher:2] 2022-10-20 14:58:30,233  StreamSession.java:385 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Starting streaming with /10.100.5.12
INFO  [pool-20-thread-1] 2022-10-20 14:58:30,234  HintsService.java:230 - Resumed hints dispatch
INFO  [HintsDispatcher:3] 2022-10-20 14:58:30,238  HintsDispatchExecutor.java:163 - Transferring all hints to /10.100.5.12: 3b456f44-556d-4f6f-9f25-5aae048f8aa8
INFO  [Stream-Connection-Establisher:2] 2022-10-20 14:58:30,241  StreamCoordinator.java:291 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Beginning stream session with /10.100.5.12
INFO  [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2022-10-20 14:58:30,249  StreamResultFuture.java:198 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Prepare completed. Receiving 0 files(0.000KiB), sending 2 files(121.051KiB)
INFO  [Stream-Serializer-/10.100.5.12:1] 2022-10-20 14:58:30,291  StreamLimiter.java:64 - Configured stream limiter with local DC throughput at 25MB/s and buffer size of 25MB, inter DC throughput at 25MB/s and buffer size of 25MB
INFO  [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2022-10-20 14:58:31,605  StreamResultFuture.java:212 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Session with /10.100.5.12 is complete
INFO  [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2022-10-20 14:58:31,615  StreamResultFuture.java:249 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b] All sessions completed org.apache.cassandra.streaming.StreamState@51ae56f3
INFO  [pool-20-thread-1] 2022-10-20 14:58:31,622  StorageService.java:3474 - Removing tokens [7174889864644096175, -5340669455939452840, 6589226831831228631, -3361306996063616279, -2877281716039476745, -6917033737490471623, -4440678482902958568, -5921294927776755314, 6562241473441405040, -8205472916598434127, -1406860119728070875, 4559174963203125054, -3911541421963556594, 7899698028842487330, -3141618503503578628, -2561210462966576697] for /10.100.3.7
INFO  [pool-20-thread-1] 2022-10-20 14:58:31,631  Gossiper.java:1301 - InetAddress /10.100.3.7 is now DOWN
INFO  [pool-20-thread-1] 2022-10-20 14:58:31,638  StorageService.java:4968 - Announcing that I have left the ring for 30000ms
Alternate results
...
status:
  conditions:
  - lastTransitionTime: "2021-03-30T22:01:48Z"
    message: "Not enough free space available to decommission. my-k8ssandra-dc1-default-sts-3 has 12345 free space, but 67891 is needed."
    reason: "NotEnoughSpaceToScaleDown"
    status: "False"
    type: Valid
...
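The same JSONPath technique surfaces a rejected scale-down quickly:

# Print the Valid condition; its reason field carries
# NotEnoughSpaceToScaleDown when capacity is insufficient.
kubectl get cassandradatacenter dc1 -o jsonpath='{.status.conditions[?(@.type=="Valid")]}'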
The node (pod) is terminated after the decommission operation completes, and Mission Control removes the demo-dc1-rack3-sts-1 node from the nodeStatuses map.
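As a final verification, you can confirm from a surviving node that the ring no longer includes the removed node. A sketch, assuming the cass-operator default container name "cassandra":

# The decommissioned node should no longer appear in the ring output.
kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- nodetool status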