Remove nodes from a cluster

When cloud costs are excessive, Mission Control lets you remove nodes from existing datacenters to bring those costs under control. Because Mission Control coordinates and balances the decommissioning of nodes, customer queries continue to be served without interruption in every datacenter.

This task focuses on removing nodes from a single datacenter.

To scale down the number of datacenters instead, follow the remove a DSE datacenter task.

Workflow of user and operators

  1. The user submits the modified MissionControlCluster object, with a reduced size parameter, to the Control Plane Kubernetes cluster.

  2. The cluster-level operator detects the datacenter-level change in the cluster object and modifies the datacenter-level resources.

  3. The datacenter-level operator detects the size change in the datacenter-level resource and decommissions nodes one at a time.

    When decommissioning nodes, Mission Control:

    • operates on any target datacenter.

    • rejects any desired cluster size that is incompatible with the number of defined racks.

    • targets the rack with the highest number of active nodes.

    • on a tie between racks, chooses the rack whose name sorts first in ascending order.

    • decommissions multiple nodes in a single rack only after adjusting the remaining racks in the datacenter to reflect the desired node count.

      Mission Control enlists cass-operator to check that the remaining nodes have enough capacity to absorb the decommissioned node's data. If cass-operator determines that there is insufficient capacity, it logs a message that reports the free and required space in bytes, and the scale-down is not performed. Otherwise, cass-operator automatically runs nodetool decommission on the node to be removed. As a final step, the pod is terminated. To gauge capacity yourself before submitting the change, see the sketch after this list.

      Limitations - You must decrease the datacenter size by a multiple of the number of racks in the target datacenter. For example, with 3 racks you may scale down by 3, 6, or 9 nodes, and so on. Invalid size parameters are ignored.
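      Before reducing the size, it can help to confirm the datacenter's current size and rack layout and to roughly gauge free space on the nodes that will remain. The following is a minimal sketch: the .spec.size and .spec.racks fields follow the cass-operator CassandraDatacenter schema, while the pod names, the cassandra container name, and the data mount path are assumptions to adjust for your deployment.

      # Show the datacenter's current size and rack names.
      kubectl get cassandradatacenter dc1 \
        -o jsonpath='{.spec.size} {range .spec.racks[*]}{.name} {end}'

      # Check free space on the data volume of nodes that will remain.
      for pod in demo-dc1-rack1-sts-0 demo-dc1-rack2-sts-0; do
        kubectl exec "$pod" -c cassandra -- df -h /var/lib/cassandra
      done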

Prerequisites

  • A prepared environment on either bare-metal/VM or an existing Kubernetes cluster.

  • A Kubernetes Data Plane cluster.

  • An existing MissionControlCluster manifest specifying one datacenter with six DSE nodes distributed equally across three racks.

Remove nodes from a datacenter in a cluster

The goal is to modify the MissionControlCluster manifest (object) specification and submit that change with the kubectl command to remove one or more nodes from a datacenter in a Kubernetes cluster.

  1. Here is a sample MissionControlCluster manifest named example.missioncontrolcluster.yaml that was used to create the datacenter. Notice that the datacenter's size field is set to 6, specifying six (6) nodes equally distributed across three (3) racks.

    ...
          datacenters:
            - metadata:
                name: dc1
              k8sContext: east
              size: 6
              racks:
                - name: rack1
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-c
                - name: rack2
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-b
                - name: rack3
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-d
    ...
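    If you no longer have the original manifest on hand, you can read the live object back from the Control Plane cluster instead. This assumes the cluster is named demo, matching the pod names shown later in this task; adjust the name and namespace for your environment.

    kubectl get missioncontrolcluster demo -o yaml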
  2. Modify this MissionControlCluster manifest to remove one (1) node. Edit the datacenter's size field, changing 6 to 5 (or patch the live object directly, as shown after the snippet):

    ...
          datacenters:
            - metadata:
                name: dc1
              k8sContext: east
              size: 5
              racks:
    ...
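    Alternatively, you can apply the same change as a JSON patch against the live object. This is a sketch only: it assumes the cluster is named demo and that the datacenters list sits at spec.k8ssandra.cassandra.datacenters in the MissionControlCluster schema. Verify the path against your object before patching.

    # Replace the size of the first datacenter in the list with 5.
    kubectl patch missioncontrolcluster demo --type=json \
      -p='[{"op": "replace", "path": "/spec/k8ssandra/cassandra/datacenters/0/size", "value": 5}]'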
  3. Submit this change in the Control Plane cluster to initiate the termination operation:

    kubectl apply -f example.missioncontrolcluster.yaml
    Sample results
    demo-dc1-rack3-sts-0
    
    demo-dc1-rack3-sts-1

    The node is terminated from the last rack that was specified. Within that rack, the StatefulSet ordinal is also a factor: a StatefulSet scales down by removing the pod with the highest ordinal first. Given the nodes in the results, the operation’s algorithm therefore chooses to terminate demo-dc1-rack3-sts-1.

    At any given time, the number of started DSE nodes in any rack cannot differ from the number of started nodes in any other rack by more than one.
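    To watch the datacenter's pods while the decommission runs, you can follow them by label. The cassandra.datastax.com/datacenter label is applied to DSE pods by cass-operator; treat the selector as an assumption and verify your pods' labels if it matches nothing.

    kubectl get pods -l cassandra.datastax.com/datacenter=dc1 -w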

  4. Monitor the status of the CassandraDatacenter with this command:

    kubectl get cassandradatacenter dc1 -o yaml
    Sample results
    status:
      cassandraOperatorProgress: Updating
      conditions:
      - lastTransitionTime: "2022-10-20T02:43:33Z"
        message: ""
        reason: ""
        status: "True"
        type: Healthy
      - lastTransitionTime: "2022-10-20T02:49:05Z"
        message: ""
        reason: ""
        status: "False"
        type: Stopped
      - lastTransitionTime: "2022-10-20T02:49:05Z"
        message: ""
        reason: ""
        status: "False"
        type: ReplacingNodes
      - lastTransitionTime: "2022-10-20T03:19:22Z"
        message: ""
        reason: ""
        status: "False"
        type: Updating
      - lastTransitionTime: "2022-10-20T02:49:05Z"
        message: ""
        reason: ""
        status: "False"
        type: RollingRestart
      - lastTransitionTime: "2022-10-20T02:49:05Z"
        message: ""
        reason: ""
        status: "False"
        type: Resuming
      - lastTransitionTime: "2022-10-20T14:57:59Z"
        message: ""
        reason: ""
        status: "True"
        type: ScalingDown
      - lastTransitionTime: "2022-10-20T02:49:05Z"
        message: ""
        reason: ""
        status: "True"
        type: Valid
      - lastTransitionTime: "2022-10-20T02:49:05Z"
        message: ""
        reason: ""
        status: "True"
        type: Initialized
      - lastTransitionTime: "2022-10-20T02:49:05Z"
        message: ""
        reason: ""
        status: "True"
        type: Ready
      - lastTransitionTime: "2022-10-20T03:19:22Z"
        message: ""
        reason: ""
        status: "False"
        type: ScalingUp
      lastServerNodeStarted: "2022-10-20T03:16:15Z"
      nodeStatuses:
        demo-dc1-rack1-sts-0:
          hostID: fedbd183-6075-4251-a37d-76102845919a
        demo-dc1-rack1-sts-1:
          hostID: b77157d1-829b-4965-9f64-4590104e7b9f
        demo-dc1-rack2-sts-0:
          hostID: 5ed4767f-b827-469e-b78e-2ceaab71943e
        demo-dc1-rack2-sts-1:
          hostID: 73813553-e081-42da-8f45-6fb67faf309b
        demo-dc1-rack3-sts-0:
          hostID: 3b456f44-556d-4f6f-9f25-5aae048f8aa8
        demo-dc1-rack3-sts-1:
          hostID: 5f316bf0-3e39-40c0-8a66-4998337336bf
      observedGeneration: 3
      quietPeriod: "2022-10-20T14:36:38Z"
      superUserUpserted: "2022-10-20T14:36:33Z"
      usersUpserted: "2022-10-20T14:36:33Z"

    The ScalingDown condition has a status of True, indicating that the scale-down operation is in progress. Mission Control updates it to False when the operation is complete.
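    To poll just this condition instead of reading the full status block, a jsonpath query works as a minimal sketch:

    kubectl get cassandradatacenter dc1 \
      -o jsonpath='{.status.conditions[?(@.type=="ScalingDown")].status}'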

  5. Monitor the logs of the departing node with this command to verify that it is decommissioned and terminated:

    kubectl logs -f demo-dc1-rack3-sts-1 -c server-system-logger
    Sample results
    ...
    INFO  [pool-20-thread-1] 2022-10-20 14:58:30,229  StreamResultFuture.java:108 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Initiated streaming session for Unbootstrap
    INFO  [pool-20-thread-1] 2022-10-20 14:58:30,230  StorageService.java:2143 - LEAVING: streaming hints to other nodes
    INFO  [Stream-Connection-Establisher:2] 2022-10-20 14:58:30,233  StreamSession.java:385 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Starting streaming with /10.100.5.12
    INFO  [pool-20-thread-1] 2022-10-20 14:58:30,234  HintsService.java:230 - Resumed hints dispatch
    INFO  [HintsDispatcher:3] 2022-10-20 14:58:30,238  HintsDispatchExecutor.java:163 - Transferring all hints to /10.100.5.12: 3b456f44-556d-4f6f-9f25-5aae048f8aa8
    INFO  [Stream-Connection-Establisher:2] 2022-10-20 14:58:30,241  StreamCoordinator.java:291 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Beginning stream session with /10.100.5.12
    INFO  [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2022-10-20 14:58:30,249  StreamResultFuture.java:198 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Prepare completed. Receiving 0 files(0.000KiB), sending 2 files(121.051KiB)
    INFO  [Stream-Serializer-/10.100.5.12:1] 2022-10-20 14:58:30,291  StreamLimiter.java:64 - Configured stream limiter with local DC throughput at 25MB/s and buffer size of 25MB, inter DC throughput at 25MB/s and buffer size of 25MB
    INFO  [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2022-10-20 14:58:31,605  StreamResultFuture.java:212 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b session: 0/10.100.5.12] Session with /10.100.5.12 is complete
    INFO  [Stream-Deserializer-10.100.5.12/10.100.5.12:7000-6c5d4795] 2022-10-20 14:58:31,615  StreamResultFuture.java:249 - [Stream plan: a8c715e0-5087-11ed-910e-b52a292b413b] All sessions completed org.apache.cassandra.streaming.StreamState@51ae56f3
    INFO  [pool-20-thread-1] 2022-10-20 14:58:31,622  StorageService.java:3474 - Removing tokens [7174889864644096175, -5340669455939452840, 6589226831831228631, -3361306996063616279, -2877281716039476745, -6917033737490471623, -4440678482902958568, -5921294927776755314, 6562241473441405040, -8205472916598434127, -1406860119728070875, 4559174963203125054, -3911541421963556594, 7899698028842487330, -3141618503503578628, -2561210462966576697] for /10.100.3.7
    INFO  [pool-20-thread-1] 2022-10-20 14:58:31,631  Gossiper.java:1301 - InetAddress /10.100.3.7 is now DOWN
    INFO  [pool-20-thread-1] 2022-10-20 14:58:31,638  StorageService.java:4968 - Announcing that I have left the ring for 30000ms
    Alternate results
    ...
    status:
      conditions:
      - lastTransitionTime: "2021-03-30T22:01:48Z"
        message: "Not enough free space available to decommission. my-k8ssandra-dc1-default-sts-3 has 12345 free space, but 67891 is needed."
        reason: "NotEnoughSpaceToScaleDown"
        status: "False"
        type: Valid
    ...

After the decommission completes, the node's pod is terminated and Mission Control removes demo-dc1-rack3-sts-1 from the nodeStatuses map.
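To confirm from the cluster side that the node has left the ring, run nodetool status on one of the remaining nodes. The cassandra container name is an assumption; adjust it to match your pod spec.

    kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- nodetool status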
