Add nodes to a cluster

Adding nodes increases the capacity to service customer queries against the data.

This task focuses on adding nodes to a single existing datacenter.

To increase the number of datacenters instead, follow the add a DSE datacenter task.

Prerequisites

Workflow of user and operators

  1. User submits modified datacenter size parameter in MissionControlCluster to the Control Plane Kubernetes cluster.

  2. Cluster-level operator detects dc-level change in the cluster object and modifies dc-level resources.

  3. DC-level operator detects size change in dc-level resource and provisions Kubernetes resources representing the new nodes.

  4. DC-level operator bootstraps DSE nodes on new pods.

    When commissioning nodes, Mission Control:

    • targets the rack with the lowest number of active nodes.

    • uses a bootstrap (self-starting process) that adds nodes without external input.

    • commissions multiple nodes in a single rack only after adjusting other racks in the datacenter to reflect the desired node count.

    • identifies the number of nodes being added.

      Limitations - You must increase the datacenter size by a multiple of the number of racks in the target datacenter. For example, with 3 racks you may scale up by 3, 6, or 9 nodes, and so on. Invalid size parameters are ignored.
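
Because invalid size parameters are ignored, it can help to check a proposed size before editing the manifest. The following is a minimal sketch of the multiple-of-racks rule described above. The function name is_valid_scale_up is hypothetical and is not part of Mission Control or kubectl.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mission Control): succeeds when NEW_SIZE
# grows CURRENT_SIZE by a positive multiple of RACK_COUNT.
# Usage: is_valid_scale_up CURRENT_SIZE NEW_SIZE RACK_COUNT
is_valid_scale_up() {
  current=$1
  new=$2
  racks=$3
  delta=$((new - current))
  # The increase must be positive and evenly divisible by the rack count.
  [ "$racks" -gt 0 ] && [ "$delta" -gt 0 ] && [ $((delta % racks)) -eq 0 ]
}

# Example: a datacenter with 3 racks that currently has 3 nodes.
for target in 4 6 9; do
  if is_valid_scale_up 3 "$target" 3; then
    echo "size $target: valid"
  else
    echo "size $target: invalid"
  fi
done
```

With 3 racks and a current size of 3, the loop reports size 4 as invalid and sizes 6 and 9 as valid.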

Add nodes to a datacenter in a cluster

This example starts with an existing Kubernetes cluster with one datacenter that has three DSE nodes distributed equally across three racks. The goal is to modify the MissionControlCluster manifest (object) specification and submit that change with the kubectl command to add one or more nodes to the datacenter.

  1. Here is a sample MissionControlCluster manifest named demo.missioncontrolcluster.yaml that was used to initially create the datacenter (dc1):

    apiVersion: missioncontrol.datastax.com/v1beta1
    kind: MissionControlCluster
    metadata:
      name: demo
    spec:
      k8ssandra:
        cassandra:
          serverVersion: 6.8.26
          serverType: dse
          storageConfig:
            cassandraDataVolumeClaimSpec:
              storageClassName: premium-rwo
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 5Gi
          datacenters:
            - metadata:
                name: dc1
              k8sContext: east
              size: 3
              racks:
                - name: rack1
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-c
                - name: rack2
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-b
                - name: rack3
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-d
  2. Modify the datacenters.size specification from 3 (1 node per rack) to 6 (2 nodes per rack):

    apiVersion: missioncontrol.datastax.com/v1beta1
    kind: MissionControlCluster
    metadata:
      name: demo
    spec:
      ...
          datacenters:
            - metadata:
                name: dc1
              k8sContext: east
              size: 6
              racks:
    ...
  3. Submit this change in the Control Plane cluster:

    kubectl apply -f demo.missioncontrolcluster.yaml

    Three additional nodes (pods) deploy in parallel as the MissionControlCluster object increases in size from three to six nodes. Each node, however, starts serially as specified by the order of the rack definitions.

    At any given time, the number of started nodes in a rack cannot differ from the number of started nodes in any other rack by more than one.

    By default, Mission Control configures the Cassandra pods so that Kubernetes does not schedule multiple DSE pods on the same worker node. If you increase the cluster size beyond the number of available worker nodes, the additional pods may not deploy.
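
The one-pod-per-worker default described above suggests a coarse capacity check before scaling. The sketch below compares a desired datacenter size with the number of Kubernetes nodes. The function name has_capacity is hypothetical, and the count is only an approximation: it includes every node that kubectl lists and ignores per-zone rack affinity, taints, and resource limits.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the desired number of DSE pods does not
# exceed the number of worker nodes, assuming one DSE pod per worker node.
# Usage: has_capacity DESIRED_SIZE WORKER_COUNT
has_capacity() {
  [ "$1" -le "$2" ]
}

# Count nodes only when kubectl is installed. Run this against the cluster
# that hosts the datacenter. The desired size of 6 matches the example.
if command -v kubectl >/dev/null 2>&1; then
  workers=$(kubectl get nodes --no-headers 2>/dev/null | wc -l | tr -d ' ')
  if has_capacity 6 "$workers"; then
    echo "6 DSE pods fit on $workers nodes"
  else
    echo "only $workers nodes available for 6 DSE pods"
  fi
fi
```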

  4. Monitor the status of the nodes being created:

    kubectl get pods -l cassandra.datastax.com/cluster=demo
    Sample results
    NAME                   READY   STATUS    RESTARTS   AGE
    demo-dc1-rack1-sts-0   2/2     Running   0          67m
    demo-dc1-rack1-sts-1   1/2     Running   0          110s
    demo-dc1-rack2-sts-0   2/2     Running   0          67m
    demo-dc1-rack2-sts-1   1/2     Running   0          110s
    demo-dc1-rack3-sts-0   2/2     Running   0          67m
    demo-dc1-rack3-sts-1   1/2     Running   0          110s

    The -l flag adds a label selector to filter the results. Every DSE pod has the cassandra.datastax.com/cluster label. There are six pods, but only the initial three are fully ready. This is expected because the results were captured while the operation was still in progress.
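
The readiness check described above can be scripted by counting the pods whose READY column shows all containers ready. The following is a minimal sketch. The function name count_ready_pods is hypothetical, and it assumes the default table output of kubectl get pods, including the header row.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get pods` table output on stdin and
# prints how many pods have every container ready (READY column is n/n).
# The first line is skipped because it is assumed to be the header row.
count_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] == r[2]) n++ } END { print n + 0 }'
}

# Example using the sample results from this step.
count_ready_pods <<'EOF'
NAME                   READY   STATUS    RESTARTS   AGE
demo-dc1-rack1-sts-0   2/2     Running   0          67m
demo-dc1-rack1-sts-1   1/2     Running   0          110s
demo-dc1-rack2-sts-0   2/2     Running   0          67m
demo-dc1-rack2-sts-1   1/2     Running   0          110s
demo-dc1-rack3-sts-0   2/2     Running   0          67m
demo-dc1-rack3-sts-1   1/2     Running   0          110s
EOF
```

For the sample results, the function prints 3. Against a live cluster, pipe the output of the kubectl get pods command from this step into count_ready_pods.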

  5. Monitor the status of the CassandraDatacenter with this command:

    kubectl get cassandradatacenter dc1 -o yaml
    Sample results
    status:
      cassandraOperatorProgress: Updating
      conditions:
      - lastTransitionTime: "2022-10-19T20:24:40Z"
        message: ""
        reason: ""
        status: "True"
        type: Healthy
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: Stopped
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: ReplacingNodes
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: Updating
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: RollingRestart
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: Resuming
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: ScalingDown
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "True"
        type: Valid
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "True"
        type: Initialized
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "True"
        type: Ready
      - lastTransitionTime: "2022-10-19T21:24:34Z"
        message: ""
        reason: ""
        status: "True"
        type: ScalingUp
      lastServerNodeStarted: "2022-10-19T21:28:51Z"
      nodeStatuses:
        demo-dc1-rack1-sts-0:
          hostID: 2025d318-3fcc-4753-990b-3f9c388ba18a
        demo-dc1-rack1-sts-1:
          hostID: 33a0fc01-5947-471f-97a2-61237767d583
        demo-dc1-rack2-sts-0:
          hostID: 50748fb8-da1f-4add-b635-e80e282dc09b
        demo-dc1-rack2-sts-1:
          hostID: eb899ffd-0726-4fb4-bea7-c9d84d555339
        demo-dc1-rack3-sts-0:
          hostID: db86cba7-b014-40a2-b3f2-6eea21919a25
      observedGeneration: 1
      quietPeriod: "2022-10-19T20:24:47Z"
      superUserUpserted: "2022-10-19T20:24:42Z"
      usersUpserted: "2022-10-19T20:24:42Z"

    The ScalingUp condition has status: True indicating that the scaling up operation is in progress. Mission Control updates it to False when the operation is complete.
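
Rather than scanning the full YAML by eye, the ScalingUp status can be extracted with a small filter. The following is a minimal sketch. The function name scaling_up_status is hypothetical, and it assumes that each condition lists its status line before its type line, as in the sample results.

```shell
#!/bin/sh
# Hypothetical helper: reads CassandraDatacenter YAML on stdin and prints the
# status of the ScalingUp condition. Assumes "status" precedes "type" within
# each condition, as in the sample results.
scaling_up_status() {
  awk '
    $1 == "status:" && NF == 2 { gsub(/"/, "", $2); last = $2 }
    $1 == "type:" && $2 == "ScalingUp" { print last; exit }
  '
}

# Example using two conditions taken from the sample results.
scaling_up_status <<'EOF'
status:
  conditions:
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: ScalingDown
  - lastTransitionTime: "2022-10-19T21:24:34Z"
    message: ""
    reason: ""
    status: "True"
    type: ScalingUp
EOF
```

The example prints True. Against a live cluster, pipe the output of kubectl get cassandradatacenter dc1 -o yaml into scaling_up_status. Alternatively, kubectl can select the field directly with -o jsonpath='{.status.conditions[?(@.type=="ScalingUp")].status}'.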

  6. If the results show a pod with Pending status, issue this command to get more details about the pod:

    kubectl describe pod <pod-name>
  7. The results may indicate a FailedScheduling event. To override the default of only one Cassandra pod per Kubernetes worker node, set the option allowMultipleNodesPerWorker: true in the Helm chart file. Apply this configuration update to the cluster.

  8. Run the following command to check the status of the CassandraDatacenter object. In the output look for a ScalingUp condition with its status set to True.

    kubectl get cassandradatacenter dc1 -o yaml
    Sample results
    ...
    status:
      cassandraOperatorProgress: Updating
      conditions:
      - lastTransitionTime: "2021-03-30T22:01:48Z"
        message: ""
        reason: ""
        status: "True"
        type: ScalingUp
    ...

    After the new nodes are deployed and running, Mission Control automatically runs nodetool cleanup only on the original nodes and not the new nodes. This removes keys and data that are no longer associated with those original nodes.

Upon completion of the cleanup operation, the ScalingUp condition status is set to False for each node.

What’s next

Run a cleanup operation to recover disk space from previously provisioned nodes.


© 2024 DataStax | Privacy policy | Terms of use
