Add nodes to a cluster

Adding nodes increases the capacity to service customer queries against the data.

This task focuses on adding nodes to a single existing datacenter.

To scale up the number of datacenters, follow the add a datacenter task.

Prerequisites

A prepared environment on either bare-metal/VM or an existing Kubernetes cluster.

Workflow of user and operators

User submits modified datacenter size parameter in MissionControlCluster to the control plane Kubernetes cluster.
Cluster-level operator detects dc-level change in the cluster object and modifies dc-level resources.
DC-level operator detects size change in dc-level resource and provisions Kubernetes resources representing the new nodes.

DC-level operator bootstraps database nodes on new pods.

When commissioning nodes, Mission Control:

targets the rack with the lowest number of active nodes.
uses a bootstrap (self-starting process) that adds nodes without external input.
commissions multiple nodes in a single rack only after adjusting other racks in the datacenter to reflect the desired node count.

identifies the number of nodes being added.

Limitations - You must increase the datacenter size by a multiple of the number of racks in the target datacenter. For example, with 3 racks you can scale up by 3, 6, or 9 nodes, and so on. Invalid size parameters are ignored.
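The size rule above can be checked before you edit the manifest. The following is a minimal sketch, not part of Mission Control: the function name `is_valid_scale_up` and its parameters are hypothetical, and it only encodes the rule that the increase must be a positive multiple of the rack count.

```python
def is_valid_scale_up(current_size: int, new_size: int, rack_count: int) -> bool:
    """Return True when the increase is a positive multiple of the rack count.

    Hypothetical helper for illustration; it is not a Mission Control API.
    """
    if rack_count <= 0:
        raise ValueError("rack_count must be positive")
    added = new_size - current_size
    return added > 0 and added % rack_count == 0


# With 3 racks, going from 3 to 6 nodes adds 3 nodes (valid),
# while going from 3 to 5 nodes adds 2 nodes (not a multiple of 3).
print(is_valid_scale_up(3, 6, 3))
print(is_valid_scale_up(3, 5, 3))
```

Running a check like this first is useful because, as noted above, an invalid size parameter is ignored rather than rejected.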

Add nodes to a datacenter in a cluster

Ensure that all database pods can route to each other. This is a critical requirement for proper operation and data consistency.

The requirement applies to:

All database pods within the same region or availability zone
All database pods across different availability zones within the same region
All database pods across different regions for multi-region deployments
All database pods across different racks in the same datacenter for multi-region deployments

To achieve this:

Configure proper network policies and firewall rules to allow worker-to-worker and pod-to-pod communication.
Set up routing tables, container network interfaces, or peering networks.
Implement network overlay solutions like Submariner or Cilium.
Verify that the network infrastructure supports the required pod-to-pod connectivity.

Failure to establish proper pod-to-pod routing results in:

Connectivity issues between database pods
Cluster instability
Data consistency issues
Failed replication

This example starts with an existing Kubernetes cluster that has one datacenter with three nodes distributed equally across three racks. The goal is to modify the MissionControlCluster manifest (object) specification and submit that change with the kubectl command to add one or more nodes to the datacenter.

Here is a sample MissionControlCluster manifest named demo.missioncontrolcluster.yaml that was used to initially create the datacenter (dc1):

apiVersion: missioncontrol.datastax.com/v1beta2
kind: MissionControlCluster
metadata:
  name: demo
spec:
  k8ssandra:
    cassandra:
      serverVersion: 6.8.26
      serverType: dse
      storageConfig:
        cassandraDataVolumeClaimSpec:
          storageClassName: premium-rwo
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
      datacenters:
        - metadata:
            name: dc1
          k8sContext: east
          size: 3
          racks:
            - name: rack1
              nodeAffinityLabels:
                topology.kubernetes.io/zone: us-east1-c
            - name: rack2
              nodeAffinityLabels:
                topology.kubernetes.io/zone: us-east1-b
            - name: rack3
              nodeAffinityLabels:
                topology.kubernetes.io/zone: us-east1-d

Modify the datacenters.size specification from 3 (one node per rack) to 6 (two nodes per rack):

apiVersion: missioncontrol.datastax.com/v1beta2
kind: MissionControlCluster
metadata:
  name: demo
spec:
  ...
      datacenters:
        - metadata:
            name: dc1
          k8sContext: east
          size: 6
          racks:
...

Submit this change in the control plane cluster:

kubectl apply -f demo.missioncontrolcluster.yaml

Three additional nodes (pods) deploy in parallel as the MissionControlCluster object increases in size from three to six nodes. The database nodes, however, start serially, in the order of the rack definitions.

At any given time, the number of started nodes in one rack cannot differ from the number of started nodes in any other rack by more than one. By default, Mission Control configures the database pods so that Kubernetes cannot schedule multiple pods on the same worker node. An attempt to increase the cluster size beyond the number of available worker nodes might leave the additional pods undeployed.
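The balancing behavior can be illustrated with a small simulation. This is a sketch under stated assumptions, not the operator's actual implementation: the function `commission_order` is hypothetical, it always targets the rack with the fewest nodes, and it assumes ties are broken in rack definition order.

```python
def commission_order(rack_sizes: dict[str, int], nodes_to_add: int) -> list[str]:
    """Return the rack chosen for each new node, least-populated rack first.

    Hypothetical model for illustration; rack_sizes maps rack name to its
    current node count, listed in rack definition order.
    """
    sizes = dict(rack_sizes)  # copy so the caller's mapping is not modified
    order = []
    for _ in range(nodes_to_add):
        # min() returns the first smallest key, so ties follow definition order.
        target = min(sizes, key=sizes.get)
        sizes[target] += 1
        order.append(target)
    return order


# Scaling dc1 from 3 to 6 nodes: each rack receives one new node in turn.
print(commission_order({"rack1": 1, "rack2": 1, "rack3": 1}, 3))
```

Because each new node goes to the least-populated rack, racks that start balanced never differ by more than one node during the operation.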

Monitor the status of the nodes being created:

kubectl get pods -l "cassandra.datastax.com/cluster"=demo

Result

NAME                   READY   STATUS    RESTARTS   AGE
demo-dc1-rack1-sts-0   2/2     Running   0          67m
demo-dc1-rack1-sts-1   1/2     Running   0          110s
demo-dc1-rack2-sts-0   2/2     Running   0          67m
demo-dc1-rack2-sts-1   1/2     Running   0          110s
demo-dc1-rack3-sts-0   2/2     Running   0          67m
demo-dc1-rack3-sts-1   1/2     Running   0          110s

The -l flag adds a label selector to filter the results. Every database pod has the cassandra.datastax.com/cluster label. There are six pods, but only the initial three are fully ready. This is expected because the results were captured mid-operation.
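If you want to spot the pods that are still starting without reading the table by eye, the READY column can be parsed. The following is a minimal sketch that assumes the default tabular output of kubectl get pods shown above; the helper name `pods_not_ready` is hypothetical.

```python
def pods_not_ready(kubectl_output: str) -> list[str]:
    """Return names of pods whose READY column shows fewer ready containers than total."""
    not_ready = []
    for line in kubectl_output.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        fields = line.split()
        name, ready = fields[0], fields[1]
        ready_count, total = ready.split("/")
        if ready_count != total:
            not_ready.append(name)
    return not_ready


# Sample copied from the result shown above.
SAMPLE = """\
NAME                   READY   STATUS    RESTARTS   AGE
demo-dc1-rack1-sts-0   2/2     Running   0          67m
demo-dc1-rack1-sts-1   1/2     Running   0          110s
demo-dc1-rack2-sts-0   2/2     Running   0          67m
demo-dc1-rack2-sts-1   1/2     Running   0          110s
demo-dc1-rack3-sts-0   2/2     Running   0          67m
demo-dc1-rack3-sts-1   1/2     Running   0          110s
"""

print(pods_not_ready(SAMPLE))
```

For the sample, the function lists the three new pods, one per rack.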

Monitor the status of the CassandraDatacenter with this command:

kubectl get cassandradatacenter dc1 -o yaml

Result

status:
  cassandraOperatorProgress: Updating
  conditions:
  - lastTransitionTime: "2022-10-19T20:24:40Z"
    message: ""
    reason: ""
    status: "True"
    type: Healthy
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: Stopped
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: ReplacingNodes
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: Updating
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: RollingRestart
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: Resuming
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: ScalingDown
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "True"
    type: Valid
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "True"
    type: Initialized
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-10-19T21:24:34Z"
    message: ""
    reason: ""
    status: "True"
    type: ScalingUp
  lastServerNodeStarted: "2022-10-19T21:28:51Z"
  nodeStatuses:
    demo-dc1-rack1-sts-0:
      hostID: 2025d318-3fcc-4753-990b-3f9c388ba18a
    demo-dc1-rack1-sts-1:
      hostID: 33a0fc01-5947-471f-97a2-61237767d583
    demo-dc1-rack2-sts-0:
      hostID: 50748fb8-da1f-4add-b635-e80e282dc09b
    demo-dc1-rack2-sts-1:
      hostID: eb899ffd-0726-4fb4-bea7-c9d84d555339
    demo-dc1-rack3-sts-0:
      hostID: db86cba7-b014-40a2-b3f2-6eea21919a25
  observedGeneration: 1
  quietPeriod: "2022-10-19T20:24:47Z"
  superUserUpserted: "2022-10-19T20:24:42Z"
  usersUpserted: "2022-10-19T20:24:42Z"

The ScalingUp condition has status: "True", which indicates that the scale-up operation is in progress. Mission Control updates it to "False" when the operation is complete.
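The same check can be scripted against the JSON form of the object, for example the output of kubectl get cassandradatacenter dc1 -o json. This is a minimal sketch: the function `is_scaling_up` is hypothetical, and the inline sample is a trimmed stand-in for real output that keeps only the fields the function reads.

```python
import json


def is_scaling_up(cassdc: dict) -> bool:
    """Return True when the ScalingUp condition is present with status "True"."""
    conditions = cassdc.get("status", {}).get("conditions", [])
    for condition in conditions:
        if condition.get("type") == "ScalingUp":
            # Condition statuses are strings ("True" or "False"), not booleans.
            return condition.get("status") == "True"
    return False  # no ScalingUp condition reported


# Trimmed, illustrative sample; real output contains many more fields.
SAMPLE_JSON = """
{"status": {"conditions": [
  {"type": "Ready", "status": "True"},
  {"type": "ScalingUp", "status": "True"}
]}}
"""

print(is_scaling_up(json.loads(SAMPLE_JSON)))
```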

If the results show a pod with Pending status, issue this command to get more details about the pod:
```
kubectl describe pod POD_NAME
```
Replace POD_NAME with the name of the pod that is in the Pending status.
The results might indicate a FailedScheduling event. This might occur when there are not enough infrastructure resources available.
Run the following command to check the status of the CassandraDatacenter object. In the output look for a ScalingUp condition with its status set to True.
```
kubectl get cassandradatacenter cluster-name-dc-name -o yaml
```
Result
...
status:
  cassandraOperatorProgress: Updating
  conditions:
  - lastTransitionTime: "2021-03-30T22:01:48Z"
    message: ""
    reason: ""
    status: "True"
    type: ScalingUp
...

After the new nodes are deployed and running, Mission Control automatically runs nodetool cleanup only on the original nodes and not the new nodes. This removes keys and data that are no longer associated with those original nodes.

Upon completion of the cleanup operation, the ScalingUp condition status is set to False for each node.

Next steps

Run Cleanup operation to recover disk space from previously provisioned nodes.
