Add nodes to a cluster

Adding nodes increases a cluster's capacity to serve customer queries against its data.

This task information focuses on adding nodes to a single existing datacenter.

To scale up the number of datacenters, follow the add a datacenter task.

Prerequisites

Workflow of user and operators

  1. A user submits a modified datacenter size parameter in the MissionControlCluster object to the control plane Kubernetes cluster.

  2. The cluster-level operator detects the datacenter-level change in the cluster object and modifies the datacenter-level resources.

  3. The datacenter-level operator detects the size change in the datacenter-level resource and provisions Kubernetes resources that represent the new nodes.

  4. The datacenter-level operator bootstraps database nodes on the new pods.

    When commissioning nodes, Mission Control:

    • targets the rack with the lowest number of active nodes.

    • uses a bootstrap (self-starting process) that adds nodes without external input.

    • commissions multiple nodes in a single rack only after adjusting other racks in the datacenter to reflect the desired node count.

    • identifies the number of nodes being added.

      Limitations: You must increase the datacenter size by a multiple of the number of racks in the target datacenter. For example, with 3 racks you can scale up by 3, 6, or 9 nodes, and so on. Invalid size parameters are ignored. A valid change is shown in the fragment that follows.
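
      For example, this abridged fragment of the manifest used later in this topic scales a three-rack datacenter from 3 to 6 nodes, which is a valid multiple of the rack count:

      # Abridged from the full MissionControlCluster manifest shown later in this topic.
      datacenters:
        - metadata:
            name: dc1
          size: 6        # 3 racks × 2 nodes per rack
          racks:
            - name: rack1
            - name: rack2
            - name: rack3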

Configure pod-to-pod routing

Ensure that all database pods can route to each other. This is a critical requirement for proper operation and data consistency.

The requirement applies to:

  • All database pods within the same region or availability zone.

  • All database pods across different availability zones within the same region.

  • All database pods across different regions for multi-region deployments.

  • All database pods across different racks within the same datacenter.

The way you configure pod-to-pod routing depends on your cluster architecture:

Single-cluster deployments

The cluster’s Container Network Interface (CNI) typically provides pod-to-pod network connectivity for database pods within a single Kubernetes cluster. You usually need no additional configuration beyond standard Kubernetes networking.

Security considerations for shared clusters

If your database cluster shares a Kubernetes cluster with other applications, implement security controls to prevent unauthorized access to database internode ports (7000/7001):

  • NetworkPolicy isolation (required): Use Kubernetes NetworkPolicy to restrict access to internode ports to only authorized database pods. NetworkPolicy prevents other applications in the cluster from accessing these ports even if the underlying firewall rules are broad. A minimal example follows this list.

  • Internode TLS encryption (required): Enable internode TLS to protect data in transit and prevent unauthorized nodes from joining the cluster.

  • Dedicated node pools (recommended): Consider dedicated node pools or subnets for database workloads to enable more granular firewall controls at the infrastructure level.
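
The following NetworkPolicy is a minimal sketch of that isolation. It assumes the database pods run in a namespace named demo and relies on the cassandra.datastax.com/cluster label that is present on every database pod; adjust the names to match your deployment. Because it allows only the internode ports, a complete policy also needs rules for client (CQL) and management traffic:

  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: demo-internode-isolation
    namespace: demo                            # assumed namespace for the database pods
  spec:
    # Select the database pods of the demo cluster.
    podSelector:
      matchLabels:
        cassandra.datastax.com/cluster: demo
    policyTypes:
      - Ingress
    ingress:
      # Allow internode traffic (7000/7001) only from other pods in the same database cluster.
      - from:
          - podSelector:
              matchLabels:
                cassandra.datastax.com/cluster: demo
        ports:
          - protocol: TCP
            port: 7000
          - protocol: TCP
            port: 7001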

Multi-cluster deployments

For database pods that span multiple Kubernetes clusters, NetworkPolicy alone doesn’t provide sufficient connectivity. You must establish Layer 3 network connectivity or overlay connectivity between the database pod networks (pod CIDRs or node subnets depending on your deployment). Kubernetes NetworkPolicy operates only within a single cluster boundary and can’t provide cross-cluster connectivity.

  1. Choose one of the following approaches to establish pod network connectivity across clusters:

    • Routed pod CIDRs (recommended): Use cloud provider native routing solutions when your platform supports them. This approach provides the best performance and simplest operational model.

      • AWS: VPC Peering, Transit Gateway, or AWS Cloud WAN.

      • Azure: VNet Peering or Virtual WAN.

      • GCP: VPC Peering or Cloud VPN.

    • Submariner: Open-source multi-cluster connectivity solution, common in OpenShift multi-cluster deployments. Submariner provides encrypted tunnels between clusters. For more information, see the Submariner documentation.

    • Cilium Cluster Mesh: For clusters that use Cilium CNI. Cluster Mesh provides native multi-cluster networking and enables pod-to-pod connectivity across clusters. For more information, see the Cilium documentation.

  2. After you establish cross-cluster connectivity, implement the following security measures. Traditional firewall rules alone lack application awareness and can’t distinguish between different pods or services within a cluster. Use Kubernetes NetworkPolicy for pod-level access control within clusters, and combine it with network-level firewalls for defense in depth.

    • Enable internode TLS encryption to protect data in transit between clusters.

    • Configure firewall rules at the network level to restrict traffic between cluster pod CIDRs.

    • Use NetworkPolicy within each cluster to further restrict access to database ports. A cross-cluster example follows this list.

    • Consider using dedicated subnets or VPCs for database clusters to enable network-level isolation.
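
    The following policy is a minimal sketch that combines the last two points for one side of a two-cluster deployment. It reuses the demo namespace and cluster label assumed earlier in this topic; the 10.124.0.0/14 value is a hypothetical remote pod CIDR that you must replace with your own:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: demo-allow-remote-internode
      namespace: demo                          # assumed namespace for the database pods
    spec:
      podSelector:
        matchLabels:
          cassandra.datastax.com/cluster: demo
      policyTypes:
        - Ingress
      ingress:
        # Allow internode traffic from the remote cluster's pod network.
        # Pod selectors can't match pods in another cluster, so match by source CIDR instead.
        - from:
            - ipBlock:
                cidr: 10.124.0.0/14            # hypothetical remote pod CIDR
          ports:
            - protocol: TCP
              port: 7000
            - protocol: TCP
              port: 7001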

To verify that pod-to-pod routing is configured properly, do the following; example commands follow the list:

  1. Test connectivity between database pods using nodetool status or cqlsh.

  2. Check that all nodes can see each other in the cluster topology.

  3. Monitor for connection errors or timeouts in database logs.

  4. Verify that gossip protocol communication functions correctly.
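
For example, the following commands exercise these checks from one of the database pods. This is a minimal sketch that assumes the demo cluster used later in this guide; the cassandra container name is an assumption and might differ in your deployment:

  # Confirm that the node sees every other node in the cluster topology (all nodes report UN).
  kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- nodetool status

  # Scan the database log for gossip errors or connection timeouts.
  kubectl logs demo-dc1-rack1-sts-0 -c cassandra | grep -iE "gossip|timeout|connection"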

If pod-to-pod routing isn’t implemented correctly, you might experience the following:

  • Connectivity issues between database pods.

  • Cluster instability.

  • Data consistency issues.

  • Failed replication.

  • Incomplete or failed cluster operations.

Add nodes to a datacenter in a cluster

This procedure starts with an existing Kubernetes cluster that has one datacenter with three nodes distributed equally across three racks. The goal is to modify the MissionControlCluster manifest (object) specification and submit that change with the kubectl command to add one or more nodes to a datacenter in a Kubernetes cluster.

  1. Configure and verify pod-to-pod routing.

  2. Here is a sample MissionControlCluster manifest named demo.missioncontrolcluster.yaml that was used to initially create the datacenter (dc1):

    apiVersion: missioncontrol.datastax.com/v1beta2
    kind: MissionControlCluster
    metadata:
      name: demo
    spec:
      k8ssandra:
        cassandra:
          serverVersion: 6.8.26
          serverType: dse
          storageConfig:
            cassandraDataVolumeClaimSpec:
              storageClassName: premium-rwo
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 5Gi
          datacenters:
            - metadata:
                name: dc1
              k8sContext: east
              size: 3
              racks:
                - name: rack1
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-c
                - name: rack2
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-b
                - name: rack3
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-d
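
    If you no longer have the original manifest file, you can export the live object and edit that copy instead. This is a generic kubectl approach; the lowercase resource name is assumed to follow the kind shown above:

    kubectl get missioncontrolcluster demo -o yaml > demo.missioncontrolcluster.yaml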
  3. Modify the datacenters.size specification from 3 (one node per rack) to 6 (two nodes per rack):

    apiVersion: missioncontrol.datastax.com/v1beta2
    kind: MissionControlCluster
    metadata:
      name: demo
    spec:
      ...
          datacenters:
            - metadata:
                name: dc1
              k8sContext: east
              size: 6
              racks:
    ...
  4. Submit this change in the control plane cluster:

    kubectl apply -f demo.missioncontrolcluster.yaml

    Three additional nodes (pods) deploy in parallel as the MissionControlCluster object increases in size from three to six nodes. Each node, however, starts serially as specified by the order of the rack definitions.

    At any given time, the number of started nodes in a rack cannot differ from the number of started nodes in any other rack by more than one. By default, Mission Control configures the database pods so that Kubernetes cannot schedule multiple database pods on the same worker node. If you increase the cluster size beyond the number of available worker nodes, the additional pods might not deploy.
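
    Before or after you apply the change, you can check whether enough schedulable worker nodes exist for the new pods. These are generic kubectl queries; the label selector is the same one used in the next step:

    # List the worker nodes that are available for scheduling.
    kubectl get nodes

    # Show which worker node each database pod landed on, or whether it is Pending.
    kubectl get pods -l "cassandra.datastax.com/cluster"=demo -o wide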

  5. Monitor the status of the nodes being created:

    kubectl get pods -l "cassandra.datastax.com/cluster"=demo
    Result
    NAME                   READY   STATUS    RESTARTS   AGE
    demo-dc1-rack1-sts-0   2/2     Running   0          67m
    demo-dc1-rack1-sts-1   1/2     Running   0          110s
    demo-dc1-rack2-sts-0   2/2     Running   0          67m
    demo-dc1-rack2-sts-1   1/2     Running   0          110s
    demo-dc1-rack3-sts-0   2/2     Running   0          67m
    demo-dc1-rack3-sts-1   1/2     Running   0          110s

    The -l flag adds a label selector to filter the results. Every database pod has the cassandra.datastax.com/cluster label. There are six pods, but only the initial three are fully ready. This is expected because the results were captured mid-operation.
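
    To follow the rollout until every pod reports 2/2 in the READY column, you can optionally add the watch flag to the same command:

    kubectl get pods -l "cassandra.datastax.com/cluster"=demo -w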

  6. Monitor the status of the CassandraDatacenter with this command:

    kubectl get cassandradatacenter dc1 -o yaml
    Result
    status:
      cassandraOperatorProgress: Updating
      conditions:
      - lastTransitionTime: "2022-10-19T20:24:40Z"
        message: ""
        reason: ""
        status: "True"
        type: Healthy
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: Stopped
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: ReplacingNodes
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: Updating
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: RollingRestart
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: Resuming
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: ScalingDown
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "True"
        type: Valid
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "True"
        type: Initialized
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "True"
        type: Ready
      - lastTransitionTime: "2022-10-19T21:24:34Z"
        message: ""
        reason: ""
        status: "True"
        type: ScalingUp
      lastServerNodeStarted: "2022-10-19T21:28:51Z"
      nodeStatuses:
        demo-dc1-rack1-sts-0:
          hostID: 2025d318-3fcc-4753-990b-3f9c388ba18a
        demo-dc1-rack1-sts-1:
          hostID: 33a0fc01-5947-471f-97a2-61237767d583
        demo-dc1-rack2-sts-0:
          hostID: 50748fb8-da1f-4add-b635-e80e282dc09b
        demo-dc1-rack2-sts-1:
          hostID: eb899ffd-0726-4fb4-bea7-c9d84d555339
        demo-dc1-rack3-sts-0:
          hostID: db86cba7-b014-40a2-b3f2-6eea21919a25
      observedGeneration: 1
      quietPeriod: "2022-10-19T20:24:47Z"
      superUserUpserted: "2022-10-19T20:24:42Z"
      usersUpserted: "2022-10-19T20:24:42Z"

    The ScalingUp condition has status: True, which indicates that the scale-up operation is in progress. Mission Control updates it to False when the operation is complete.
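
    Instead of polling the YAML output, you can block until the condition clears. The following is a minimal sketch that assumes the dc1 datacenter from this example; adjust the timeout to your environment:

    kubectl wait cassandradatacenter/dc1 --for=condition=ScalingUp=false --timeout=60m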

  7. If the results show a pod with Pending status, issue this command to get more details about the pod:

    kubectl describe pod POD_NAME

    Replace POD_NAME with the name of the pod that is in the Pending status.

  8. The results might indicate a FailedScheduling event, which can occur when not enough infrastructure resources, such as worker nodes, are available.
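
    To surface scheduling failures across the namespace, you can also list recent events filtered by reason. This is a generic kubectl query, not specific to Mission Control:

    kubectl get events --field-selector reason=FailedScheduling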

  9. Run the following command to check the status of the CassandraDatacenter object, replacing DATACENTER_NAME with the datacenter name, such as dc1. In the output, look for a ScalingUp condition with its status set to True.

    kubectl get cassandradatacenter DATACENTER_NAME -o yaml
    Result
    ...
    status:
      cassandraOperatorProgress: Updating
      conditions:
      - lastTransitionTime: "2021-03-30T22:01:48Z"
        message: ""
        reason: ""
        status: "True"
        type: ScalingUp
    ...

    After the new nodes are deployed and running, Mission Control automatically runs nodetool cleanup only on the original nodes and not the new nodes. This removes keys and data that are no longer associated with those original nodes.

When the cleanup operation completes, Mission Control sets the ScalingUp condition status to False.

Next steps

Run Cleanup operation to recover disk space from previously provisioned nodes.
