Replace a database node

When an Apache Cassandra®, Hyper-Converged Database (HCD), or DataStax Enterprise (DSE) node becomes defective or problematic in your Mission Control cluster, you can replace it with a new, empty node. You can replace individual nodes or entire racks using the replacenode command.

Mission Control does the following during the replacement process:

  1. Detects and processes the replacenode CassandraTask or K8ssandraTask custom resource (CR).

  2. Manages node replacements across the cluster or datacenter.

  3. Controls the replacement process to maintain cluster stability.

  4. Provides real-time task progress and status updates.

  5. Stops the running node.

  6. Deletes the Persistent Volume.

  7. Removes the node.

  8. Deploys a new replacement node.

  9. Starts the new node with the same token range as the original node.

The amount of data that the new nodes must rebuild determines how long the replacement process takes. During this time, the cluster remains operational but might experience increased load due to the data streaming process.

Performance impact

The replacement creates a new, empty node with the same token range as the original. The new node rebuilds its data from remaining replicas, which creates temporary disk pressure during bootstrap. The disk pressure occurs on the replica nodes that are streaming data to the new node.

With a single-rack configuration, the disk pressure occurs on up to num_tokens other nodes in the same rack. With multiple racks, which is the recommended configuration, the disk pressure occurs on up to num_tokens nodes in the other racks.

Cassandra’s role in node replacement

When a new node comes online, Cassandra performs these critical operations:

  • Streams data from other nodes in the cluster to rebuild the new node’s data.

  • Verifies data consistency across all replicas.

  • Rebalances the token ranges if necessary.

  • Joins the node to the cluster only after data consistency is confirmed.

Cassandra must successfully rebuild and verify all data on the new node before considering the replacement process complete.

Replace a defective node

You can use the Mission Control UI or CLI to replace a defective node.

Use the UI to replace a defective node

  1. In the Mission Control UI, select your project, and then select your target cluster.

  2. In the Nodes section of the Overview tab, select the checkbox for your target node in its datacenter.

  3. Click More Options for your target node, and then click Replace.

    The replacement process starts immediately.

    To monitor the replacement progress, see Monitor replace activity status.

Use the CLI to replace a defective node

  1. Open the Mission Control CLI.

  2. Create or modify replace-node-task.cassandratask.yaml:

    apiVersion: control.k8ssandra.io/v1alpha1
    kind: CassandraTask
    metadata:
      name: replace-node-POD_NAME
      namespace: DATABASE_ID
    spec:
      concurrencyPolicy: Forbid
      maxConcurrentPods: 1
      datacenter:
        name: DATACENTER_NAME
        namespace: DATABASE_ID
      jobs:
        - name: replace-node
          command: replacenode
          args:
            pod_name: CLUSTER_NAME-DC_NAME-RACK_NAME-sts-ORDINAL

    Replace the following:

    • POD_NAME: The name of the pod to replace

    • DATABASE_ID: The database identifier

    • DATACENTER_NAME: The datacenter name

    • CLUSTER_NAME: The cluster name

    • DC_NAME: The datacenter name

    • RACK_NAME: The name of the rack that contains the pod

    • ORDINAL: The pod’s ordinal number in the stateful set

    Configuration options:

    • metadata.name: A unique identifier within the Kubernetes namespace. Include the cluster name to prevent naming conflicts.

    • spec.concurrencyPolicy: Set to Forbid to prevent concurrent task execution

    • spec.maxConcurrentPods: The number of nodes to replace concurrently. This parameter is available only in Mission Control 1.18 and later. On earlier versions, the task replaces one node at a time, regardless of whether the field is set.

    • spec.datacenter: The target datacenter’s namespace and name

    • spec.jobs[0].command: Must be replacenode

    • spec.jobs[0].args.pod_name: The name of the pod to replace

  3. Apply the replacenode CassandraTask CR to the data plane Kubernetes cluster:

    kubectl apply -f replace-node-task.cassandratask.yaml

    Mission Control detects and manages the CassandraTask CR and performs the node replacement.
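    The manifest from step 2 can also be generated from shell variables, which helps avoid typos when you substitute the placeholders by hand. The following is a minimal sketch, not part of the Mission Control tooling: the function name render_replace_task and its argument order are hypothetical, and the pod name is passed through exactly as you supply it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a replacenode CassandraTask manifest to stdout.
# Arguments: DATABASE_ID DATACENTER_NAME POD_NAME
render_replace_task() {
  local database_id="$1" datacenter_name="$2" pod_name="$3"
  cat <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-${pod_name}
  namespace: ${database_id}
spec:
  concurrencyPolicy: Forbid
  maxConcurrentPods: 1
  datacenter:
    name: ${datacenter_name}
    namespace: ${database_id}
  jobs:
    - name: replace-node
      command: replacenode
      args:
        pod_name: ${pod_name}
EOF
}
```

    To apply the result without writing a file, pipe it to kubectl, for example: render_replace_task DATABASE_ID DATACENTER_NAME POD_NAME | kubectl apply -f -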

Replace multiple nodes in parallel

In Mission Control version 1.18.0 and later, you can replace multiple nodes from the same rack concurrently to reduce overall replacement time. By default, Mission Control replaces nodes sequentially, one node at a time.

Use parallel node replacements to:

  • Minimize downtime during large-scale node replacements.

  • Expedite recovery from multiple node failures in the same rack.

  • Reduce the total time required for infrastructure maintenance.

Replace nodes in parallel only when your cluster has a replication factor (RF) of three or higher. Parallel replacements temporarily reduce cluster availability within the affected rack.

Use the maxConcurrentPods parameter in your K8ssandraTask manifest to replace multiple nodes concurrently:

apiVersion: control.k8ssandra.io/v1alpha1
kind: K8ssandraTask
metadata:
  name: replace-multiple-nodes
  namespace: DATABASE_ID
spec:
  cluster:
    name: CLUSTER_NAME
  template:
    maxConcurrentPods: 2
    jobs:
      - name: replace-nodes
        command: replacenode

Replace the following:

  • DATABASE_ID: The database ID

  • CLUSTER_NAME: The cluster name

The maxConcurrentPods parameter controls how many nodes are replaced concurrently within the same rack. Mission Control replaces the specified number of nodes from the same rack in parallel. Operations won’t run across racks concurrently.

Monitor cluster health and resource utilization during parallel replacements. Adjust the maxConcurrentPods value based on your cluster’s capacity and network bandwidth.
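Because you are likely to adjust maxConcurrentPods between runs, it can help to generate the K8ssandraTask manifest from parameters and reject values that are not positive integers before anything reaches the cluster. The following is a minimal sketch under those assumptions. The function name render_parallel_replace_task and the validation rule are illustrative and are not part of Mission Control.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a K8ssandraTask manifest for a parallel replacement.
# Arguments: DATABASE_ID CLUSTER_NAME MAX_CONCURRENT_PODS
render_parallel_replace_task() {
  local database_id="$1" cluster_name="$2" max_pods="$3"
  # Reject empty, non-numeric, or zero values before rendering anything.
  case "$max_pods" in
    ''|*[!0-9]*|0)
      echo "maxConcurrentPods must be a positive integer" >&2
      return 1
      ;;
  esac
  cat <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: K8ssandraTask
metadata:
  name: replace-multiple-nodes
  namespace: ${database_id}
spec:
  cluster:
    name: ${cluster_name}
  template:
    maxConcurrentPods: ${max_pods}
    jobs:
      - name: replace-nodes
        command: replacenode
EOF
}
```

For example, render_parallel_replace_task DATABASE_ID CLUSTER_NAME 2 prints the manifest shown earlier in this section. You can pipe the output to kubectl apply -f - in the appropriate Kubernetes context.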

Monitor replace activity status

You can use the Mission Control UI or CLI to monitor the replacement activity status.

Use the UI to monitor the status

  1. In the Mission Control UI, select Activities in the main navigation.

  2. View the Status notifications for the replacement progress.

    A SUCCESS status indicates the replacement completed successfully.

    The system displays timestamps for the operation’s start and end.

    The Activities pane automatically refreshes to show current status.

Use the CLI to monitor the status

  1. Open the Mission Control CLI.

  2. Monitor the task object’s progress in the control plane cluster.

    Use CassandraTask:
    kubectl get cassandratask replace-node-POD_NAME -n DATABASE_ID -o yaml
    Use K8ssandraTask:
    kubectl get k8ssandratask replace-multiple-nodes -n DATABASE_ID -o yaml
    Result:
    ...
    status:
      completionTime: "2024-11-01T03:28:33Z"
      conditions:
      - lastTransitionTime: "2024-11-01T03:28:12Z"
        status: "True"
        type: Running
      - lastTransitionTime: "2024-11-01T03:28:34Z"
        status: "False"
        type: Complete
      startTime: "2024-11-01T03:28:12Z"
      succeeded: 1

    Mission Control sets the startTime field before starting the replacement operation and updates the completionTime field when the operation finishes.

    The fields in each conditions entry behave as follows:

    • status: Initially set to "True", and changes to "False" when the replacement completes.

    • type: Changes from ReplacingNodes to Complete when the replacement finishes.
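    If you want a script to block until the task finishes, one option is to poll the completionTime field, which Mission Control sets when the operation ends. The following is a minimal sketch, assuming kubectl already points at the cluster that holds the task object. The function names and the POLL_SECONDS variable are hypothetical, and the loop has no timeout, so it keeps polling if the task never completes or if kubectl returns an error.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints status.completionTime, or nothing if it is unset.
# Arguments: KIND TASK_NAME NAMESPACE
task_completion_time() {
  kubectl get "$1" "$2" -n "$3" -o jsonpath='{.status.completionTime}'
}

# Polls until completionTime is populated, then reports it.
# POLL_SECONDS is an illustrative setting for this sketch only (default: 30).
wait_for_task() {
  local finished
  until finished="$(task_completion_time "$1" "$2" "$3")" && [ -n "$finished" ]; do
    sleep "${POLL_SECONDS:-30}"
  done
  echo "$1/$2 finished at ${finished}"
}
```

    For example: wait_for_task cassandratask replace-node-POD_NAME DATABASE_ID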

Monitor detailed replacement progress

For detailed progress information during node replacement, you can view the status of individual pods in the CassandraTask:

kubectl get cassandratask replace-rack -o yaml
Result
status:
  conditions:
  - lastTransitionTime: "2026-04-29T15:16:41Z"
    message: ""
    reason: Running
    status: "True"
    type: Running
  podStatuses:
    db-dc1-rack1-sts-0:
      completionTime: "2026-04-28T15:05:56Z"
      startTime: "2026-04-28T05:15:41Z"
      status: COMPLETED
    db-dc1-rack1-sts-1:
      completionTime: "2026-04-29T00:31:16Z"
      startTime: "2026-04-28T15:06:01Z"
      status: COMPLETED
    db-dc1-rack1-sts-2:
      startTime: "2026-04-29T00:31:21Z"
      status: RUNNING
  startTime: "2026-04-28T05:15:41Z"
  succeeded: 2

The podStatuses section shows:

  • Individual pod completion and start times.

  • Current status for each pod (COMPLETED or RUNNING).

  • Overall progress through the succeeded count.
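For racks with many pods, reading the full YAML becomes tedious. The following is a minimal sketch that counts pods by status from the same output. It assumes the indentation shown in the result above (podStatuses indented two spaces under status, with each pod's fields nested beneath it), and the function name summarize_pod_statuses is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get cassandratask ... -o yaml` on stdin
# and prints one STATUS=count line per pod status found under podStatuses.
summarize_pod_statuses() {
  awk '
    /^  podStatuses:/      { in_pods = 1; next }   # start of the per-pod section
    in_pods && !/^    /    { in_pods = 0 }         # a shallower key ends the section
    in_pods && $1 == "status:" { count[$2]++ }     # tally each pod status value
    END { for (s in count) printf "%s=%d\n", s, count[s] }
  ' | sort
}
```

For example: kubectl get cassandratask replace-rack -o yaml | summarize_pod_statuses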

Monitor data streaming progress

To monitor the data streaming progress on individual nodes during replacement, use the nodetool netstats command:

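kubectl exec db-east-rack1-sts-0 -- nodetool netstats | \
  awk '/\s+\/([0-9]{1,3}\.){3}[0-9]|Receiving/ {
    # A one-field line is the source endpoint; remember it for the next line.
    # On "Receiving" lines, $4 is the total bytes and $11 is the bytes received.
    if (NF == 1) host=$1;
    else print host " : " $11/$4*100 "%\t" $11/1024/1024/1024 "/" $4/1024/1024/1024 "GB";
  }' | sort -n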
Result
/10.0.0.1 : 0.0186096% 0.358676/1927.37GB
/10.0.0.2 : 0.00512277% 0.111914/2184.64GB
/10.0.0.3 : 0.01235903% 0.236621/1914.56GB
/10.0.0.4 : 0.0231946% 0.413935/1784.62GB
/10.0.0.5 : 0.0127195% 0.248485/1953.58GB
/10.0.0.6 : 0.0185481% 0.243199/1311.18GB

This output shows:

  • The source endpoint IP address.

  • Percentage of data streamed.

  • Amount of data streamed in GB.

  • Total data to be streamed in GB.

Each line represents a different source node streaming data to the replacement node.
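To see overall progress rather than per-source progress, you can sum the per-source lines. The following is a minimal sketch that reads lines in the result format shown above, where the fourth whitespace-separated field is received/totalGB. The function name summarize_streaming is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads per-source lines such as
#   /10.0.0.1 : 0.0186096% 0.358676/1927.37GB
# on stdin and prints the combined amount received, total, and percentage.
summarize_streaming() {
  awk '
    NF >= 4 {
      split($4, parts, "/")   # parts[1] = received GB, parts[2] = total GB
      received += parts[1]
      total += parts[2]       # awk uses the numeric prefix and ignores "GB"
    }
    END {
      if (total > 0)
        printf "%.2f/%.2f GB received (%.2f%%)\n", received, total, received / total * 100
    }
  '
}
```

Pipe the output of the netstats command above into summarize_streaming to get a single summary line.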


© Copyright IBM Corporation 2026

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.