Replace a database node
When an Apache Cassandra®, Hyper-Converged Database (HCD), or DataStax Enterprise (DSE) node becomes defective or problematic in your Mission Control cluster, you can replace it with a new, empty node.
You can replace individual nodes or entire racks using the replacenode command.
Mission Control does the following during the replacement process:
-
Detects and processes the replacenode CassandraTask or K8ssandraTask custom resource (CR).
-
Manages node replacements across the cluster or datacenter.
-
Controls the replacement process to maintain cluster stability.
-
Provides real-time task progress and status updates.
-
Stops the running node.
-
Deletes the Persistent Volume.
-
Removes the node.
-
Deploys a new replacement node.
-
Starts the new node with the same token range as the original node.
|
The amount of data that the new nodes must rebuild determines how long the replacement process takes. During this time, the cluster remains operational but might experience increased load due to the data streaming process. |
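Because replacement time scales with the amount of data to rebuild, a back-of-the-envelope estimate can help with planning. The following is a minimal sketch, not a Mission Control feature: the function name `estimate_rebuild_hours` is hypothetical, and both the data size and the sustained streaming rate are assumptions you supply yourself.

```shell
# Rough estimate of rebuild time: data to stream divided by sustained throughput.
# Arguments: data to rebuild in GB, assumed streaming rate in MB/s.
# Both values are your own assumptions; real throughput varies during streaming.
estimate_rebuild_hours() {
  awk -v gb="$1" -v rate="$2" \
    'BEGIN { printf "%.1f\n", (gb * 1024) / rate / 3600 }'
}

estimate_rebuild_hours 2000 50   # 2000 GB at an assumed 50 MB/s, prints 11.4
```

Treat the result as an order-of-magnitude figure only; compaction and load on the source replicas can slow streaming considerably.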
Performance impact
The replacement creates a new, empty node with the same token range as the original. The new node rebuilds its data from remaining replicas, which creates temporary disk pressure during bootstrap. The disk pressure occurs on the replica nodes that are streaming data to the new node.
With a single rack configuration, the disk pressure occurs on up to num_tokens other nodes in the same rack.
For multiple racks, which is the recommended configuration, disk pressure occurs on up to num_tokens other nodes in the other racks.
Cassandra’s role in node replacement
When a new node comes online, Cassandra performs these critical operations:
-
Streams data from other nodes in the cluster to rebuild the new node’s data.
-
Verifies data consistency across all replicas.
-
Rebalances the token ranges if necessary.
-
Joins the node to the cluster only after data consistency is confirmed.
Cassandra must successfully rebuild and verify all data on the new node before considering the replacement process complete.
Replace a defective node
You can use the Mission Control UI or CLI to replace a defective node.
Use the UI to replace a defective node
-
In the Mission Control UI, select your project, and then select your target cluster.
-
In the Nodes section of the Overview tab, select the checkbox for your target node in its datacenter.
-
Click More Options for your target node, and then click Replace.
The replacement process starts immediately.
To monitor the replacement progress, see Monitor replace activity status.
Use the CLI to replace a defective node
-
Create or modify replace-node-task.cassandratask.yaml:
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-POD_NAME
  namespace: DATABASE_ID
spec:
  concurrencyPolicy: Forbid
  maxConcurrentPods: 1
  datacenter:
    name: DATACENTER_NAME
    namespace: DATABASE_ID
  jobs:
    - name: replace-node
      command: replacenode
      args:
        pod_name: CLUSTER_NAME-DC_NAME-sts-ORDINAL
Replace the following:
-
POD_NAME: The pod name
-
DATABASE_ID: The database identifier
-
DATACENTER_NAME: The datacenter name
-
CLUSTER_NAME: The cluster name
-
DC_NAME: The datacenter name
-
ORDINAL: The pod’s ordinal number in the stateful set
Configuration options:
-
metadata.name: A unique identifier within the Kubernetes namespace. Include the cluster name to prevent naming conflicts.
-
spec.concurrencyPolicy: Set to Forbid to prevent concurrent task execution
-
spec.datacenter: The target datacenter’s namespace and name
-
spec.jobs[0].command: Must be replacenode
-
spec.maxConcurrentPods: Number of nodes to replace concurrently, set under spec as shown in the manifest
The maxConcurrentPods parameter is only available in Mission Control 1.18 and later. On earlier versions, the task replaces one node at a time, regardless of whether the field is set.
-
Apply the replacenode CassandraTask CR to the data plane Kubernetes cluster:
kubectl apply -f replace-node-task.cassandratask.yaml
Mission Control detects and manages the CassandraTask CR and performs the node replacement.
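If you replace nodes often, you might prefer to generate the manifest rather than edit placeholders by hand. The following is a minimal sketch that renders the same fields as the manifest in the first step from shell arguments. The helper name `render_replace_task` and the example values are hypothetical, and the pod name follows the CLUSTER_NAME-DC_NAME-sts-ORDINAL pattern shown in that manifest.

```shell
# Hypothetical helper: prints a replacenode CassandraTask manifest to stdout.
# Arguments: database ID (namespace), cluster name, datacenter name, pod ordinal.
render_replace_task() {
  db_id="$1"; cluster="$2"; dc="$3"; ordinal="$4"
  pod="${cluster}-${dc}-sts-${ordinal}"   # pod name pattern taken from the manifest
  cat <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-${pod}
  namespace: ${db_id}
spec:
  concurrencyPolicy: Forbid
  maxConcurrentPods: 1
  datacenter:
    name: ${dc}
    namespace: ${db_id}
  jobs:
    - name: replace-node
      command: replacenode
      args:
        pod_name: ${pod}
EOF
}

# Example values are placeholders, not real resources.
render_replace_task my-db-id demo dc1 0
```

You can review the rendered output first, and then pipe it to `kubectl apply -f -` instead of writing an intermediate file.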
Replace multiple nodes in parallel
In Mission Control version 1.18.0 and later, you can replace multiple nodes from the same rack concurrently to reduce overall replacement time. By default, Mission Control replaces nodes sequentially, one node at a time.
Use parallel node replacements to:
-
Minimize downtime during large-scale node replacements.
-
Expedite recovery from multiple node failures in the same rack.
-
Reduce the total time required for infrastructure maintenance.
|
Replace nodes in parallel only when your cluster has a replication factor (RF) of three or higher. Parallel replacements temporarily reduce cluster availability within the affected rack. |
Use the maxConcurrentPods parameter in your K8ssandraTask manifest to replace multiple nodes concurrently:
apiVersion: control.k8ssandra.io/v1alpha1
kind: K8ssandraTask
metadata:
  name: replace-multiple-nodes
  namespace: DATABASE_ID
spec:
  cluster:
    name: CLUSTER_NAME
  template:
    maxConcurrentPods: 2
    jobs:
      - name: replace-nodes
        command: replacenode
Replace the following:
-
DATABASE_ID: The database ID
-
CLUSTER_NAME: The cluster name
The maxConcurrentPods parameter controls how many nodes Mission Control replaces in parallel within a single rack.
Replacements never run concurrently across racks.
Monitor cluster health and resource utilization during parallel replacements.
Adjust the maxConcurrentPods value based on your cluster’s capacity and network bandwidth.
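To get a feel for how maxConcurrentPods shortens a rack-wide replacement, the following sketch computes how many sequential batches a rack needs. It assumes, without confirmation from Mission Control's behavior, that nodes are processed in groups of at most maxConcurrentPods. The function name `replacement_batches` is hypothetical.

```shell
# Number of sequential batches needed to replace every node in one rack.
# Arguments: node count in the rack, maxConcurrentPods value.
# Computes ceil(nodes / max_concurrent) with integer arithmetic.
replacement_batches() {
  nodes="$1"; max_concurrent="$2"
  echo $(( (nodes + max_concurrent - 1) / max_concurrent ))
}

replacement_batches 5 2   # 5 nodes, 2 at a time, prints 3
```

With the default of one node at a time, the same rack needs five batches, so a value of 2 roughly halves the number of rounds at the cost of more simultaneous streaming load.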
Monitor replace activity status
You can use the Mission Control UI or CLI to monitor the replacement activity status.
Use the UI to monitor the status
-
In the Mission Control UI, select Activities in the main navigation.
-
View the Status notifications for the replacement progress.
A SUCCESS status indicates the replacement completed successfully.
The system displays timestamps for the operation’s start and end.
The Activities pane automatically refreshes to show current status.
Use the CLI to monitor the status
-
Monitor the task object's progress in the control plane cluster.
For a CassandraTask:
kubectl get cassandratask replace-node -o yaml
For a K8ssandraTask:
kubectl get k8ssandratask replace-multiple-nodes -o yaml
Result:
...
status:
  completionTime: "2024-11-01T03:28:33Z"
  conditions:
  - lastTransitionTime: "2024-11-01T03:28:12Z"
    status: "True"
    type: Running
  - lastTransitionTime: "2024-11-01T03:28:34Z"
    status: "False"
    type: Complete
  startTime: "2024-11-01T03:28:12Z"
  succeeded: 1
Mission Control sets the startTime field before starting the replacement operation and updates the completionTime field when the operation finishes.
The status field behavior follows this pattern:
-
status: Initially set to "True".
-
status: Changes to "False" when the replacement completes.
-
type: Changes from ReplacingNodes to Complete when the replacement finishes.
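Rather than rerunning the command by hand, you can poll the task until its completionTime field is populated. The following is a minimal sketch: the helper name `wait_for_task` and the 30-second default interval are assumptions, and it relies only on `kubectl get` with a `jsonpath` output expression.

```shell
# Hypothetical helper: poll a task until status.completionTime is set.
# Arguments: resource kind (cassandratask or k8ssandratask), task name,
# optional poll interval in seconds (default 30).
wait_for_task() {
  kind="$1"; name="$2"; interval="${3:-30}"
  while :; do
    # Stop with an error if kubectl itself fails (for example, task not found).
    done_at="$(kubectl get "$kind" "$name" -o jsonpath='{.status.completionTime}')" || return 1
    if [ -n "$done_at" ]; then
      echo "completed at $done_at"
      return 0
    fi
    sleep "$interval"
  done
}

# Usage (requires access to the cluster):
#   wait_for_task cassandratask replace-node
```

The helper only reports that the task finished. Inspect the full status afterward to confirm that the replacement succeeded.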
Monitor detailed replacement progress
For detailed progress information during node replacement, you can view the status of individual pods in the CassandraTask:
kubectl get cassandratask replace-rack -o yaml
status:
  conditions:
  - lastTransitionTime: "2026-04-29T15:16:41Z"
    message: ""
    reason: Running
    status: "True"
    type: Running
  podStatuses:
    db-dc1-rack1-sts-0:
      completionTime: "2026-04-28T15:05:56Z"
      startTime: "2026-04-28T05:15:41Z"
      status: COMPLETED
    db-dc1-rack1-sts-1:
      completionTime: "2026-04-29T00:31:16Z"
      startTime: "2026-04-28T15:06:01Z"
      status: COMPLETED
    db-dc1-rack1-sts-2:
      startTime: "2026-04-29T00:31:21Z"
      status: RUNNING
  startTime: "2026-04-28T05:15:41Z"
  succeeded: 3
The podStatuses section shows:
-
Individual pod completion and start times.
-
Current status for each pod (COMPLETED or RUNNING).
-
Overall progress through the succeeded count.
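For racks with many pods, a one-line-per-pod summary can be easier to scan than the full YAML. The following sketch filters the podStatuses section with awk. The helper name `summarize_pod_statuses` is hypothetical, and the filter assumes the two-space indentation that `kubectl get -o yaml` normally produces.

```shell
# Hypothetical helper: read CassandraTask YAML on stdin and print "<pod> <status>"
# for each entry under status.podStatuses.
summarize_pod_statuses() {
  awk '
    /^  podStatuses:/ { in_pods = 1; next }            # entering the section
    in_pods && /^  [^ ]/ { in_pods = 0 }               # next status-level key ends it
    in_pods && /^    [^ ].*:$/ { pod = $1; sub(/:$/, "", pod) }
    in_pods && /^      status:/ { print pod, $2 }
  '
}

# Usage (requires access to the cluster):
#   kubectl get cassandratask replace-rack -o yaml | summarize_pod_statuses
```

Run against the sample status shown earlier, this would list the two COMPLETED pods and the one RUNNING pod on separate lines.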
Monitor data streaming progress
To monitor the data streaming progress on individual nodes during replacement, use the nodetool netstats command:
kubectl exec db-east-rack1-sts-0 -- nodetool netstats | \
awk '/\s+\/([0-9]{1,3}\.){3}[0-9]|Receiving/ {
if (NF == 1) host=$1;
else print host " : " $11/$4*100 "%\t" $11/1024/1024/1024 "/" $4/1024/1024/1024 "GB";
}' | sort -n
/10.0.0.1 : 0.0186096% 0.358676/1927.37GB
/10.0.0.2 : 0.00512277% 0.111914/2184.64GB
/10.0.0.3 : 0.01235903% 0.236621/1914.56GB
/10.0.0.4 : 0.0231946% 0.413935/1784.62GB
/10.0.0.5 : 0.0127195% 0.248485/1953.58GB
/10.0.0.6 : 0.0185481% 0.243199/1311.18GB
This output shows:
-
The source endpoint IP address.
-
Percentage of data streamed.
-
Amount of data streamed in GB.
-
Total data to be streamed in GB.
Each line represents a different source node streaming data to the replacement node.
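To see a single overall figure instead of one line per source, you can sum the per-source byte counts. The following sketch assumes the same `Receiving` line layout that the command above relies on, with total bytes in field 4 and received bytes in field 11. The helper name `overall_stream_progress` is hypothetical.

```shell
# Hypothetical helper: read nodetool netstats output on stdin and print the
# combined streaming progress across all source nodes.
# Assumes $4 = total bytes and $11 = bytes received on each "Receiving" line.
overall_stream_progress() {
  awk '
    /Receiving/ { total += $4; received += $11 }
    END {
      if (total > 0) {
        printf "%.2f%% of %.2f GB\n", received / total * 100, total / 1024 / 1024 / 1024
      } else {
        print "no active streams"
      }
    }
  '
}

# Usage (requires access to the cluster):
#   kubectl exec db-east-rack1-sts-0 -- nodetool netstats | overall_stream_progress
```

If the output format differs in your database version, adjust the field numbers to match.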