Replace a database node
When an Apache Cassandra®, Hyper-Converged Database (HCD), or DataStax Enterprise (DSE) node becomes defective or problematic in your Mission Control cluster, you can replace it with a new, empty node.
Mission Control does the following during the replacement process:
- Detects and processes the replacenode CassandraTask custom resource (CR).
- Manages node replacements across the cluster or datacenter.
- Controls the replacement process to maintain cluster stability.
- Provides real-time task progress and status updates.
- Stops the running node.
- Deletes the Persistent Volume.
- Removes the node.
- Deploys a new replacement node.
- Starts the new node with the same token range as the original node.
The amount of data that the new node must rebuild determines how long the replacement process takes. During this time, the cluster remains operational but might experience increased load due to the data streaming process.
Performance impact
The replacement creates a new, empty node with the same token range as the original. The new node rebuilds its data from remaining replicas, which creates temporary disk pressure during bootstrap. The disk pressure occurs on the replica nodes that are streaming data to the new node.
With a single rack configuration, the disk pressure occurs on up to num_tokens other nodes in the same rack.
For multiple racks, which is the recommended configuration, disk pressure occurs on up to num_tokens other nodes in the other racks.
Cassandra’s role in node replacement
When a new node comes online, Cassandra performs these critical operations:
- Streams data from other nodes in the cluster to rebuild the new node’s data.
- Verifies data consistency across all replicas.
- Rebalances the token ranges if necessary.
- Joins the node to the cluster only after data consistency is confirmed.
Cassandra must successfully rebuild and verify all data on the new node before considering the replacement process complete.
Replace a defective node
You can use the Mission Control UI or CLI to replace a defective node.
Use the UI to replace a defective node
- In the Mission Control UI, select your project, and then select your target cluster.
- In the Nodes section of the Overview tab, select the checkbox for your target node in its datacenter.
- Click More Options for your target node, and then click Replace.
The replacement process starts immediately.
To monitor the replacement progress, see Monitor replace activity status.
Use the CLI to replace a defective node
- Create or modify replace-node-task.cassandratask.yaml:

      apiVersion: control.k8ssandra.io/v1alpha1
      kind: CassandraTask
      metadata:
        name: replace-node-POD_NAME
        namespace: DATABASE_ID
      spec:
        datacenter:
          name: DATACENTER_NAME
          namespace: DATABASE_ID
        jobs:
          - name: replace-node
            command: replacenode
            args:
              pod_name: CLUSTER_NAME-DC_NAME-sts-ORDINAL

  Replace the following:

  - POD_NAME: The pod name
  - DATABASE_ID: The database identifier
  - DATACENTER_NAME: The datacenter name
  - CLUSTER_NAME: The cluster name
  - DC_NAME: The datacenter name
  - ORDINAL: The pod’s ordinal number in the stateful set

  Configuration options:

  - metadata.name: A unique identifier within the Kubernetes namespace. Include the cluster name to prevent naming conflicts.
  - spec.datacenter: The target datacenter’s namespace and name
  - spec.jobs[0].command: Must be replacenode

- Apply the replacenode CassandraTask CR to the data plane Kubernetes cluster:

      kubectl apply -f replace-node-task.cassandratask.yaml

  Mission Control detects and manages the CassandraTask CR and performs the node replacement.
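If you replace nodes regularly, you might prefer to generate the manifest from variables instead of editing the file by hand. The following is a minimal sketch, not an official Mission Control tool. The function name render_replace_task is hypothetical, and the sketch assumes that the POD_NAME placeholder in metadata.name and the pod_name argument refer to the same pod.

```shell
# Hypothetical helper: prints a replacenode CassandraTask manifest to stdout.
# Arguments: DATABASE_ID, DATACENTER_NAME, and the full pod name.
# Assumption: the same pod name is used for metadata.name and args.pod_name.
render_replace_task() {
  local database_id="$1" datacenter_name="$2" pod_name="$3"
  cat <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-${pod_name}
  namespace: ${database_id}
spec:
  datacenter:
    name: ${datacenter_name}
    namespace: ${database_id}
  jobs:
    - name: replace-node
      command: replacenode
      args:
        pod_name: ${pod_name}
EOF
}
```

For example, redirect the output of render_replace_task DATABASE_ID DATACENTER_NAME CLUSTER_NAME-DC_NAME-sts-ORDINAL to replace-node-task.cassandratask.yaml, review the file, and then apply it with kubectl apply -f as described in the previous step.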
Monitor replace activity status
You can use the Mission Control UI or CLI to monitor the replacement activity status.
Use the UI to monitor the status
- In the Mission Control UI, select Activities in the main navigation.
- View the Status notifications for the replacement progress.
A SUCCESS status indicates the replacement completed successfully.
The system displays timestamps for the operation’s start and end.
The Activities pane automatically refreshes to show current status.
Use the CLI to monitor the status
- Monitor the CassandraTask object’s progress in the control plane cluster, using the task name and namespace from your manifest:

      kubectl get cassandratask replace-node-POD_NAME -n DATABASE_ID -o yaml

  Result:

      ...
      status:
        completionTime: "2024-11-01T03:28:33Z"
        conditions:
        - lastTransitionTime: "2024-11-01T03:28:12Z"
          status: "True"
          type: Running
        - lastTransitionTime: "2024-11-01T03:28:34Z"
          status: "False"
          type: Complete
        startTime: "2024-11-01T03:28:12Z"
        succeeded: 1

  Mission Control sets the startTime field before starting the replacement operation and updates the completionTime field when the operation finishes.

  The status field behavior follows this pattern:

  - status: Initially set to "True".
  - status: Changes to "False" when the replacement completes.
  - type: Changes from ReplacingNodes to Complete when the replacement finishes.
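Because Mission Control updates the completionTime field when the operation finishes, one way to wait for a replacement from a script is to poll that field. The following is a minimal sketch under stated assumptions, not an official tool. The function name wait_for_task, the polling interval, and the attempt limit are all hypothetical, and the sketch assumes that kubectl is already pointed at the cluster that holds the CassandraTask.

```shell
# Hypothetical helper: polls a CassandraTask until status.completionTime is set.
# Arguments: task name, namespace, seconds between polls (default 10),
# and maximum number of polls (default 60).
wait_for_task() {
  local task="$1" namespace="$2"
  local interval="${3:-10}" max_attempts="${4:-60}"
  local attempt=1 completion=""
  while [ "$attempt" -le "$max_attempts" ]; do
    # An empty result means the field is not set yet or the lookup failed.
    completion="$(kubectl get cassandratask "$task" -n "$namespace" \
      -o jsonpath='{.status.completionTime}')" || completion=""
    if [ -n "$completion" ]; then
      echo "Task $task completed at $completion"
      return 0
    fi
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "Timed out waiting for task $task" >&2
  return 1
}
```

For example, wait_for_task replace-node-POD_NAME DATABASE_ID 15 120 polls every 15 seconds for up to 30 minutes. A populated completionTime only indicates that the task finished, so it is still worth inspecting the full status, such as the succeeded count shown in the result above, before you treat the replacement as successful.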