Replace a database node
When an Apache Cassandra®, Hyper-Converged Database (HCD), or DataStax Enterprise (DSE) node becomes defective or problematic in your Mission Control cluster, you can replace it with a new, empty node.
Mission Control does the following during the replacement process:
-
Detects and processes the replacenode CassandraTask Custom Resource (CR).
-
Manages node replacements across the cluster or datacenter.
-
Controls the replacement process to maintain cluster stability.
-
Provides real-time task progress and status updates.
-
Stops the running node.
-
Deletes the Persistent Volume.
-
Removes the node.
-
Deploys a new replacement node.
-
Starts the new node with the same token range as the original node.
The amount of data that the new node must rebuild determines how long the replacement process takes. During this time, the cluster remains operational but may experience increased load due to the data streaming process.
Performance impact
The replacement creates a new, empty node with the same token range as the original. The new node rebuilds its data from remaining replicas, which creates temporary disk pressure during bootstrap. The disk pressure occurs on the replica nodes that are streaming data to the new node.
With a single rack configuration, the disk pressure occurs on up to num_tokens other nodes in the same rack.
For multiple racks, which is the recommended configuration, disk pressure occurs on up to num_tokens other nodes in the other racks.
Cassandra’s role in node replacement
When a new node comes online, Cassandra performs these critical operations:
-
Streams data from other nodes in the cluster to rebuild the new node’s data.
-
Verifies data consistency across all replicas.
-
Rebalances the token ranges if necessary.
-
Joins the node to the cluster only after data consistency is confirmed.
Cassandra must successfully rebuild and verify all data on the new node before considering the replacement process complete.
Prerequisites
-
A prepared environment on either bare-metal/VM or an existing Kubernetes cluster.
Replace a defective node
Choose either the UI or CLI method.
UI
-
In the Mission Control UI, select your project, and then select your target cluster.
-
In the Nodes section of the Overview tab, select the checkbox for your target node in its datacenter.
-
Click More Options for your target node, and then click Replace. The replacement process starts immediately.
To monitor the replacement progress, see Monitor replace activity status.
CLI
-
Create or modify replace-node-task.cassandratask.yaml:
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-POD_NAME
  namespace: DATABASE_ID
spec:
  datacenter:
    name: DATACENTER_NAME
    namespace: DATABASE_ID
  jobs:
    - name: replace-node
      command: replacenode
      args:
        pod_name: CLUSTER_NAME-DC_NAME-sts-ORDINAL
Replace the following:
-
POD_NAME: The pod name
-
DATABASE_ID: The database identifier
-
DATACENTER_NAME: The datacenter name
-
CLUSTER_NAME: The cluster name
-
DC_NAME: The datacenter name
-
ORDINAL: The pod’s ordinal number in the stateful set
Configuration options:
-
metadata.name: A unique identifier within the Kubernetes namespace. Include the cluster name to prevent naming conflicts.
-
spec.datacenter: The target datacenter’s namespace and name
-
spec.jobs[0].command: Must be replacenode
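The placeholders in the manifest can also be filled in from shell variables instead of by hand. The following is a minimal sketch, not an official Mission Control tool. The values demo-cluster, dc1, 2, and demo-db-id are hypothetical examples. The script assumes the pod name follows the CLUSTER_NAME-DC_NAME-sts-ORDINAL pattern shown in the manifest above.

```shell
#!/bin/sh
# Hypothetical example values; substitute those of your own cluster.
CLUSTER_NAME="demo-cluster"
DC_NAME="dc1"
ORDINAL="2"
DATABASE_ID="demo-db-id"

# Pod name pattern taken from the manifest template in this section.
POD_NAME="${CLUSTER_NAME}-${DC_NAME}-sts-${ORDINAL}"

# Write the CassandraTask manifest with the placeholders filled in.
cat > replace-node-task.cassandratask.yaml <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-${POD_NAME}
  namespace: ${DATABASE_ID}
spec:
  datacenter:
    name: ${DC_NAME}
    namespace: ${DATABASE_ID}
  jobs:
    - name: replace-node
      command: replacenode
      args:
        pod_name: ${POD_NAME}
EOF

echo "Wrote manifest for pod ${POD_NAME}"
```

Because the task name embeds the pod name, which already contains the cluster name, this sketch keeps metadata.name unique per replaced pod within the namespace.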
Apply the replacenode CassandraTask custom resource to the data plane Kubernetes cluster:
kubectl apply -f replace-node-task.cassandratask.yaml
Mission Control detects and manages the CassandraTask CR and performs the node replacement.
Monitor replace activity status
Choose either the UI or CLI method.
UI
-
In the Mission Control UI, select Activities in the main navigation.
-
View the Status notifications for the replacement progress.
A SUCCESS status indicates the replacement completed successfully.
The system displays timestamps for the operation’s start and end.
The Activities pane automatically refreshes to show current status.
CLI
-
Monitor the CassandraTask object’s progress in the control plane cluster, using the task name and namespace from your manifest:
kubectl get cassandratask replace-node-POD_NAME -n DATABASE_ID -o yaml
Result
...
status:
  completionTime: "2024-11-01T03:28:33Z"
  conditions:
    - lastTransitionTime: "2024-11-01T03:28:12Z"
      status: "True"
      type: Running
    - lastTransitionTime: "2024-11-01T03:28:34Z"
      status: "False"
      type: Complete
  startTime: "2024-11-01T03:28:12Z"
  succeeded: 1
Mission Control sets the startTime field before starting the replacement operation and updates the completionTime field when the operation finishes.
The status field behavior follows this pattern:
-
status: Initially set to "True".
-
status: Changes to "False" when the replacement completes.
-
type: Changes from ReplacingNodes to Complete when the replacement finishes.
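Because Mission Control sets completionTime only when the operation finishes, a script can poll that field instead of re-reading the full YAML output. The following is a minimal sketch under stated assumptions. The function name wait_for_task and its arguments are hypothetical, and a populated completionTime is treated only as "finished", not as "succeeded".

```shell
#!/bin/sh
# Poll a CassandraTask until its status.completionTime field is populated.
# Arguments: task name, namespace, maximum attempts, seconds between attempts.
wait_for_task() {
  task_name="$1"
  namespace="$2"
  max_attempts="${3:-60}"
  interval="${4:-10}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # jsonpath prints an empty string while the field is still unset.
    completed="$(kubectl get cassandratask "$task_name" -n "$namespace" \
      -o jsonpath='{.status.completionTime}')"
    if [ -n "$completed" ]; then
      echo "Task $task_name completed at $completed"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "Task $task_name did not complete after $max_attempts attempts" >&2
  return 1
}
```

A call such as `wait_for_task replace-node-POD_NAME DATABASE_ID` would check every 10 seconds, up to 60 times. After it returns, inspect the conditions and succeeded fields in the full status output to confirm the outcome.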