Replace a database node

When an Apache Cassandra®, Hyper-Converged Database (HCD), or DataStax Enterprise (DSE) node becomes defective or problematic in your Mission Control cluster, you can replace it with a new, empty node.

Mission Control does the following during the replacement process:

Detects and processes the replacenode CassandraTask Custom Resource (CR).
Manages node replacements across the cluster or datacenter.
Controls the replacement process to maintain cluster stability.
Provides real-time task progress and status updates.
Stops the running node.
Deletes the Persistent Volume.
Removes the node.
Deploys a new replacement node.
Starts the new node with the same token range as the original node.

The amount of data that the new node must rebuild determines how long the replacement process takes. During this time, the cluster remains operational but may experience increased load from data streaming.

Performance impact

The replacement creates a new, empty node with the same token range as the original. The new node rebuilds its data from remaining replicas, which creates temporary disk pressure during bootstrap. The disk pressure occurs on the replica nodes that are streaming data to the new node.

With a single rack configuration, the disk pressure occurs on up to num_tokens other nodes in the same rack. For multiple racks, which is the recommended configuration, disk pressure occurs on up to num_tokens other nodes in the other racks.

Cassandra’s role in node replacement

When a new node comes online, Cassandra performs these critical operations:

Streams data from other nodes in the cluster to rebuild the new node’s data.
Verifies data consistency across all replicas.
Rebalances the token ranges if necessary.
Joins the node to the cluster only after data consistency is confirmed.

Cassandra must successfully rebuild and verify all data on the new node before considering the replacement process complete.
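As a rough way to check whether every node has finished joining, you can filter the output of `nodetool status` for nodes that are not reported as `UN` (Up/Normal). The sketch below is a hypothetical helper, not part of Mission Control. It assumes `nodetool` is available inside the database pod and that the container is named `cassandra`, which may differ in your deployment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `nodetool status` output on stdin and prints
# only the node rows whose two-letter code is not UN (Up/Normal).
# First letter: U (Up) or D (Down).
# Second letter: N (Normal), L (Leaving), J (Joining), or M (Moving).
nodes_not_up_normal() {
  grep -E '^[UD][NLJM][[:space:]]' | grep -v '^UN[[:space:]]' || true
}

# Example usage (assumed pod, namespace, and container names):
#   kubectl exec CLUSTER_NAME-DC_NAME-sts-ORDINAL -n DATABASE_ID -c cassandra -- \
#     nodetool status | nodes_not_up_normal
```

If the helper prints nothing, `nodetool status` reported no node in a state other than Up/Normal at that moment.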

Prerequisites

A prepared environment on either bare-metal/VM or an existing Kubernetes cluster.

Replace a defective node

Choose either the UI or CLI method.

In the Mission Control UI, select your project, and then select your target cluster.
In the Nodes section of the Overview tab, select the checkbox for your target node in its datacenter.
Click More Options for your target node, and then click Replace.

The replacement process starts immediately.

To monitor the replacement progress, see Monitor replace activity status.

Open the Mission Control CLI.
Create or modify replace-node-task.cassandratask.yaml:

apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-POD_NAME
  namespace: DATABASE_ID
spec:
  datacenter:
    name: DATACENTER_NAME
    namespace: DATABASE_ID
  jobs:
    - name: replace-node
      command: replacenode
      args:
        pod_name: CLUSTER_NAME-DC_NAME-sts-ORDINAL

Replace the following:

POD_NAME: The pod name
DATABASE_ID: The database identifier
DATACENTER_NAME: The datacenter name
CLUSTER_NAME: The cluster name
DC_NAME: The datacenter name
ORDINAL: The pod’s ordinal number in the stateful set

Configuration options:

metadata.name: A unique identifier within the Kubernetes namespace. Include the cluster name to prevent naming conflicts.
spec.datacenter: The target datacenter’s namespace and name
spec.jobs[0].command: Must be replacenode

Apply the replacenode CassandraTask custom resource to the data plane Kubernetes cluster:

kubectl apply -f replace-node-task.cassandratask.yaml

Mission Control detects and manages the CassandraTask CR and performs the node replacement.
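If you replace nodes regularly, you can generate the manifest from a few values instead of editing the placeholders by hand. The following is a minimal sketch, not an official tool. The function name `render_replace_task` is hypothetical, and it assumes that POD_NAME is the full pod name (CLUSTER_NAME-DC_NAME-sts-ORDINAL) and that DATACENTER_NAME and DC_NAME are the same value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints a replacenode CassandraTask manifest.
# Arguments: cluster name, datacenter name, pod ordinal, database ID.
render_replace_task() {
  local cluster="$1" dc="$2" ordinal="$3" database_id="$4"
  # Assumption: the pod name follows the pattern shown in the manifest above.
  local pod="${cluster}-${dc}-sts-${ordinal}"
  cat <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-${pod}
  namespace: ${database_id}
spec:
  datacenter:
    name: ${dc}
    namespace: ${database_id}
  jobs:
    - name: replace-node
      command: replacenode
      args:
        pod_name: ${pod}
EOF
}

# Example usage (all four values are placeholders):
#   render_replace_task CLUSTER_NAME DC_NAME ORDINAL DATABASE_ID \
#     > replace-node-task.cassandratask.yaml
#   kubectl apply -f replace-node-task.cassandratask.yaml
```

Including the pod name in `metadata.name` keeps each task name unique within the namespace, so you can create a separate task for each node that you replace.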

Monitor replace activity status

Choose either the UI or CLI method.

In the Mission Control UI, select Activities in the main navigation.
View the Status notifications for the replacement progress.

A SUCCESS status indicates the replacement completed successfully.

The system displays timestamps for the operation’s start and end.

The Activities pane automatically refreshes to show current status.

Open the Mission Control CLI.
Monitor the CassandraTask object’s progress in the control plane cluster:
```
kubectl get cassandratask replace-node-POD_NAME -n DATABASE_ID -o yaml
```

Result

...
status:
  completionTime: "2024-11-01T03:28:33Z"
  conditions:
  - lastTransitionTime: "2024-11-01T03:28:12Z"
    status: "True"
    type: Running
  - lastTransitionTime: "2024-11-01T03:28:34Z"
    status: "False"
    type: Complete
  startTime: "2024-11-01T03:28:12Z"
  succeeded: 1

Mission Control sets the startTime field before starting the replacement operation and updates the completionTime field when the operation finishes.
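Because `completionTime` is only populated once the operation finishes, a script can poll for it. The following is a sketch under that assumption. The function names `task_finished` and `wait_for_task` are hypothetical, and the loop has no timeout or error handling, so add both before using it in automation.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads CassandraTask YAML on stdin and succeeds (exit 0)
# only if an indented completionTime field with a value is present.
task_finished() {
  grep -Eq '^[[:space:]]+completionTime:[[:space:]]*[^[:space:]]+'
}

# Hypothetical helper: polls the task until completionTime appears.
# Arguments: task name, namespace, optional poll interval in seconds.
wait_for_task() {
  local task="$1" namespace="$2" interval="${3:-10}"
  until kubectl get cassandratask "$task" -n "$namespace" -o yaml | task_finished; do
    sleep "$interval"
  done
}

# Example usage (placeholders as in the manifest):
#   wait_for_task replace-node-POD_NAME DATABASE_ID 15
```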

The status and type fields in each conditions entry follow this pattern:

status: Initially set to "True".
status: Changes to "False" when the replacement completes.
type: Changes from ReplacingNodes to Complete when the replacement finishes.
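To read the conditions at a glance, you can reduce the YAML output to one `type=status` pair per condition. The sketch below is a hypothetical helper. It assumes that each condition lists `status` before `type`, as in the sample result, and that the YAML is indented with spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads CassandraTask YAML on stdin and prints
# one "type=status" line for each entry under status.conditions.
condition_summary() {
  awk '
    # Remember the most recent indented status value, without quotes.
    /^ +(- )?status:/ { gsub(/"/, "", $NF); st = $NF }
    # When the type line follows, print the pair.
    /^ +(- )?type:/   { print $NF "=" st }
  '
}

# Example usage (placeholders as in the manifest):
#   kubectl get cassandratask replace-node-POD_NAME -n DATABASE_ID -o yaml \
#     | condition_summary
```

For the sample result shown earlier, this prints `Running=True` followed by `Complete=False`.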