Replace a node
Replacing a node destroys the node and its data, and provisions a clean, empty replacement node in its place.
Run this operation when a node is defective and you need a new node to take over its role in the cluster, including its token ranges.
Mission Control detects the replacenode CassandraTask custom resource, iterates one rack at a time, and triggers and monitors replacement operations one pod at a time. Mission Control reports task progress and status.
Performance impact
This operation completely replaces a node with a new, empty node. The new node contains no data, but it takes over the same token ranges as the node it replaces. The new node bootstraps, rebuilding its data by streaming it from the remaining replicas in the cluster. Expect some additional disk pressure while the replacement node bootstraps.
Prerequisites
- A prepared environment on either bare-metal/VM or an existing Kubernetes cluster.
Replace a defective node
Choose User Interface (UI) or Command Line Interface (CLI) steps.
UI
- In the Home Clusters dialog, click the target cluster namespace.
- In the Nodes section of the Overview tab, click the row checkbox for your target node in its datacenter.
- Click the overflow menu icon (three dots) on your target node.
- Click Replace.
  The replace activity starts immediately.
  To view notifications from the replace operation, see Monitor replace activity status.
CLI
- Modify the replace-node-task.cassandratask.yaml file to define a replacenode CassandraTask. Here is a sample:
    apiVersion: control.k8ssandra.io/v1alpha1
    kind: CassandraTask
    metadata:
      name: replace-node-POD_NAME
      namespace: DATABASE_ID
    spec:
      datacenter:
        name: DATACENTER_NAME
        namespace: DATABASE_ID
      jobs:
        - name: replace-node
          command: replacenode
          args:
            pod_name: CLUSTER_NAME-DC_NAME-sts-ORDINAL
  Replace the following:
  - POD_NAME: name of the pod
  - DATABASE_ID: unique identifier for the database
  - DATACENTER_NAME: name of the datacenter
  - CLUSTER_NAME: name of the cluster
  - DC_NAME: name of the datacenter
  - ORDINAL: ordinal number of the pod in the StatefulSet
  Key options:
  - metadata.name: a unique identifier within the Kubernetes namespace where the task is submitted. The name can be any value, but consider including the cluster name to avoid collisions with other tasks.
  - spec.datacenter: the namespace and name combination that uniquely identifies the datacenter this operation targets.
  - spec.jobs[0].command: MUST be replacenode for this operation.
  - Optional: spec.jobs[0].args.keyspace_name: restricts this operation to a particular keyspace. If omitted, the operation applies to all keyspaces.
- Submit the replacenode CassandraTask custom resource to the Data Plane Kubernetes cluster where the target datacenter and its node are deployed:
    kubectl apply -f replace-node-task.cassandratask.yaml
  Mission Control detects and manages the submitted CassandraTask custom resource. It stops the DSE or Cassandra node if it is running, deletes the node's Persistent Volume(s) (PV), and then deletes the node (pod) where the database runs. Mission Control then deploys a new replacement node and starts it normally, and the new node takes over the same token ranges as the deleted node.
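If you replace several pods, a small shell helper can fill in the manifest placeholders for you. The following is a minimal sketch, not part of Mission Control: the function name render_replace_task and its argument order are hypothetical, and it takes the pod name exactly as kubectl get pods reports it rather than assembling it from CLUSTER_NAME, DC_NAME, and ORDINAL. It emits the same fields as the sample manifest.

```shell
#!/bin/sh
# Hypothetical helper (not a Mission Control tool): print a replacenode
# CassandraTask manifest with the placeholders filled in.
# Assumption: the pod name is passed in directly, as shown by `kubectl get pods`.

render_replace_task() {
  # $1 = pod to replace, $2 = DATABASE_ID (namespace), $3 = DATACENTER_NAME
  if [ "$#" -ne 3 ]; then
    echo "usage: render_replace_task POD_NAME DATABASE_ID DATACENTER_NAME" >&2
    return 1
  fi
  # Deriving metadata.name from the pod name keeps task names unique per pod.
  cat <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: replace-node-$1
  namespace: $2
spec:
  datacenter:
    name: $3
    namespace: $2
  jobs:
    - name: replace-node
      command: replacenode
      args:
        pod_name: $1
EOF
}
```

For example, render the manifest for one pod and submit it with the same kubectl apply command:
    render_replace_task POD_NAME DATABASE_ID DATACENTER_NAME > replace-node-task.cassandratask.yaml
    kubectl apply -f replace-node-task.cassandratask.yaml
Generating the file rather than editing it by hand keeps metadata.name and args.pod_name in sync.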
Monitor replace activity status
Choose User Interface (UI) or Command Line Interface (CLI) steps.
UI
- In the main navigation, click Activities.
- Review the Status notifications for the progress of the replace activity.
  A status of SUCCESS indicates that the replace operation completed without issue. Start and End timestamps are recorded for the replace activity.
  The Activities pane refreshes automatically.
CLI
- Monitor the progress and view the status of the CassandraTask object by running this command in the Control Plane cluster:
    kubectl get cassandratask replace-node-POD_NAME -n DATABASE_ID -o yaml
  Sample results:
    ...
    status:
      completionTime: "2022-11-01T03:28:33Z"
      conditions:
      - lastTransitionTime: "2022-11-01T03:28:12Z"
        status: "True"
        type: Running
      - lastTransitionTime: "2022-11-01T03:28:34Z"
        status: "False"
        type: Complete
      startTime: "2022-11-01T03:28:12Z"
      succeeded: 1
  Mission Control sets the startTime field before it starts the replacenode operation, and updates the completionTime field when the replacenode operation completes. The status field starts as "True" and is set to "False" when the replacenode operation completes. The type field changes from ReplacingNodes to Complete when the replacenode operation completes.
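Because Mission Control sets completionTime when the replacenode operation completes, a script can poll that single field instead of reading the full YAML. The following is a minimal sketch under stated assumptions: wait_for_task is a hypothetical helper, the KUBECTL variable (defaulting to kubectl) is an assumption that lets you add --context flags for the cluster you query, and any kubectl error is treated as "not complete yet". A populated completionTime only means the task finished; inspect the conditions and succeeded fields with the kubectl get command to confirm the outcome.

```shell
#!/bin/sh
# Hypothetical helper (not a Mission Control tool): poll a CassandraTask until
# status.completionTime is set, or give up after a fixed number of attempts.
# Assumption: KUBECTL may be overridden, for example KUBECTL="kubectl --context my-ctx".

wait_for_task() {
  # $1 = task name, $2 = namespace, $3 = max attempts, $4 = seconds between polls
  attempt=1
  while [ "$attempt" -le "$3" ]; do
    # An empty result (field unset, or a kubectl error) means "not complete yet".
    done_at="$(${KUBECTL:-kubectl} get cassandratask "$1" -n "$2" \
      -o jsonpath='{.status.completionTime}' 2>/dev/null)"
    if [ -n "$done_at" ]; then
      echo "$1 completed at $done_at"
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$attempt" -lt "$3" ]; then
      sleep "$4"
    fi
    attempt=$((attempt + 1))
  done
  echo "$1 not complete after $3 attempts" >&2
  return 1
}
```

For example, the following polls up to 60 times, 10 seconds apart:
    wait_for_task replace-node-POD_NAME DATABASE_ID 60 10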