Rebuild a Datacenter’s Replicas
DataStax Mission Control is currently in Public Preview. DataStax Mission Control is not intended for production use, has not been certified for production workloads, and might contain bugs and other functional issues. There is no guarantee that DataStax Mission Control will ever become generally available. DataStax Mission Control is provided on an “AS IS” basis, without warranty or indemnity of any kind.
If you are interested in trying out DataStax Mission Control, please join the Public Preview.
A rebuild operation runs the nodetool rebuild command, which rebuilds the data on a single node by streaming it from another (source) datacenter (DC). Define the source datacenter from which to stream data, then run the operation on each node to rebuild the local DC’s replicas.
Redistribute data on remaining nodes after one or more nodes or datacenters are terminated or added.
Rebuilding nodes ensures high availability of data and avoids single points of failure.
A context pointing to a Control Plane Kubernetes cluster.
A single datacenter (dc1) is deployed on the Data Plane Kubernetes cluster. A Control Plane Kubernetes cluster exists.
Perform a backup of the node.
User defines the CassandraTask. Specify the affected node as offline.
User submits a CassandraTask to the Data Plane Kubernetes cluster where the datacenter is deployed.
DC-operator detects new task custom resource definition (CRD).
DC-operator iterates one rack at a time.
DC-operator triggers and monitors rebuild operations one pod at a time.
DC-operator reports task progress and status.
User requests a status report of the CassandraTask with the kubectl command, and views the status response.
Here is a sample:
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: rebuild-dc1
spec:
  datacenter:
    name: dc1
    namespace: demo
  jobs:
    - name: rebuild-dc1
      command: rebuild
      args:
        keyspace_name: my_keyspace
metadata.name: a unique identifier within the Kubernetes namespace where the task is submitted. While the name can be any value, consider including the cluster name to prevent collisions with other operations.
spec.datacenter: a unique namespace and name combination used to determine which datacenter is the source for this operation.
spec.jobs.command: MUST be rebuild for this operation.
spec.jobs.args.keyspace_name: restricts this operation to a particular keyspace. If this value is omitted, ALL keyspaces are rebuilt.
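One way to produce the manifest file is a small shell helper. The sketch below is illustrative only: the function name write_rebuild_task is hypothetical, and the field values are copied from the sample above. It writes the args block only when a keyspace is passed, which relies on the described default of rebuilding all keyspaces when keyspace_name is omitted.

```shell
# Hypothetical helper: write a rebuild CassandraTask manifest to a file.
# Usage: write_rebuild_task FILE [KEYSPACE]
write_rebuild_task() {
  file=$1
  keyspace=${2:-}
  # Fixed part of the manifest, copied from the sample task definition.
  cat > "$file" <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: rebuild-dc1
spec:
  datacenter:
    name: dc1
    namespace: demo
  jobs:
    - name: rebuild-dc1
      command: rebuild
EOF
  # Append args only when a keyspace was given; otherwise no keyspace is named.
  if [ -n "$keyspace" ]; then
    cat >> "$file" <<EOF
      args:
        keyspace_name: $keyspace
EOF
  fi
}

# Example call (writes into the current directory):
#   write_rebuild_task rebuild-dc1.cassandratask.yaml my_keyspace
```

Calling the helper without a second argument produces a manifest that names no keyspace.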
kubectl apply -f rebuild-dc1.cassandratask.yaml
This command submits the CassandraTask object to the Kubernetes cluster where the specified datacenter is deployed.
If nodetool rebuild is interrupted before completion, restart it by re-entering the command. The process resumes from the point at which it was interrupted.
The DC-level operators perform a rolling rebuild operation, one node at a time. The order is determined lexicographically (that is, in dictionary order), starting with rack names and then continuing with node (pod) names.
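The ordering rule can be illustrated with a plain sort. This is only a sketch of dictionary ordering, not the operator's actual code: the rack and pod names are hypothetical examples, and rebuild_order is a made-up helper name.

```shell
# Hypothetical "rack pod" pairs, one per line, deliberately out of order.
pods='rack2 demo-dc1-rack2-sts-0
rack1 demo-dc1-rack1-sts-1
rack3 demo-dc1-rack3-sts-1
rack1 demo-dc1-rack1-sts-0'

# Sort by rack name first, then by pod name, in byte-wise (dictionary) order,
# and print only the pod names.
rebuild_order() {
  LC_ALL=C sort -k1,1 -k2,2 | awk '{ print $2 }'
}

printf '%s\n' "$pods" | rebuild_order
```

Dictionary order compares character by character, so under this scheme a rack named rack10 would sort before rack2.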
kubectl get cassandratask rebuild-dc1 -o yaml | yq .status
...
status:
  completionTime: "2022-10-23T23:34:38Z"
  conditions:
    - lastTransitionTime: "2022-10-23T23:34:08Z"
      status: "True"
      type: Running
    - lastTransitionTime: "2022-10-23T23:34:39Z"
      status: "True"
      type: Complete
  startTime: "2022-10-23T23:34:08Z"
  succeeded: 3
The DC-level operators set the startTime field prior to starting the rebuild operation. They update the completionTime field when the rebuild operation is completed.
The sample output indicates that the task is completed with the type: Complete status condition set to True. The succeeded: 3 field indicates that three (3) nodes (or pods) completed the requested task successfully. A failed field tracks a running count of pods that failed the rebuild operation.
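The succeeded and failed counters lend themselves to a simple scripted check. The following is a minimal sketch under stated assumptions: task_status_field and rebuild_verdict are hypothetical helper names, the task is assumed to live in the current namespace, and an unset counter is treated as zero.

```shell
# Read one field from the task status with a JSONPath query.
# Assumption: an unset field prints nothing.
task_status_field() {
  kubectl get cassandratask "$1" -o jsonpath="{.status.$2}"
}

# Turn the succeeded/failed counters into a verdict and an exit code.
rebuild_verdict() {
  succeeded=${1:-0}
  failed=${2:-0}
  if [ "$failed" -gt 0 ]; then
    echo "rebuild had $failed failed pod(s)"
    return 1
  fi
  echo "rebuild succeeded on $succeeded pod(s)"
}

# Usage (requires access to the cluster):
#   rebuild_verdict "$(task_status_field rebuild-dc1 succeeded)" \
#                   "$(task_status_field rebuild-dc1 failed)"
```

Keeping the verdict logic separate from the kubectl call makes it possible to exercise the check without a cluster.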
To verify that the datacenter is rebuilt, monitor the DSE node logs on the Data Plane cluster with this command:
kubectl logs -f demo-dc1-rack3-sts-1 -c server-system-logger | grep "finished rebuild"
INFO [pool-18-thread-1] 2022-10-23 23:32:18,088 StorageService.java:1895 - finished rebuild for (All keyspaces), (All tokens), 1 streaming connections, NORMAL, included DCs: dc1 after 5 seconds receiving 3.52 MiB.
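If only the duration and the amount of streamed data are of interest, the log line can be reduced with sed. This sketch assumes the log format matches the sample line above exactly; rebuild_summary is a hypothetical helper name.

```shell
# Sample log line copied from the output above.
line='INFO [pool-18-thread-1] 2022-10-23 23:32:18,088 StorageService.java:1895 - finished rebuild for (All keyspaces), (All tokens), 1 streaming connections, NORMAL, included DCs: dc1 after 5 seconds receiving 3.52 MiB.'

# Extract elapsed seconds and received size from "finished rebuild" lines;
# lines that do not match the pattern are dropped.
rebuild_summary() {
  sed -n 's/.*finished rebuild for .* after \([0-9][0-9]*\) seconds receiving \(.*\)\.$/seconds=\1 received=\2/p'
}

printf '%s\n' "$line" | rebuild_summary
# prints: seconds=5 received=3.52 MiB
```

The same function can be placed at the end of the kubectl logs pipeline in place of grep.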