Rebuild a datacenter’s replicas
A rebuild operation runs the nodetool rebuild command on a single node, rebuilding that node's data by streaming it from another (source) datacenter (DC).
Define the source datacenter, then run the operation on each node in the local DC to rebuild its replicas.
Use a rebuild to redistribute data across the remaining nodes after one or more nodes or datacenters are added or terminated.
Performance impact
Rebuilding nodes ensures high availability of data and avoids single points of failure.
Example
A single datacenter (dc1) is deployed on the data plane Kubernetes cluster.
A control plane Kubernetes cluster exists.
Workflow of user and operators
- Perform a backup of the node.
- User defines the rebuild CassandraTask. Specify the affected node as offline.
- User submits a rebuild CassandraTask to the data plane Kubernetes cluster where the datacenter is deployed.
- DC-operator detects the new task custom resource definition (CRD).
- DC-operator iterates one rack at a time.
- DC-operator triggers and monitors rebuild operations one pod at a time.
- DC-operator reports task progress and status.
- User requests a status report of the rebuild CassandraTask with the kubectl command, and views the status response.
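The final workflow step, requesting a status report with kubectl, can be scripted. The following is a minimal sketch and not part of Mission Control: task_condition and wait_for_task are hypothetical helper names, and the namespace that holds the task is passed in as an argument because it depends on where you submit the task. It relies on the Complete condition that the task reports in its status.

```shell
# Print the status ("True", "False", or empty) of one condition on a CassandraTask.
# Usage: task_condition TASK_NAME NAMESPACE CONDITION_TYPE
task_condition() {
  kubectl get cassandratask "$1" -n "$2" \
    -o jsonpath="{.status.conditions[?(@.type==\"$3\")].status}"
}

# Poll until the Complete condition is "True" or the attempts run out.
# Usage: wait_for_task TASK_NAME NAMESPACE [MAX_ATTEMPTS] [SLEEP_SECONDS]
wait_for_task() {
  local name="$1" ns="$2" attempts="${3:-60}" pause="${4:-10}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(task_condition "$name" "$ns" Complete)" = "True" ]; then
      echo "task $name is complete"
      return 0
    fi
    sleep "$pause"
    i=$((i + 1))
  done
  echo "task $name did not complete after $attempts checks" >&2
  return 1
}

# Example (assumes the task was submitted to the demo namespace):
#   wait_for_task rebuild-dc1 demo 120 15
```

The polling loop only reads status, so it is safe to stop and rerun at any time.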
Rebuild a datacenter’s replicas
- Modify the rebuild-dc1.cassandratask.yaml file. Here is a sample:

  apiVersion: control.k8ssandra.io/v1alpha1
  kind: CassandraTask
  metadata:
    name: rebuild-dc1
  spec:
    datacenter:
      name: dc1
      namespace: demo
    jobs:
      - name: rebuild-dc1
        command: rebuild
        args:
          keyspace_name: KEYSPACE_NAME

  Replace KEYSPACE_NAME with the name of the keyspace to rebuild.

  Key options:

  - metadata.name: a unique identifier within the Kubernetes namespace where the task is submitted. While the name can be any value, consider including the cluster name to prevent name collisions.
  - spec.datacenter: a unique namespace and name combination used to determine which datacenter is the source for this operation.
  - spec.jobs[0].command: MUST be rebuild for this operation.
  - Optional: spec.jobs[0].args.keyspace_name: restricts this operation to a particular keyspace. If omitted, all keyspaces are rebuilt.

- Submit the rebuild CassandraTask object to the data plane Kubernetes cluster where the specified datacenter is deployed with this command:

  kubectl apply -f rebuild-dc1.cassandratask.yaml

  If nodetool rebuild is interrupted before completion, restart it by re-entering the command. The process resumes from the point at which it was interrupted.
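The KEYSPACE_NAME substitution and the submit step can be combined in a small script. This is a minimal sketch under stated assumptions: render_rebuild_task is a hypothetical helper name, my_keyspace is a placeholder keyspace, and the manifest values (rebuild-dc1, dc1, demo) are copied from the sample task in this procedure.

```shell
# Write a rebuild CassandraTask manifest for one keyspace to stdout.
# Usage: render_rebuild_task KEYSPACE
render_rebuild_task() {
  local keyspace="$1"
  if [ -z "$keyspace" ]; then
    echo "usage: render_rebuild_task KEYSPACE" >&2
    return 1
  fi
  cat <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: rebuild-dc1
spec:
  datacenter:
    name: dc1
    namespace: demo
  jobs:
    - name: rebuild-dc1
      command: rebuild
      args:
        keyspace_name: ${keyspace}
EOF
}

# Example (my_keyspace is a placeholder):
#   render_rebuild_task my_keyspace > rebuild-dc1.cassandratask.yaml
#   kubectl apply --dry-run=client -f rebuild-dc1.cassandratask.yaml   # check the file first
#   kubectl apply -f rebuild-dc1.cassandratask.yaml
```

Generating the file from one function keeps the keyspace name in a single place, which helps when the same task is submitted for several keyspaces.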
Configure parallel rebuilds
In Mission Control version 1.18.0 and later, you can configure the number of nodes to rebuild in parallel to speed up the rebuild process. By default, Mission Control rebuilds nodes sequentially, one node at a time.
Use parallel rebuilds to:
- Restore data after adding a new datacenter.
- Minimize the time required for data synchronization.
- Reduce overall rebuild duration for large datasets.
Parallel rebuilds increase network traffic and disk I/O across the cluster. Monitor cluster performance and adjust the parallelism level based on your infrastructure capacity.
Configure parallel rebuilds using two parameters:
- jobsCount: Set this parameter at the cluster level (K8ssandraCluster or CassandraDatacenter) to control the number of compaction threads Cassandra uses for the rebuild operation.
- maxConcurrentPods: Set this parameter at the task level (CassandraTask spec) to control how many nodes the operator processes simultaneously within a rack.
Configure cluster-level compaction threads
Configure the jobsCount parameter in your cluster’s K8ssandraCluster manifest:
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    datacenters:
      - metadata:
          name: dc1
        size: 3
      - metadata:
          name: dc2
        size: 3
        jobsCount: 4
If you omit jobsCount, Cassandra uses all available compaction threads for the rebuild operation.
Configure task-level node parallelism
Add the maxConcurrentPods parameter to your CassandraTask spec to control how many nodes the operator processes simultaneously within a rack:
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: rebuild-dc1
spec:
  datacenter:
    name: dc1
    namespace: demo
  maxConcurrentPods: 3
  jobs:
    - name: rebuild-dc1
      command: rebuild
      args:
        keyspace_name: KEYSPACE_NAME
Setting maxConcurrentPods: 3 allows up to three nodes in the same rack to rebuild in parallel.
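To choose a parallelism level, it can help to estimate how many sequential rounds a rebuild needs. The sketch below is a rough model, not the operator's actual scheduling: it assumes racks are processed one after another, as described in the workflow, and that pods within a rack run in fixed batches of at most maxConcurrentPods. The function names rack_batches and total_batches are hypothetical.

```shell
# Rounds needed for one rack: ceil(nodes / max_concurrent).
# Usage: rack_batches NODES_IN_RACK MAX_CONCURRENT_PODS
rack_batches() {
  local nodes="$1" max="$2"
  echo $(( (nodes + max - 1) / max ))
}

# Total rounds for a datacenter whose racks are processed sequentially.
# Usage: total_batches MAX_CONCURRENT_PODS NODES_IN_RACK1 [NODES_IN_RACK2 ...]
total_batches() {
  local max="$1" total=0 n
  shift
  for n in "$@"; do
    total=$(( total + (n + max - 1) / max ))
  done
  echo "$total"
}

# Example: three racks with three nodes each.
#   total_batches 1 3 3 3   # sequential default
#   total_batches 3 3 3 3   # maxConcurrentPods: 3
```

Under this model, raising maxConcurrentPods above the number of nodes in the largest rack gives no further reduction, because each rack still takes at least one round.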
- Monitor the rebuild operation progress:

  kubectl get cassandratask rebuild-dc1 -o yaml | yq .status

  Result:

  ...
  status:
    completionTime: "2022-10-23T23:34:38Z"
    conditions:
      - lastTransitionTime: "2022-10-23T23:34:08Z"
        status: "True"
        type: Running
      - lastTransitionTime: "2022-10-23T23:34:39Z"
        status: "True"
        type: Complete
    startTime: "2022-10-23T23:34:08Z"
    succeeded: 3

  The datacenter-level operators set the startTime field prior to starting the rebuild operation. They update the completionTime field when the rebuild operation is completed.

  The sample output indicates that the task is completed, with the type: Complete status condition set to True. The succeeded: 3 field indicates that three nodes completed the requested task successfully. A failed field tracks a running count of pods that failed the rebuild operation.

- Monitor the node logs on the data plane cluster to verify that the datacenter is rebuilt with this command:

  kubectl logs -f demo-dc1-rack3-sts-1 -c server-system-logger | grep "finished rebuild"

  Result:

  INFO [pool-18-thread-1] 2022-10-23 23:32:18,088 StorageService.java:1895 - finished rebuild for (All keyspaces), (All tokens), 1 streaming connections, NORMAL, included DCs: dc1 after 5 seconds receiving 3.52 MiB.
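Checking the log on every pod by hand gets tedious on larger datacenters. The following sketch summarizes the "finished rebuild" entry for a list of pods. It assumes the log line keeps the format shown in the sample result. The function names rebuild_summary and check_rebuild_logs are hypothetical, and the namespace and pod names are supplied by the caller.

```shell
# Read log lines on stdin and print "<N> seconds, <size>" for each
# "finished rebuild" entry that matches the sample format.
rebuild_summary() {
  sed -n 's/.*finished rebuild.* after \([0-9][0-9]*\) seconds receiving \(.*\)\.$/\1 seconds, \2/p'
}

# Print the most recent rebuild summary for each pod.
# Usage: check_rebuild_logs NAMESPACE POD [POD ...]
check_rebuild_logs() {
  local ns="$1" pod summary
  shift
  for pod in "$@"; do
    summary="$(kubectl logs "$pod" -n "$ns" -c server-system-logger | rebuild_summary | tail -n 1)"
    echo "$pod: ${summary:-no finished rebuild entry found}"
  done
}

# Example (assumes the pod runs in the demo namespace):
#   check_rebuild_logs demo demo-dc1-rack3-sts-1
```

A pod that reports no entry has either not finished rebuilding or has rotated the relevant log lines, so treat that result as a prompt to inspect the pod rather than as a failure.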