DataStax Enterprise Node Cleanup
DataStax Mission Control is currently in Public Preview. DataStax Mission Control is not intended for production use, has not been certified for production workloads, and might contain bugs and other functional issues. There is no guarantee that DataStax Mission Control will ever become generally available. DataStax Mission Control is provided on an “AS IS” basis, without warranty or indemnity of any kind.
If you are interested in trying out DataStax Mission Control, please join the Public Preview.
The cleanup operation runs nodetool cleanup for either all or specific keyspaces on all nodes in the specified datacenter. Create the CassandraTask that defines a cleanup operation in the same Kubernetes cluster where the target CassandraDatacenter is deployed.
DataStax Enterprise does not automatically remove data from nodes that lose part of their partition range to a newly added node. After adding a node, run nodetool cleanup on the source node and on neighboring nodes that shared the same subrange. This prevents the database from including the old data when rebalancing the load on that node.
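This task automates the same command that you can run manually with nodetool on each affected node. A minimal sketch, where my_keyspace is a hypothetical keyspace name:

# Clean up a single keyspace on this node; omit the argument to clean up all keyspaces
nodetool cleanup my_keyspace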
nodetool cleanup temporarily increases disk space use, proportional to the size of the largest SSTable, and triggers disk I/O. Failure to run nodetool cleanup after adding a node may result in data inconsistencies, including the resurrection of previously deleted data.
This operation forces all SSTables on a node to compact, evicting data that is no longer replicated to the node. As with all compactions, this leads to an increase in disk operations and potential latency. Depending on the amount of data present on the node and the query workload, you may want to schedule this cleanup operation during off-peak hours.
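To gauge the impact while a cleanup runs, one option is to watch compaction activity and free disk space from inside a node. A minimal sketch, where demo-dc1-rack1-sts-0 is a hypothetical pod name and the DSE container is assumed to be named cassandra:

# Show active compactions, including those triggered by the cleanup
kubectl exec -n demo demo-dc1-rack1-sts-0 -c cassandra -- nodetool compactionstats
# Check free space on the data volume (the mount path is an assumption)
kubectl exec -n demo demo-dc1-rack1-sts-0 -c cassandra -- df -h /var/lib/cassandra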
You need a kubeconfig file or context pointing to a Control Plane Kubernetes cluster.
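For example, you can list the available contexts and switch to the one for the Control Plane cluster before submitting the task. Here, control-plane is a hypothetical context name:

kubectl config get-contexts
kubectl config use-context control-plane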
In this example, an existing Kubernetes cluster with one datacenter has 9 nodes (pods) distributed across 3 racks. The workflow is:
1. User defines a cleanup CassandraTask.
2. DC-operator detects the new task custom resource definition (CRD).
3. DC-operator iterates one rack at a time.
4. DC-operator triggers and monitors cleanup operations one pod at a time.
5. DC-operator reports task progress and status.
6. User requests a status report of the cleanup task with a kubectl command, as shown below, and views the status response.
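One way to follow the operators' progress through steps 5 and 6 is to watch the task resource until its status shows completion:

kubectl get cassandratask cleanup-dc1 --watch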
Here is a sample CassandraTask definition:
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: cleanup-dc1
spec:
  datacenter:
    name: dc1
    namespace: demo
  jobs:
    - name: cleanup-dc1
      command: cleanup
      args:
        keyspace_name: my_keyspace
metadata.name: a unique identifier within the Kubernetes namespace where the task is submitted. While the name can be any value, consider including the cluster name to prevent collisions with other operations.
spec.datacenter: a unique namespace and name combination used to determine which datacenter to target with this operation.
spec.jobs.command: MUST be cleanup for this operation.
spec.jobs.args.keyspace_name: an optional parameter that restricts this operation to a particular keyspace. Omitting this value results in ALL keyspaces being cleaned up (see the sketch after this list).
Although the jobs parameter is an array, only one entry is permitted at this time. Specifying more than one job results in the task automatically failing.
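For comparison, here is a sketch of the same task with keyspace_name omitted, so that ALL keyspaces in dc1 are cleaned up; the name cleanup-dc1-all is a hypothetical value:

apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: cleanup-dc1-all
spec:
  datacenter:
    name: dc1
    namespace: demo
  jobs:
    - name: cleanup-dc1-all
      command: cleanup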
Submit the task:
kubectl apply -f cleanup-dc1.cassandratask.yaml
This submits the CassandraTask object to the Kubernetes cluster where the specified datacenter is deployed.
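To confirm that the task was accepted, you can list the CassandraTask objects in the namespace where you applied the manifest:

kubectl get cassandratasks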
The DC-level operators perform a rolling cleanup operation, one node at a time. The order is determined lexicographically (dictionary order), starting with rack names and then continuing with node (pod) names.
If a node is in the process of being terminated and recreated for any reason when the cleanup operation begins, the operation fails. In that event, the DC-level operators retry the cleanup operation.
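To preview that order, one option is to list the datacenter's pods sorted by name, since pod names typically embed the rack name; the label selector shown here is an assumption based on the labels that cass-operator applies:

kubectl get pods -n demo -l cassandra.datastax.com/datacenter=dc1 --sort-by=.metadata.name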
Check the progress and status of the cleanup task with this command:
kubectl get cassandratask cleanup-dc1 -o yaml | yq .status
...
status:
  completionTime: "2022-10-13T21:06:55Z"
  conditions:
  - lastTransitionTime: "2022-10-13T21:05:23Z"
    status: "True"
    type: Running
  - lastTransitionTime: "2022-10-13T21:06:55Z"
    status: "True"
    type: Complete
  startTime: "2022-10-13T21:05:23Z"
  succeeded: 9
The DC-level operators set the startTime field prior to starting the cleanup operation. They update the completionTime field when the cleanup operation is completed.
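If you only need the timing fields, one option is to query them directly with jsonpath:

kubectl get cassandratask cleanup-dc1 -o jsonpath='{.status.startTime}{"\n"}{.status.completionTime}{"\n"}'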
The sample output indicates that the task is completed, with the type: Complete status condition set to "True". The succeeded: 9 field indicates that nine (9) nodes (or pods) completed the requested task successfully. A failed field tracks a running count of pods that failed the cleanup operation.
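Because completion is surfaced through the Complete condition, you can also block a script until the task finishes; the 30-minute timeout is a hypothetical value:

kubectl wait --for=condition=Complete cassandratask/cleanup-dc1 --timeout=30m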