DataStax Enterprise Node Cleanup

DataStax Mission Control is currently in Private Preview. It is subject to the beta agreement executed between you and DataStax. DataStax Mission Control is not intended for production use, has not been certified for production workloads, and might contain bugs and other functional issues. There is no guarantee that DataStax Mission Control will ever become generally available. DataStax Mission Control is provided on an “AS IS” basis, without warranty or indemnity of any kind.

If you are interested in trying out DataStax Mission Control, contact your DataStax account team.

The cleanup operation runs nodetool cleanup for either all or specific keyspaces on all nodes in the specified datacenter. Create the CassandraTask that defines a cleanup operation in the same Kubernetes cluster where the target CassandraDatacenter is deployed.
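
If you need to confirm the target datacenter's name and namespace before defining the task, one option is to list the CassandraDatacenter resources in the Data Plane cluster. This is a minimal sketch; it assumes the cass-operator custom resource definitions are installed there:

    # List CassandraDatacenter resources in every namespace to find the
    # name/namespace pair to reference in spec.datacenter of the task.
    kubectl get cassandradatacenters --all-namespaces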

DataStax Enterprise does not automatically remove data from nodes that lose part of their partition range to a newly added node. After adding a node, run nodetool cleanup on the source node and on neighboring nodes that shared the same subrange; otherwise the database continues to include the old data when rebalancing the load on that node. nodetool cleanup temporarily increases disk space usage proportional to the size of the largest SSTable and triggers disk I/O.
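
For reference, the cleanup task runs the equivalent of the following on each node. This is a sketch of the manual form; my_keyspace is a placeholder keyspace name, and omitting it cleans up all keyspaces:

    # Remove data that no longer belongs to this node for one keyspace.
    # Run on the source node and on neighboring nodes that shared the subrange.
    nodetool cleanup my_keyspace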

Failure to run nodetool cleanup after adding a node may result in data inconsistencies, including resurrection of previously deleted data.

Performance Impact

This operation forces all SSTables on a node to compact, evicting data that is no longer replicated to that node. As with all compactions, this increases disk operations and can add latency. Depending on the amount of data on the node and the query workload, consider scheduling the cleanup operation during off-peak hours.

Prerequisites

  • The kubectl CLI tool.

  • Kubeconfig file or context pointing to a Control Plane Kubernetes cluster.
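
For example, to verify that kubectl is pointing at the intended cluster before submitting a task, you can check or switch the active context. The context name below is a placeholder:

    # Show the context kubectl is currently using.
    kubectl config current-context

    # Switch to the context for the target cluster (placeholder name).
    kubectl config use-context my-mission-control-context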

Example

The example in this topic uses an existing Kubernetes cluster with one datacenter of nine DSE nodes (pods) distributed across three racks.

Workflow of user and operators

  1. User defines a cleanup CassandraTask.

  2. User submits a cleanup CassandraTask with kubectl to the Data Plane Kubernetes cluster where the datacenter is deployed.

  3. DC-operator detects the new CassandraTask custom resource.

  4. DC-operator iterates one rack at a time.

  5. DC-operator triggers and monitors cleanup operations one pod at a time.

  6. DC-operator reports task progress and status.

  7. User requests a status report of the cleanup CassandraTask with the kubectl command, and views the status response.
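
For example, once the task has been submitted (as shown in the procedure below), you can follow the operators' progress continuously rather than polling:

    # Stream updates to the task as the DC-operator reports progress;
    # cleanup-dc1 is the task name used in the procedure that follows.
    kubectl get cassandratask cleanup-dc1 --watch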

Procedure

  1. Modify the cleanup-dc1.cassandratask.yaml file.

    Here is a sample:

    apiVersion: control.k8ssandra.io/v1alpha1
    kind: CassandraTask
    metadata:
      name: cleanup-dc1
    spec:
      datacenter:
        name: dc1
        namespace: demo
      jobs:
        - name: cleanup-dc1
          command: cleanup
          args:
            keyspace_name: my_keyspace

    Key options:

    • metadata.name: a unique identifier within the Kubernetes namespace where the task is submitted. While the name can be any value, consider including the cluster name to prevent name collisions with other tasks.

    • spec.datacenter: a unique namespace and name combination used to determine which datacenter to target with this operation.

    • spec.jobs[0].command: MUST be cleanup for this operation.

    • Optional: spec.jobs[0].args.keyspace_name: restricts this operation to a particular keyspace. If this value is omitted, all keyspaces are cleaned up (see the sketch after this list).

      Although the jobs parameter is an array, only one entry is permitted at this time. Specifying more than one job causes the task to fail automatically.
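
    As a sketch of the all-keyspaces variant, a task that omits args.keyspace_name looks like the following; the metadata.name value is illustrative:

    apiVersion: control.k8ssandra.io/v1alpha1
    kind: CassandraTask
    metadata:
      name: cleanup-dc1-all-keyspaces
    spec:
      datacenter:
        name: dc1
        namespace: demo
      jobs:
        - name: cleanup-dc1-all-keyspaces
          command: cleanup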

  2. Submit the cleanup CassandraTask custom resource with this kubectl command:

    kubectl apply -f cleanup-dc1.cassandratask.yaml

    This submits the cleanup CassandraTask object to the Kubernetes cluster where the specified datacenter is deployed.

    The DC-level operators perform a rolling cleanup operation, one node at a time. The order is lexicographic (dictionary order), first by rack name and then by node (pod) name.

    If a node is in the process of being terminated and recreated, for any reason, when the cleanup operation begins, the operation fails. In that case, the DC-level operators retry the cleanup operation.
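
    If the task appears stuck or you want a human-readable view of its conditions while it runs, one option is kubectl describe:

    # Print the task's spec and current status conditions in a readable form.
    kubectl describe cassandratask cleanup-dc1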

  3. Monitor the cleanup operation progress with this kubectl command:

    kubectl get cassandratask cleanup-dc1 -o yaml | yq .status

    Sample output:

    ...
    status:
      completionTime: "2022-10-13T21:06:55Z"
      conditions:
      - lastTransitionTime: "2022-10-13T21:05:23Z"
        status: "True"
        type: Running
      - lastTransitionTime: "2022-10-13T21:06:55Z"
        status: "True"
        type: Complete
      startTime: "2022-10-13T21:05:23Z"
      succeeded: 9

    The DC-level operators set the startTime field prior to starting the cleanup operation. They update the completionTime field when the cleanup operation is completed.

    The sample output indicates that the task completed: the type: Complete status condition is set to True. The succeeded: 9 field indicates that nine nodes (pods) completed the requested task successfully. A failed field tracks a running count of pods that failed the cleanup operation.
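
    If you are scripting around this task, one possible way to block until the operators mark it finished is to wait on the Complete condition shown above. This assumes kubectl wait can evaluate the task's status conditions; the timeout is an arbitrary example:

    # Block until the task reports Complete, or give up after 30 minutes.
    kubectl wait --for=condition=Complete cassandratask/cleanup-dc1 --timeout=30m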
