Interact with local operators during a control plane outage
When the Mission Control control plane is unavailable, you can still interact with the data plane’s local operators to manage your database clusters.
The cass-operator manages individual database datacenters in the data plane.
When the control plane is down, you can interact with this operator directly to troubleshoot and recover at the datacenter level.
Prerequisites
- A backup of your current cluster configuration
- Access to the data plane of the Mission Control cluster
- Permissions to manage Kubernetes resources
- kubectl version 1.20 or later installed on your local machine
Verify control plane availability
Before proceeding, confirm that the control plane is unavailable:
kubectl cluster-info
Sample result
Unable to connect to the server: dial tcp SERVER_IP:8443: i/o timeout
Manage a custom resource (CR) in the data plane
Each data plane runs its own Kubernetes API server, independent of the Mission Control control plane. You can interact with this API directly using standard Kubernetes tools:
- kubectl configured with the data plane’s kubeconfig file
- The Kubernetes API via direct REST calls
The Mission Control UI usually creates CRs, and these resources reside in the data plane. When the control plane is unavailable, you must manage these resources directly. For more information on CRs, see the Mission Control Custom Resource Definition (CRD) reference.
Change the context to the data plane
If the control plane is unavailable, change the context to the data plane using the kubeconfig file.
List all available contexts:
kubectl config get-contexts
Change the context to the data plane:
kubectl config use-context DATA_PLANE_CLUSTER_NAME
Replace DATA_PLANE_CLUSTER_NAME with the name of the data plane cluster.
View details of a CR
Describe a specific CR to view its details:
kubectl get CUSTOM_RESOURCE_KIND RESOURCE_NAME -n NAMESPACE -o yaml
Replace the following:
- CUSTOM_RESOURCE_KIND: The kind of CR
- RESOURCE_NAME: The name of the resource
- NAMESPACE: The namespace where the resource is deployed
Modify a CR
You can edit a CR to make changes:
kubectl edit CUSTOM_RESOURCE_KIND RESOURCE_NAME -n NAMESPACE
Replace the following:
- CUSTOM_RESOURCE_KIND: The kind of CR
- RESOURCE_NAME: The name of the resource
- NAMESPACE: The namespace where the resource is deployed
Apply changes to a CR
To apply changes to a CR, use the kubectl apply command:
kubectl apply -f CUSTOM_RESOURCE.yaml
Replace CUSTOM_RESOURCE.yaml with the YAML file containing the changes.
Delete a CR
To delete a CR, use the kubectl delete command:
kubectl delete CUSTOM_RESOURCE_KIND RESOURCE_NAME -n NAMESPACE
Replace the following:
- CUSTOM_RESOURCE_KIND: The kind of CR
- RESOURCE_NAME: The name of the resource
- NAMESPACE: The namespace where the resource is deployed
Manage database datacenter resources
When the control plane is down, you can only work with the cass-operator in the data plane, which manages individual datacenters.
You cannot access K8ssandraCluster level functionality during control plane outages.
During a control plane outage, the system ignores any changes you make to K8ssandraCluster objects. Work at the CassandraDatacenter level instead.
Backup and repair behavior during outages
Understanding how Mission Control handles backups and repairs during a control plane outage helps you plan for high availability and disaster recovery scenarios.
Mission Control runs scheduled backups in the data planes even when the control plane is unavailable. Your data protection strategy remains intact during control plane outages.
However, Mission Control doesn’t run repair operations when the control plane is down. Repair availability depends on your Reaper deployment configuration.
Reaper deployment configuration
By default, Mission Control doesn’t deploy a Reaper instance in each datacenter. Mission Control uses the following default behavior:
- Deploys Reaper in the first datacenter with data nodes
- Deploys Reaper in the first data plane datacenter if the control plane has no data nodes and you have multiple data planes with data nodes
- Doesn’t deploy Reaper in a control plane that doesn’t also function as a data plane
Your Reaper configuration and deployment determine when you can run repairs:
- Single Reaper in control plane: You can’t access Reaper functionality during a control plane outage if your deployment uses a single Reaper installation that the control plane manages.
- Reaper in data plane: You can still run repair operations when the control plane is down if you have a control plane with no data nodes and two data planes with data nodes. Mission Control deploys Reaper in the first data plane datacenter, not in the control plane.
- Datacenter failure: You can’t run repairs if one datacenter is down unless you restrict repairs to datacenters that are currently up.
You can configure Mission Control to deploy Reaper in each datacenter for improved availability during partial outages. This configuration lets repairs continue in available datacenters even when other datacenters or the control plane are down.
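In K8ssandra-based deployments, per-datacenter Reaper deployment is typically controlled through the Reaper deployment mode on the K8ssandraCluster object. A minimal sketch, assuming the k8ssandra-operator `reaper.deploymentMode` field and hypothetical cluster and datacenter names; apply this kind of change while the control plane is healthy, since K8ssandraCluster edits are ignored during an outage:

```yaml
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo-cluster              # hypothetical cluster name
spec:
  reaper:
    deploymentMode: PER_DC        # one Reaper per datacenter instead of a single shared instance
  cassandra:
    datacenters:
      - metadata:
          name: dc1               # hypothetical datacenter names
        size: 3
      - metadata:
          name: dc2
        size: 3
```

With `PER_DC` mode, each datacenter keeps its own Reaper instance, so repairs can continue in any datacenter that remains reachable.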
Available cass-operator resources
The cass-operator manages the following resources at the datacenter level:
- CassandraDatacenter: Defines individual datacenters and their configurations, including size, rack definitions, and storage
- CassandraTask: Defines maintenance tasks for database datacenters
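For orientation, a minimal CassandraDatacenter manifest looks roughly like the following sketch. Field names follow the cass-operator CRD; the datacenter name, cluster name, server version, and storage class are hypothetical:

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1                        # hypothetical datacenter name
spec:
  clusterName: demo-cluster        # logical database cluster this datacenter belongs to
  serverType: cassandra
  serverVersion: "4.1.2"           # hypothetical version
  size: 3                          # total number of nodes across all racks
  racks:
    - name: rack1
    - name: rack2
    - name: rack3
  storageConfig:
    cassandraDataVolumeClaimSpec:
      storageClassName: standard   # hypothetical storage class
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```

The size, rack definitions, and storage settings shown here are the datacenter-level configuration that the sections below read and modify.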
Update CassandraDatacenter resources
List all CassandraDatacenter resources:
kubectl get cassandradatacenter -A
View details of a CassandraDatacenter resource:
kubectl describe cassandradatacenter DATACENTER_NAME -n NAMESPACE
Replace the following:
- DATACENTER_NAME: The name of the CassandraDatacenter
- NAMESPACE: The namespace where the CassandraDatacenter is deployed
When the control plane becomes available again, it might reconcile data plane resources and overwrite your direct changes. Document any manual changes made during the outage so that they are properly incorporated when the control plane is restored.
Modify a CassandraDatacenter resource:
kubectl edit cassandradatacenter DATACENTER_NAME -n NAMESPACE
Replace the following:
- DATACENTER_NAME: The name of the CassandraDatacenter
- NAMESPACE: The namespace where the CassandraDatacenter is deployed
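For example, a common outage-time edit is scaling a datacenter by changing spec.size. A hedged sketch of the relevant fragment as it appears in the editor, with hypothetical sizes:

```yaml
# Fragment of a CassandraDatacenter spec as seen in `kubectl edit`.
# Changing spec.size from 3 to 4 causes cass-operator to provision one more node.
spec:
  size: 4   # previously 3; cass-operator reconciles the datacenter to the new size
```

Record any such edit, since the control plane may reconcile the resource after it recovers.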
Trigger a rolling restart
To trigger a rolling restart, you must create a CassandraTask resource with the restart command.
You can restart the entire datacenter or add an argument to restart a specific rack.
To restart the entire datacenter:

apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: restart-task
spec:
  datacenter:
    name: DATACENTER_NAME
    namespace: cass-operator
  jobs:
    - name: JOB_NAME
      command: restart

Replace the following:
- DATACENTER_NAME: The name of the CassandraDatacenter to restart
- JOB_NAME: The name of the job

To restart a specific rack instead, add a rack argument:
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: restart-task
spec:
  datacenter:
    name: DATACENTER_NAME
    namespace: cass-operator
  jobs:
    - name: JOB_NAME
      command: restart
      args:
        rack: RACK_NAME
Replace the following:
- DATACENTER_NAME: The name of the CassandraDatacenter to restart
- JOB_NAME: The name of the job
- RACK_NAME: The name of the rack to restart
Apply the restart task:
kubectl apply -f RESTART_TASK_FILENAME.yaml
Replace RESTART_TASK_FILENAME.yaml with the name of the restart task file.
For more information on CassandraDatacenter resources, see the CassandraDatacenter CRD reference in the K8ssandra documentation.
Create a CassandraTask
CassandraTask resources define maintenance tasks for database clusters, such as rebuilds and restarts.
You can create these tasks directly.
Supported tasks include:
- rebuild: Rebuild a node
- cleanup: Clean up a node
- restart: Restart a node
- replacenode: Replace a node
- upgradesstables: Upgrade SSTables
- scrub: Scrub a node
- compaction: Compact a node
- move: Move a node
- flush: Flush a node
- garbagecollect: Garbage collect a node
- refresh: Refresh a node
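For instance, creating a cleanup task can be sketched as a small script that writes the manifest and then applies it. The datacenter, namespace, task, and job names are hypothetical, and the kubectl apply step is shown commented out because it requires access to a live data plane cluster:

```shell
#!/bin/sh
# Write a CassandraTask manifest that runs a cleanup on a hypothetical
# datacenter named dc1 in the cass-operator namespace.
cat > cleanup-dc1-task.yaml <<'EOF'
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: cleanup-dc1
spec:
  datacenter:
    name: dc1
    namespace: cass-operator
  jobs:
    - name: cleanup-dc1-job
      command: cleanup
EOF

# Apply the task against the data plane context (requires cluster access):
# kubectl apply -f cleanup-dc1-task.yaml
```

The same pattern applies to any of the commands listed above; only the command value and the job arguments change.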
For example, to create a task to replace a node, you can use the following YAML:
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: REPLACE_TASK_NAME
spec:
  datacenter:
    name: DATACENTER_NAME
    namespace: cass-operator
  jobs:
    - name: JOB_NAME
      command: replacenode
      args:
        pod_name: POD_NAME

Replace the following:
- REPLACE_TASK_NAME: The name of the replace task
- DATACENTER_NAME: The name of the CassandraDatacenter
- JOB_NAME: The name of the job
- POD_NAME: The name of the pod to replace
Apply the replace task:
kubectl apply -f REPLACE_TASK_FILENAME.yaml
Replace REPLACE_TASK_FILENAME.yaml with the name of the replace task file.
For more information on CassandraTask resources, see the CassandraTask CRD reference in the K8ssandra documentation.
Best practices
Follow these best practices to ensure a smooth recovery process:
Before making changes
- Document all manual changes
- Create backups of critical resources
- Test changes in a non-production environment if possible
During the outage
- Make only necessary changes
- Keep detailed logs of all modifications
- Coordinate changes with team members
After control plane recovery
- Verify all changes are properly synchronized
- Update documentation
- Review logs for any inconsistencies
Recovery procedures
After the control plane is restored, complete the following steps:
- Verify control plane connectivity
- Review the changes made during the outage
- Synchronize configurations if needed
- Test cluster functionality
- Update documentation with any permanent changes