Interact with local operators during a control plane outage
When the Mission Control control plane is unavailable, you can still interact with the data plane’s local operators to manage your database clusters.
The cass-operator manages individual database datacenters in the data plane.
When the control plane is down, you can interact with this operator directly to troubleshoot and recover at the datacenter level.
Prerequisites
-
A backup of your current cluster configuration
-
Access to the data plane of the Mission Control cluster
-
Permissions to manage Kubernetes resources
-
kubectl version 1.20 or later installed on your local machine
Verify control plane availability
Before proceeding, confirm that the control plane is unavailable:
kubectl cluster-info
Sample result
Unable to connect to the server: dial tcp SERVER_IP:8443: i/o timeout
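A plain kubectl cluster-info call can wait a long time before it reports a timeout. As a minimal sketch, the helper below runs the same check with kubectl's global --request-timeout flag and prints a one-word result. The function name check_control_plane and the 10-second timeout are illustrative choices, and CONTROL_PLANE_CONTEXT stands for whichever kubectl context points at your control plane.

```shell
# Hypothetical helper: report whether the API server behind a kubectl
# context answers within 10 seconds.
# Usage: check_control_plane CONTROL_PLANE_CONTEXT
check_control_plane() {
  if kubectl --context "$1" cluster-info --request-timeout=10s >/dev/null 2>&1; then
    echo "reachable"
  else
    echo "unreachable"
  fi
}
```

The helper also prints unreachable when the context name does not exist or kubectl is not installed, so treat the result as a first hint rather than a diagnosis.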
Manage custom resources in the data plane
Each data plane runs its own Kubernetes API server, independent of the Mission Control control plane. You can interact with this API directly using standard Kubernetes tools:
-
kubectl configured with the data plane’s kubeconfig file
-
Kubernetes API via REST calls
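As a rough sketch of the REST option, kubectl get --raw sends a GET request to an API path using the credentials in the active kubeconfig. The example assumes that CassandraDatacenter resources are served under the cassandra.datastax.com/v1beta1 group and version; confirm this with kubectl api-resources on your cluster. The helper name cassdc_api_path is hypothetical.

```shell
# Hypothetical helper: build the REST path for one CassandraDatacenter.
# Assumes the cassandra.datastax.com/v1beta1 group and version.
# Usage: cassdc_api_path NAMESPACE DATACENTER_NAME
cassdc_api_path() {
  printf '/apis/cassandra.datastax.com/v1beta1/namespaces/%s/cassandradatacenters/%s\n' "$1" "$2"
}

# Example call (requires access to the data plane API server):
#   kubectl get --raw "$(cassdc_api_path NAMESPACE DATACENTER_NAME)"
```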
The Mission Control UI usually creates Custom Resources (CRs), and these resources reside in the data plane. When the control plane is unavailable, you must manage these resources directly. For more information on CRs, see the Mission Control Custom Resource Definition (CRD) reference.
Change the context to the data plane
If the control plane is unavailable, change the context to the data plane using the kubeconfig file.
List all available contexts:
kubectl config get-contexts
Change the context to the data plane:
kubectl config use-context DATA_PLANE_CLUSTER_NAME
Replace DATA_PLANE_CLUSTER_NAME with the name of the data plane cluster.
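If you prefer not to change the active context, kubectl's global --context flag targets a context for a single command. The wrapper below is a small sketch of that approach; the function name kdp and the DATA_PLANE_CONTEXT variable are assumptions made for this example.

```shell
# Hypothetical wrapper: run one kubectl command against the data plane
# context without changing the current context.
# Set DATA_PLANE_CONTEXT to the data plane context name first.
kdp() {
  if [ -z "${DATA_PLANE_CONTEXT:-}" ]; then
    echo "DATA_PLANE_CONTEXT is not set" >&2
    return 64
  fi
  kubectl --context "$DATA_PLANE_CONTEXT" "$@"
}

# Example:
#   DATA_PLANE_CONTEXT=DATA_PLANE_CLUSTER_NAME
#   kdp get pods -A
```

Keeping the context explicit on each command lowers the risk of running a change against the wrong cluster once the control plane comes back.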
View details of a CR
Retrieve the full definition of a specific CR in YAML format:
kubectl get CUSTOM_RESOURCE_KIND RESOURCE_NAME -n NAMESPACE -o yaml
Replace the following:
-
CUSTOM_RESOURCE_KIND: The kind of custom resource
-
RESOURCE_NAME: The name of the resource
-
NAMESPACE: The namespace where the resource is deployed
Modify a CR
You can edit a CR to make changes:
kubectl edit CUSTOM_RESOURCE_KIND RESOURCE_NAME -n NAMESPACE
Replace the following:
-
CUSTOM_RESOURCE_KIND: The kind of custom resource
-
RESOURCE_NAME: The name of the resource
-
NAMESPACE: The namespace where the resource is deployed
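Because kubectl edit changes the live object, it can help to save a copy first. The following sketch writes the current definition to a timestamped file before you edit it. The function names backup_filename and backup_cr and the file naming scheme are illustrative only.

```shell
# Hypothetical helper: build a timestamped backup file name in the form
# KIND-NAME-UTCTIMESTAMP.yaml.
backup_filename() {
  printf '%s-%s-%s.yaml\n' "$1" "$2" "$(date -u +%Y%m%dT%H%M%SZ)"
}

# Hypothetical helper: save a CR to a backup file before editing it.
# Prints the backup file name on success.
# Usage: backup_cr CUSTOM_RESOURCE_KIND RESOURCE_NAME NAMESPACE
backup_cr() {
  file=$(backup_filename "$1" "$2")
  if kubectl get "$1" "$2" -n "$3" -o yaml > "$file"; then
    echo "$file"
  else
    # Do not leave an empty backup behind when the lookup fails.
    rm -f "$file"
    return 1
  fi
}
```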
Apply changes to a CR
To apply changes to a CR, use the kubectl apply command:
kubectl apply -f CUSTOM_RESOURCE.yaml
Replace CUSTOM_RESOURCE.yaml with the YAML file containing the changes.
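Before applying, you can preview the effect of a manifest. The sketch below uses kubectl diff and a server-side dry run, then applies the file only if both steps pass. The function name preview_and_apply is hypothetical.

```shell
# Hypothetical helper: preview a manifest, then apply it.
# Usage: preview_and_apply CUSTOM_RESOURCE.yaml
preview_and_apply() {
  manifest=$1
  if [ ! -f "$manifest" ]; then
    echo "manifest not found: $manifest" >&2
    return 2
  fi
  # kubectl diff exits 0 when nothing changes, 1 when there are
  # differences, and a higher code on errors.
  diff_rc=0
  kubectl diff -f "$manifest" || diff_rc=$?
  if [ "$diff_rc" -gt 1 ]; then
    return "$diff_rc"
  fi
  # Ask the API server to validate the change without persisting it.
  kubectl apply --dry-run=server -f "$manifest" || return $?
  kubectl apply -f "$manifest"
}
```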
Delete a CR
To delete a CR, use the kubectl delete command:
kubectl delete CUSTOM_RESOURCE_KIND RESOURCE_NAME -n NAMESPACE
Replace the following:
-
CUSTOM_RESOURCE_KIND: The kind of custom resource
-
RESOURCE_NAME: The name of the resource
-
NAMESPACE: The namespace where the resource is deployed
Manage database datacenter resources
When the control plane is down, you can only work with the cass-operator in the data plane, which manages individual datacenters.
You can’t access K8ssandraCluster-level functionality during control plane outages.
During a control plane outage, the system ignores any changes to K8ssandraCluster objects.
Additionally, if your deployment uses a single Reaper installation managed by the control plane, you can’t access Reaper functionality during an outage.
Available cass-operator resources
The cass-operator manages the following resources at the datacenter level:
-
CassandraDatacenter: Defines individual datacenters and their configurations, including size, rack definitions, and storage
-
CassandraTask: Defines maintenance tasks for database datacenters
Update CassandraDatacenter resources
List all CassandraDatacenter resources:
kubectl get cassandradatacenter -A
View details of a CassandraDatacenter resource:
kubectl describe cassandradatacenter DATACENTER_NAME -n NAMESPACE
Replace the following:
-
DATACENTER_NAME: The name of the CassandraDatacenter
-
NAMESPACE: The namespace where the CassandraDatacenter is deployed
When the control plane becomes available again, your direct changes to the CassandraDatacenter resource might need to be reconciled with the control plane’s configuration.
Document any manual changes that you make during the outage so that they can be properly incorporated when the control plane is restored.
Modify a CassandraDatacenter resource:
kubectl edit cassandradatacenter DATACENTER_NAME -n NAMESPACE
Replace the following:
-
DATACENTER_NAME: The name of the CassandraDatacenter
-
NAMESPACE: The namespace where the CassandraDatacenter is deployed
Trigger a rolling restart
To trigger a rolling restart, you must create a CassandraTask resource with the restart command.
You can restart the entire datacenter or add an argument to restart a specific rack.
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: restart-task
spec:
  datacenter:
    name: DATACENTER_NAME
    namespace: cass-operator
  jobs:
    - name: JOB_NAME
      command: restart
Replace the following:
-
DATACENTER_NAME: The name of the CassandraDatacenter to restart
-
JOB_NAME: The name of the job
To restart only a specific rack, add the rack argument:
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: restart-task
spec:
  datacenter:
    name: DATACENTER_NAME
    namespace: cass-operator
  jobs:
    - name: JOB_NAME
      command: restart
      args:
        rack: RACK_NAME
Replace the following:
-
DATACENTER_NAME: The name of the CassandraDatacenter to restart
-
JOB_NAME: The name of the job
-
RACK_NAME: The name of the rack to restart
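To avoid maintaining two nearly identical files, the restart manifest can be generated on the fly. This is a sketch only: the function name restart_task_manifest is hypothetical, the cass-operator namespace is copied from the examples above, and rack is assumed to be a plain key under args, in the same way that pod_name is used for the replacenode command.

```shell
# Hypothetical helper: print a CassandraTask manifest that restarts a
# datacenter, or a single rack when a third argument is given.
# Usage: restart_task_manifest DATACENTER_NAME JOB_NAME [RACK_NAME]
restart_task_manifest() {
  cat <<EOF
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: restart-task
spec:
  datacenter:
    name: $1
    namespace: cass-operator
  jobs:
    - name: $2
      command: restart
EOF
  # Append the rack argument only when a rack name was supplied.
  if [ -n "${3:-}" ]; then
    printf '      args:\n        rack: %s\n' "$3"
  fi
}

# Example (requires access to the data plane):
#   restart_task_manifest DATACENTER_NAME JOB_NAME RACK_NAME | kubectl apply -f -
```

Piping the output to kubectl apply -f - avoids leaving a manifest file on disk, although saving the file gives you a record of what was applied during the outage.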
Apply the restart task:
kubectl apply -f RESTART_TASK_FILENAME.yaml
Replace RESTART_TASK_FILENAME.yaml with the name of the restart task file.
For more information on CassandraDatacenter resources, see the CassandraDatacenter CRD reference in the K8ssandra documentation.
Create a CassandraTask
CassandraTask resources define maintenance tasks for database clusters, such as rebuilds and restarts.
You can create these tasks directly.
Supported tasks include:
-
rebuild: Rebuild a node
-
cleanup: Clean up a node
-
restart: Restart a node
-
replacenode: Replace a node
-
upgradesstables: Upgrade SSTables
-
scrub: Scrub a node
-
compaction: Compact a node
-
move: Move a node
-
flush: Flush a node
-
garbagecollect: Garbage collect a node
-
refresh: Refresh a node
For example, to create a task to replace a node, you can use the following YAML:
apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: REPLACE_TASK_NAME
spec:
  datacenter:
    name: DATACENTER_NAME
    namespace: cass-operator
  jobs:
    - name: JOB_NAME
      command: replacenode
      args:
        pod_name: POD_NAME
Replace the following:
-
REPLACE_TASK_NAME: The name of the CassandraTask resource
-
DATACENTER_NAME: The name of the CassandraDatacenter
-
JOB_NAME: The name of the job
-
POD_NAME: The name of the pod to replace
Apply the replace task:
kubectl apply -f REPLACE_TASK_FILENAME.yaml
Replace REPLACE_TASK_FILENAME.yaml with the name of the replace task file.
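After applying a task, you can inspect what the operator has recorded for it. The sketch below prints the task's status block as a whole rather than assuming specific status field names. The function name task_status is hypothetical.

```shell
# Hypothetical helper: print the status recorded for a CassandraTask.
# Usage: task_status TASK_NAME NAMESPACE
task_status() {
  if [ "$#" -ne 2 ]; then
    echo "usage: task_status TASK_NAME NAMESPACE" >&2
    return 64
  fi
  kubectl get cassandratask "$1" -n "$2" -o jsonpath='{.status}'
}
```

An empty result usually means the operator has not picked up the task yet, so rerun the command after a short wait.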
For more information on CassandraTask resources, see the CassandraTask CRD reference in the K8ssandra documentation.
Best practices
Follow these best practices to ensure a smooth recovery process:
Before making changes
-
Document all manual changes
-
Create backups of critical resources
-
Test changes in a non-production environment if possible
During the outage
-
Make only necessary changes
-
Keep detailed logs of all modifications
-
Coordinate changes with team members
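One lightweight way to keep such a log is to append a timestamped line for every manual change. This is only a sketch; the function name log_change and the CHANGE_LOG variable are assumptions made for the example.

```shell
# Hypothetical helper: append a timestamped entry to a change log.
# CHANGE_LOG defaults to outage-changes.log in the current directory.
# Usage: log_change "description of the change"
log_change() {
  printf '%s\t%s\t%s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "${USER:-unknown}" "$*" \
    >> "${CHANGE_LOG:-outage-changes.log}"
}

# Example:
#   log_change "kubectl edit cassandradatacenter DATACENTER_NAME -n NAMESPACE"
```

Each entry records the UTC time, the local user name, and the description, which gives you a simple list to review after the control plane recovers.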
After control plane recovery
-
Verify all changes are properly synchronized
-
Update documentation
-
Review logs for any inconsistencies
Recovery procedures
After the control plane is restored, complete the following steps:
-
Verify control plane connectivity
-
Review the changes made during the outage
-
Synchronize configurations if needed
-
Test cluster functionality
-
Update documentation with any permanent changes