Control plane and data plane operators
This topic explains how operators interact with the control plane and data plane in Mission Control, including their roles, responsibilities, and operational boundaries.
Mission Control uses a multi-operator architecture, where different operators manage distinct aspects of the platform and database lifecycle. It’s helpful to understand the relationship between these operators and the control/data plane separation for effective platform management, troubleshooting, and architectural planning.
Architecture layers
Mission Control operates across two distinct architectural layers:
- Control plane: This centralized management layer orchestrates database operations across all managed clusters. The control plane includes the Mission Control UI, API, and core operators that manage cluster lifecycle, configuration, and operations.
- Data plane: This distributed execution layer runs database workloads. Each data plane consists of Kubernetes clusters that host database nodes, along with local operators that manage the database resources within those clusters.
Both the control plane and data plane include operators that provide automation capabilities. Operators listen for changes in custom resources and reconcile the desired state with the actual state of the system.
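The reconciliation idea can be illustrated with a minimal sketch. It is not Mission Control code: the `reconcile` function and the dictionary shapes are hypothetical, and a real operator acts on Kubernetes objects rather than plain dictionaries.

```python
def reconcile(desired: dict, actual: dict) -> list[str]:
    """Return the actions needed to move `actual` toward `desired`.

    Both arguments map a resource name to its spec. The shapes are
    hypothetical and chosen only for illustration.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(f"create {name}")
        elif actual[name] != spec:
            actions.append(f"update {name}")
    for name in actual:
        if name not in desired:
            actions.append(f"delete {name}")
    return actions


desired = {"dc1": {"size": 3}, "dc2": {"size": 3}}
actual = {"dc1": {"size": 2}, "dc3": {"size": 1}}
print(reconcile(desired, actual))
# ['update dc1', 'create dc2', 'delete dc3']
```

When the two states already match, the function returns an empty list, which corresponds to an operator that has nothing left to do.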
Operator types and responsibilities
Mission Control uses three primary operators, each with distinct responsibilities:
| Operator | Location | Primary responsibilities | Scope | Key characteristics |
|---|---|---|---|---|
| `mission-control-operator` | Control plane | | Global across all managed clusters | |
| `cass-operator` | Data plane (one per managed Kubernetes cluster) | | Local to a single Kubernetes cluster | |
| `k8ssandra-operator` | Data plane (one per managed Kubernetes cluster) | | Local to a single Kubernetes cluster, with awareness of multi-datacenter topology | |
Control plane and data plane interaction
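The control plane and data plane follow the same request-response pattern for every operation, whether creating clusters, running backups, performing repairs, or scaling nodes. Requests and responses travel through Kubernetes custom resources (CRs) rather than through direct calls between operators.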
For example, when you trigger a backup through the UI, the control plane creates a K8ssandraTask CR in the data plane, data plane operators execute the backup using Medusa, and the operators send status updates back through the CR to the control plane.
Cluster creation follows a similar but more complex flow because it creates the foundational K8ssandraCluster CR that defines the entire database topology.
Request flow
The request flow consists of the following steps:
- Control plane: A user initiates an operation through the Mission Control UI or API.
- Control plane: The `mission-control-operator` validates and plans the operation.
- Control plane: The control plane creates or updates CRs. The operator writes the desired state to data plane custom resources.
- Data plane: Data plane operators, `k8ssandra-operator` and `cass-operator`, listen for CR updates from the control plane operator.
- Data plane: Local operators execute the operation.
- Data plane: Data plane operators update CR status fields, propagating the status back to the control plane.
- Control plane: The `mission-control-operator` tracks operation status until the operation ends (success or failure).
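The request flow can be modeled in a few lines to show that the only shared surface is the custom resource. This is a simplified sketch under stated assumptions: the `CustomResource` class, the in-memory `store`, the function names, and the `phase` values are all hypothetical stand-ins for the Kubernetes API and the real operators.

```python
from dataclasses import dataclass, field


@dataclass
class CustomResource:
    """Hypothetical stand-in for a CR: desired state in spec, results in status."""
    kind: str
    spec: dict
    status: dict = field(default_factory=dict)


def control_plane_submit(store: dict, name: str, kind: str, spec: dict) -> None:
    """Validate the request, then write desired state to a CR."""
    if not spec:
        raise ValueError("spec must not be empty")
    store[name] = CustomResource(kind=kind, spec=spec)


def data_plane_reconcile(store: dict) -> None:
    """Pick up CRs that have no reported phase, execute them, and report status."""
    for cr in store.values():
        if "phase" not in cr.status:
            # A real operator would perform the operation here.
            cr.status["phase"] = "Succeeded"


def control_plane_poll(store: dict, name: str) -> str:
    """Read the status that the data plane wrote back to the CR."""
    return store[name].status.get("phase", "Pending")


store: dict = {}
control_plane_submit(store, "backup-1", "K8ssandraTask", {"action": "backup"})
print(control_plane_poll(store, "backup-1"))  # Pending
data_plane_reconcile(store)
print(control_plane_poll(store, "backup-1"))  # Succeeded
```

Neither side calls the other directly. Each side only reads and writes the resource, which is why the two planes can fail and recover independently.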
Example cluster creation flow
The cluster creation flow consists of the following steps:
- Control plane: A user initiates cluster creation through the UI.
- Control plane: The `mission-control-operator` receives the request.
- Control plane: The `mission-control-operator` creates a `K8ssandraCluster` CR in the data plane.
- Data plane: The `k8ssandra-operator` detects the new `K8ssandraCluster`.
- Data plane: The `k8ssandra-operator` creates `CassandraDatacenter` CRs.
- Data plane: The `cass-operator` detects the new `CassandraDatacenter`.
- Data plane: The `cass-operator` creates `StatefulSets` for database racks.
- Kubernetes: Kubernetes creates pods and storage.
- Data plane: The `cass-operator` bootstraps database nodes.
- Data plane: Status updates flow back through CRs to the control plane.
- Control plane: The `mission-control-operator` updates the cluster status as the operation progresses.
- Control plane: When the operation completes successfully, the UI displays the cluster as ready.
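The fan-out in this flow, from one cluster resource to datacenters to one `StatefulSet` per rack, can be sketched as two small functions. The field names, the `default` rack, the even node split, and the generated names are illustrative assumptions, not the operators' actual behavior.

```python
def expand_cluster(cluster: dict) -> list[dict]:
    """Mimic the k8ssandra-operator step: one CassandraDatacenter per datacenter."""
    return [
        {
            "kind": "CassandraDatacenter",
            "name": dc["name"],
            "size": dc["size"],
            # Assumption: a datacenter without racks gets a single default rack.
            "racks": dc.get("racks", ["default"]),
        }
        for dc in cluster["datacenters"]
    ]


def expand_datacenter(dc: dict) -> list[dict]:
    """Mimic the cass-operator step: one StatefulSet per rack.

    Assumption: nodes are split as evenly as possible, and earlier racks
    receive any remainder.
    """
    racks = dc["racks"]
    base, extra = divmod(dc["size"], len(racks))
    return [
        {
            "kind": "StatefulSet",
            "name": f"{dc['name']}-{rack}",  # illustrative naming only
            "replicas": base + (1 if i < extra else 0),
        }
        for i, rack in enumerate(racks)
    ]


cluster = {
    "name": "demo-cluster",
    "datacenters": [{"name": "dc1", "size": 3, "racks": ["r1", "r2"]}],
}
for dc in expand_cluster(cluster):
    for sts in expand_datacenter(dc):
        print(sts["name"], sts["replicas"])
# dc1-r1 2
# dc1-r2 1
```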
Operational boundaries
Understand the boundaries between operators to clarify responsibilities and troubleshooting paths.
- Control plane boundaries
  - The control plane manages cluster-level configuration and policies, multi-datacenter coordination, user authentication and authorization, observability aggregation including metrics, logs, and alerts, backup and restore scheduling and coordination, and cluster lifecycle orchestration.
  - The control plane does not manage individual pod creation or deletion, direct database node operations, local Kubernetes resource management, node-level health monitoring, or direct database configuration changes.
- Data plane boundaries
  - The data plane manages database pod lifecycle, `StatefulSet` management, node-level operations such as bootstrap, decommission, and replace, local resource allocation, database configuration application, node health monitoring, and local backup and restore execution.
  - The data plane does not manage cross-cluster coordination, global policy enforcement, user authentication, centralized observability, or multi-datacenter orchestration.
Communication patterns
Mission Control operators communicate using a declarative model and custom resources as the API boundary between the control plane and data planes.
In this model, the control plane declares the desired state in custom resources, and data plane operators continuously reconcile the actual state to match it. Status information flows back through custom resource status fields. No direct RPC or API calls occur between control plane and data plane operators.
Custom resources serve as the API boundary between control and data planes:

    apiVersion: k8ssandra.io/v1alpha1
    kind: K8ssandraCluster
    metadata:
      name: demo-cluster
    spec:
      # Control plane sets desired state
      cassandra:
        serverVersion: "4.0.7"
        datacenters:
          - metadata:
              name: dc1
            size: 3
    status:
      # Data plane reports actual state
      datacenters:
        - name: dc1
          readyReplicas: 3
      conditions:
        - type: Ready
          status: "True"

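A consumer of this resource can decide whether the cluster is ready by comparing the two halves. The sketch below mirrors the example as a plain Python dictionary and assumes that `conditions` sits directly under `status`. The `is_ready` helper is hypothetical and not part of any Mission Control API.

```python
def is_ready(cr: dict) -> bool:
    """Return True when the reported state satisfies the desired state.

    Assumes the layout of the example resource: desired sizes under
    spec.cassandra.datacenters, reported replicas under status.datacenters,
    and a Ready condition under status.conditions.
    """
    status = cr.get("status", {})
    has_ready_condition = any(
        c.get("type") == "Ready" and c.get("status") == "True"
        for c in status.get("conditions", [])
    )
    desired = {
        dc["metadata"]["name"]: dc["size"]
        for dc in cr["spec"]["cassandra"]["datacenters"]
    }
    reported = {
        dc["name"]: dc.get("readyReplicas", 0)
        for dc in status.get("datacenters", [])
    }
    return has_ready_condition and all(
        reported.get(name, 0) >= size for name, size in desired.items()
    )


cluster = {
    "spec": {"cassandra": {"datacenters": [{"metadata": {"name": "dc1"}, "size": 3}]}},
    "status": {
        "datacenters": [{"name": "dc1", "readyReplicas": 3}],
        "conditions": [{"type": "Ready", "status": "True"}],
    },
}
print(is_ready(cluster))  # True
```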
Multi-region considerations
In multi-region deployments, each region has its own data plane, but you can only deploy the control plane in a single region. Operators in each region operate independently, and cross-region coordination occurs through the control plane. Network partitions between regions do not affect local operations.
For more information, see Configure a multi-region Mission Control environment.
Failure scenarios and isolation
Understanding how failures affect different layers helps you plan for high availability and disaster recovery.
Control plane failure
During a control plane failure, you cannot initiate new operations, but existing clusters continue to run normally. Data plane operators continue reconciliation and database operations remain unaffected.
To recover, restore the control plane independently. No data loss occurs in database clusters, and operations resume when the control plane becomes available.
Data plane failure
During a data plane failure, only the specific data plane cluster experiences issues while other data planes continue operating normally. The control plane remains operational and you can initiate operations on healthy data planes.
To recover, restore the data plane independently. The control plane maintains desired state and operators reconcile to desired state after recovery.
Operator failure
During an operator failure, the specific operator instance is unavailable, but the cluster continues to operate using the remaining operators.
- `mission-control-operator` failure: You cannot initiate new cluster operations, but existing clusters remain unaffected and data plane operators continue normal operation.
- `cass-operator` failure: You cannot perform node-level operations in the affected cluster through the API. You can interact with nodes using `kubectl exec` and `nodetool`. Existing nodes continue running and other clusters remain unaffected.
- `k8ssandra-operator` failure: You cannot perform datacenter-level operations in the affected cluster, but existing datacenters continue running and other clusters remain unaffected.
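For runbooks or alert annotations, the impact of each operator failure can be captured as a small lookup. This is only a restatement of the list above in code. The level names `cluster`, `datacenter`, and `node` are labels chosen for this sketch.

```python
# Operation level that becomes unavailable when each operator is down.
# The level names are labels chosen for this sketch.
BLOCKED_BY_FAILURE = {
    "mission-control-operator": "cluster",
    "k8ssandra-operator": "datacenter",
    "cass-operator": "node",
}


def blocked_levels(failed_operators: set[str]) -> set[str]:
    """Return the operation levels that cannot be initiated right now."""
    return {BLOCKED_BY_FAILURE[name] for name in failed_operators}


print(sorted(blocked_levels({"cass-operator", "k8ssandra-operator"})))
# ['datacenter', 'node']
```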
Troubleshoot operator issues
Use these troubleshooting techniques to diagnose and resolve issues across operator layers.
Identify the operator layer
When you troubleshoot issues, identify which operator layer is involved, and then investigate the logs and other details related to that operator and failure.
The following table is a starting point for investigating some common issues:
| Symptom | Likely operator | Investigation path |
|---|---|---|
| Cannot create new clusters through UI | | Check control plane operator logs and CR status |
| Cluster creation stuck | | Check the data plane |
| Pods not starting | | Check CassandraDatacenter CR and |
| Node operations failing | | Check node-level logs and operator reconciliation |
| Backup or restore issues | | Check the K8ssandraCluster CR and Medusa logs |
| Multi-DC coordination issues | | Check both control and data plane operator logs |
Check custom resources
Custom resources provide the most direct insight into operator state:
- Check the `MissionControlCluster` in the control plane:

      kubectl get missioncontrolcluster -n NAMESPACE

  Replace `NAMESPACE` with the namespace of your control plane.

      kubectl describe missioncontrolcluster CLUSTER_NAME -n NAMESPACE

  Replace the following:

  - `CLUSTER_NAME`: The name of your cluster
  - `NAMESPACE`: The namespace of your control plane

- Check the `K8ssandraCluster` in the data plane:

      kubectl get k8ssandracluster -n NAMESPACE

  Replace `NAMESPACE` with the namespace of your data plane.

      kubectl describe k8ssandracluster CLUSTER_NAME -n NAMESPACE

  Replace the following:

  - `CLUSTER_NAME`: The name of your cluster
  - `NAMESPACE`: The namespace of your data plane

- Check the `CassandraDatacenter` in the data plane:

      kubectl get cassandradatacenter -n NAMESPACE

  Replace `NAMESPACE` with the namespace of your data plane.

      kubectl describe cassandradatacenter DC_NAME -n NAMESPACE

  Replace the following:

  - `DC_NAME`: The name of your datacenter
  - `NAMESPACE`: The namespace of your data plane

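If you run these checks often, a small helper can assemble the same commands from parameters. The sketch only builds the argument list and does not run `kubectl`. The `kubectl_cmd` helper and the `mission-control` namespace are hypothetical.

```python
import shlex
from typing import Optional


def kubectl_cmd(verb: str, resource: str, namespace: str,
                name: Optional[str] = None) -> list[str]:
    """Build a `kubectl get` or `kubectl describe` argument list."""
    if verb not in {"get", "describe"}:
        raise ValueError(f"unsupported verb: {verb}")
    cmd = ["kubectl", verb, resource]
    if name:
        cmd.append(name)
    cmd += ["-n", namespace]
    return cmd


# Hypothetical namespace and cluster name, for illustration only.
cmd = kubectl_cmd("describe", "k8ssandracluster", "mission-control", "demo-cluster")
print(shlex.join(cmd))
# kubectl describe k8ssandracluster demo-cluster -n mission-control
```

To execute the command, pass the returned list to `subprocess.run`. Passing a list rather than a single string avoids shell quoting issues.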
Check operator logs
Operator logs show reconciliation activity and errors:
- Control plane operator logs:

      kubectl logs -n NAMESPACE deployment/mission-control-operator

  Replace `NAMESPACE` with the namespace of your control plane.

- Data plane operator logs:

      kubectl logs -n NAMESPACE deployment/cass-operator

  Replace `NAMESPACE` with the namespace of your data plane.

      kubectl logs -n NAMESPACE deployment/k8ssandra-operator

  Replace `NAMESPACE` with the namespace of your data plane.

Best practices
Follow these best practices to ensure reliable operation of Mission Control across the control plane and data planes.
- Separation of concerns: Use the control plane for orchestration and policy while letting data plane operators handle execution. Avoid direct manipulation of data plane resources from the control plane, and instead use custom resources as the interface between layers. This separation ensures clear boundaries and maintainable operations.
- Monitoring and observability: Monitor operator health in both control and data planes, and track custom resource status conditions to understand system state. Set up alerts for operator failures to enable quick response to issues, and monitor reconciliation loops for stuck operations that may indicate problems requiring intervention.
- Resource management: Ensure adequate resources for operators in both planes to maintain reliable operation. Scale operator replicas based on cluster count to handle increased load, and monitor operator memory and CPU usage to identify resource constraints. Plan for operator overhead in capacity planning to avoid resource exhaustion.
- Upgrade coordination: Upgrade control plane operators first to ensure compatibility with existing data planes. Verify control plane stability before you upgrade data plane operators to minimize risk of cascading failures. Upgrade data plane operators one cluster at a time to limit blast radius, and test upgrades in non-production environments first to identify potential issues before production deployment.
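The upgrade ordering can be expressed as a short driver that upgrades the control plane first, then each data plane in turn, and stops at the first unhealthy target. The `upgrade` and `healthy` callables are placeholders that you would supply. Nothing here is a Mission Control API.

```python
from typing import Callable


def run_upgrade(control_plane: str, data_planes: list[str],
                upgrade: Callable[[str], None],
                healthy: Callable[[str], bool]) -> list[str]:
    """Upgrade the control plane, then data planes one at a time.

    Stops with an error as soon as a target is unhealthy after its upgrade,
    so later data planes are left untouched.
    """
    completed = []
    for target in [control_plane, *data_planes]:
        upgrade(target)
        if not healthy(target):
            raise RuntimeError(f"{target} is unhealthy after upgrade; stopping")
        completed.append(target)
    return completed


# Placeholder callables that record the order of upgrades.
order: list[str] = []
result = run_upgrade("control-plane", ["dp-east", "dp-west"],
                     upgrade=order.append, healthy=lambda target: True)
print(result)
# ['control-plane', 'dp-east', 'dp-west']
```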