Configure pod resources for Mission Control components
This guide explains how to configure resource requests and limits for Mission Control platform components to ensure optimal performance and stability in your deployment. It provides baseline recommendations and scaling considerations for each component.
Resource requirements for Mission Control components vary based on several factors, including the number of managed database clusters, cluster sizes, concurrent operations, and workload characteristics. Start with the baseline recommendations in this guide, monitor actual resource usage through the Mission Control dashboards, and adjust based on observed patterns. Plan for peak load scenarios and consider growth over time when configuring your resources.
Note: The recommendations in this guide assume production workloads. You can reduce these values for development or testing environments.
Control plane operators
The control plane operators manage the lifecycle of database clusters and their components. Each operator has specific resource requirements based on the number and complexity of managed resources.
mission-control-operator
The mission-control-operator manages custom resources and cluster operations.
Configure the operator with baseline resources for normal operations and allow burst capacity during cluster operations like upgrades and scaling.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 100m | Baseline for normal operations |
| CPU limit | 1000m | Allows burst during cluster operations |
| Memory request | 256Mi | Baseline for normal operations |
| Memory limit | 512Mi | Prevents excessive memory usage |
| Replicas | 1 | Single instance sufficient for most deployments |
Increase resources when managing more than 10 database clusters, performing frequent cluster operations like upgrades or scaling, or experiencing reconciliation delays. For deployments with 20 or more clusters, increase the CPU limit to 2000m and memory limit to 1Gi.
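As a sketch, the 20-cluster guidance above could be expressed as a Helm values override. This assumes the `missionControlOperator` values key used elsewhere in this guide; verify the key against your chart version.

```yaml
# Scaled-up mission-control-operator resources for 20 or more managed clusters.
# The missionControlOperator key is assumed from this guide's Helm example.
missionControlOperator:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 2000m   # raised from the 1000m baseline
      memory: 1Gi  # raised from the 512Mi baseline
```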
cass-operator
The cass-operator manages the lifecycle of Apache Cassandra®, DataStax Enterprise (DSE), and Hyper-Converged Database (HCD) logical datacenters.
This operator requires more resources than other operators due to its direct management of database node lifecycles.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 200m | Baseline for normal operations |
| CPU limit | 1000m | Allows burst during reconciliation |
| Memory request | 64Mi | Baseline for normal operations |
| Memory limit | 256Mi | Provides overhead beyond the 128Mi kustomize default |
| Replicas | 1 | Single instance sufficient for most deployments |
The operator detects new custom resources, generates StatefulSets, creates pods and storage resources, and performs bootstrap operations to reconcile database resources. The operator typically uses 200-500MB of memory in production environments. Increase resources when managing more than 5 database clusters, managing clusters with more than 10 nodes each, or performing frequent node operations like replacements or scaling. For deployments with 10 or more clusters, increase the CPU limit to 2000m and memory limit to 512Mi.
k8ssandra-operator
The k8ssandra-operator manages K8ssandra resources including cluster definitions, Reaper, and Medusa components.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 100m | Baseline for normal operations |
| CPU limit | 1000m | Allows burst during cluster operations |
| Memory request | 64Mi | Baseline for normal operations |
| Memory limit | 256Mi | Provides overhead beyond the 128Mi kustomize default |
| Replicas | 1 | Single instance sufficient for most deployments |
The operator typically uses 200-500MB of memory in production environments. Increase resources when managing more than 10 database clusters, performing frequent backup and restore operations, or running multiple concurrent repair operations. For deployments with 20 or more clusters, increase the CPU limit to 2000m and memory limit to 512Mi.
kube-state-metrics
The kube-state-metrics component monitors the state of Kubernetes resources and makes them available for monitoring and alerting.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 10m | Minimal baseline |
| CPU limit | 200m | Sufficient for most deployments |
| Memory request | 64Mi | Minimal baseline |
| Memory limit | 128Mi | Sufficient for most deployments |
| Replicas | 1 | Single instance sufficient |
Increase resources when managing very large clusters with more than 1000 pods or experiencing metrics collection delays. For very large clusters, increase the CPU limit to 500m and memory limit to 256Mi.
User interface
The UI component provides a web interface and REST API for Mission Control. Configure the UI with sufficient resources to handle concurrent user requests and API operations.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 100m | Baseline for normal operations |
| CPU limit | 1000m | Allows burst during API operations |
| Memory request | 256Mi | Baseline for normal operations |
| Memory limit | 512Mi | Prevents excessive memory usage |
| Replicas | 2 | High availability recommended |
The UI service handles user authentication, cluster management, and monitoring through a web-based interface while exposing a REST API for programmatic access.
Scale UI resources based on your deployment’s usage patterns. Increase resources when supporting more than 50 concurrent users, experiencing slow API response times, or managing large numbers of clusters. For high user concurrency, increase the CPU limit to 2000m, memory limit to 1Gi, and consider deploying three or more replicas for high availability requirements.
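The high-concurrency guidance above could look like this in your Helm values file. This is a sketch that assumes the `ui` values key used elsewhere in this guide.

```yaml
# Scaled-up UI resources for more than 50 concurrent users.
# The ui key is assumed from this guide's Helm example.
ui:
  replicas: 3      # three or more replicas for high availability
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 2000m   # raised from the 1000m baseline
      memory: 1Gi  # raised from the 512Mi baseline
```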
Authentication services
The dex component provides authentication services for Mission Control, handling user authentication and authorization through various methods including static credentials, LDAP, OAuth2, and OpenID Connect.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 50m | Minimal baseline |
| CPU limit | 500m | Sufficient for most deployments |
| Memory request | 128Mi | Minimal baseline |
| Memory limit | 256Mi | Sufficient for most deployments |
| Replicas | 2 | High availability recommended |
Increase resources when supporting more than 100 concurrent users, using external authentication providers like LDAP or OAuth, or experiencing authentication delays. For high user concurrency, increase the CPU limit to 1000m, memory limit to 512Mi, and consider deploying three or more replicas for very high availability requirements.
Note: The KOTS Admin Console and Mission Control UI use separate authentication systems. The KOTS Admin Console (port 8800) manages the Mission Control installation, configuration, licensing, and upgrades, while the Mission Control UI (port 30880) provides the database management interface for cluster operations. You can change Mission Control UI passwords through KOTS. After you make changes, restart the Dex pod to apply them.
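One way to perform the restart described above is to delete the Dex pod so its Deployment recreates it. This sketch assumes Dex runs in the `mission-control` namespace with the label shown; confirm the label on your installation before running it.

```shell
# Restart Dex after changing Mission Control UI passwords through KOTS.
# The label selector is an assumption; confirm it with:
#   kubectl get pods -n mission-control --show-labels
kubectl delete pod -n mission-control -l app.kubernetes.io/name=dex
```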
Database management components
The following components handle database operations such as backup, restore, and cluster management tasks.
Medusa
Medusa provides backup and restore capabilities for database clusters. Medusa runs as a sidecar container in each database pod, so configure these resources per database node rather than per cluster.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 100m | Baseline for normal operations |
| CPU limit | 1000m | Allows burst during backup operations |
| Memory request | 256Mi | Baseline for normal operations |
| Memory limit | 512Mi | Prevents excessive memory usage |
Increase resources when managing large databases with more than 1TB per node, performing frequent backups, or using slow object storage backends. For large databases, increase the CPU limit to 2000m and memory limit to 1Gi. Consider backup scheduling to avoid resource contention during peak usage periods.
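The large-database guidance above could look like this in the Medusa section of your cluster definition; field placement follows the cluster example later in this guide.

```yaml
# Scaled-up Medusa sidecar resources for nodes holding more than 1TB.
medusa:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 2000m   # raised from the 1000m baseline
      memory: 1Gi  # raised from the 512Mi baseline
```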
For detailed information about Medusa configuration and usage, see Back up data.
Reaper
Reaper provides repair management for database clusters. Deploy one Reaper instance per database cluster to manage repair operations.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 200m | Baseline for normal operations |
| CPU limit | 1000m | Allows burst during repair operations |
| Memory request | 512Mi | Baseline for normal operations |
| Memory limit | 1Gi | Prevents excessive memory usage |
| Replicas | 1 | One instance per database cluster |
Increase resources when managing large clusters with 10 or more nodes, running intensive repair schedules, or managing clusters with large data volumes. For large clusters, increase the CPU limit to 2000m and memory limit to 2Gi. Consider repair scheduling to avoid resource contention during peak usage periods.
For detailed information about Reaper configuration and usage, see Clean up data.
Stargate
Stargate provides API gateway functionality for database access, including REST, GraphQL, and Document APIs. Deploy Stargate when you need HTTP-based access to your database without configuring CQL drivers.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 500m | Baseline for API operations |
| CPU limit | 2000m | Allows burst during high traffic |
| Memory request | 1Gi | Baseline for API operations |
| Memory limit | 2Gi | Prevents excessive memory usage |
| Replicas | 2 | High availability recommended |
Stargate resource requirements depend heavily on API traffic patterns and the number of concurrent connections. Increase resources when supporting more than 100 concurrent API connections, experiencing slow API response times, or handling large result sets. For high-traffic deployments, increase the CPU limit to 4000m, memory limit to 4Gi, and deploy three or more replicas.
Configure Stargate through the MissionControlCluster specification under the stargate section.
For more information about Stargate configuration, see the CRD reference documentation.
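The high-traffic guidance above could look like this in the `stargate` section of the MissionControlCluster specification; field placement follows the cluster example later in this guide.

```yaml
# High-traffic Stargate settings for more than 100 concurrent API connections.
stargate:
  size: 3          # three or more replicas for high-traffic deployments
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      cpu: 4000m   # raised from the 2000m baseline
      memory: 4Gi  # raised from the 2Gi baseline
```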
Sidecar containers
Sidecar containers run alongside database pods to provide additional functionality such as observability and monitoring.
Vector sidecar
Vector runs as a sidecar container in database pods to collect and forward logs and metrics. Configure Vector sidecar resources separately from the observability stack’s Vector aggregator.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 50m | Minimal baseline for log collection |
| CPU limit | 500m | Allows burst during high log volume |
| Memory request | 128Mi | Minimal baseline for log buffering |
| Memory limit | 256Mi | Prevents excessive memory usage |
Increase resources when database pods generate high log volumes, experiencing log collection delays, or running complex log transformation pipelines. For high-volume logging, increase the CPU limit to 1000m and memory limit to 512Mi.
Configure Vector sidecar resources through the telemetry.vector.resources section in your MissionControlCluster specification.
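The high-volume logging guidance above could look like this in the `telemetry.vector` section; field placement follows the cluster example later in this guide.

```yaml
# Scaled-up Vector sidecar resources for high log volumes.
telemetry:
  vector:
    enabled: true
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        cpu: 1000m    # raised from the 500m baseline
        memory: 512Mi # raised from the 256Mi baseline
```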
cqlsh pod
The cqlsh pod provides an ephemeral CQL shell for interactive database access through the Mission Control UI.
These pods are created on-demand and terminated after use.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 100m | Minimal baseline for shell operations |
| CPU limit | 500m | Sufficient for most queries |
| Memory request | 256Mi | Minimal baseline for query results |
| Memory limit | 512Mi | Prevents excessive memory usage |
Increase resources when running complex queries that return large result sets or experiencing slow query execution. For complex analytical queries, increase the CPU limit to 1000m and memory limit to 1Gi.
The cqlsh pod configuration is managed automatically by Mission Control and does not require manual resource configuration in most cases.
cert-manager
Cert-manager is an external dependency that Mission Control uses to manage TLS certificates for secure communication between components. While cert-manager is not part of Mission Control, you must install and configure it before deploying Mission Control.
| Resource | Value | Notes |
|---|---|---|
| CPU request | 10m | Minimal baseline |
| CPU limit | 100m | Sufficient for most deployments |
| Memory request | 32Mi | Minimal baseline |
| Memory limit | 128Mi | Sufficient for most deployments |
| Replicas | 1 | Single instance sufficient |
Cert-manager resource requirements are typically minimal as it primarily watches for certificate resources and renews them before expiration. Increase resources when managing more than 100 certificates or experiencing certificate renewal delays.
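Because cert-manager is installed separately, its resources are set in its own Helm values rather than in Mission Control's. The following sketch assumes the upstream cert-manager chart's top-level `resources` key, which applies to the controller; verify the key against your chart version.

```yaml
# Controller resources for the upstream cert-manager Helm chart.
# The top-level resources key is an assumption; check your chart's values schema.
resources:
  requests:
    cpu: 10m
    memory: 32Mi
  limits:
    cpu: 100m
    memory: 128Mi
```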
For information about cert-manager configuration and proper secret cleanup, see Configure cert-manager for proper secret cleanup.
Configure resources with Helm
Configure resource limits in your Helm values file to apply the recommendations in this guide. The following example shows how to configure resources for control plane operators, user interface components, authentication services, and metrics collection.
# Control plane operators
missionControlOperator:
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi
cassOperator:
  resources:
    requests:
      cpu: 200m
      memory: 64Mi
    limits:
      cpu: 1000m
      memory: 256Mi
k8ssandraOperator:
  resources:
    requests:
      cpu: 100m
      memory: 64Mi
    limits:
      cpu: 1000m
      memory: 256Mi
# User interface
ui:
  replicas: 2
  resources:
    requests:
      cpu: 100m
      memory: 256Mi
    limits:
      cpu: 1000m
      memory: 512Mi
# Authentication
dex:
  replicas: 2
  resources:
    requests:
      cpu: 50m
      memory: 128Mi
    limits:
      cpu: 500m
      memory: 256Mi
# Metrics
kubeStateMetrics:
  resources:
    requests:
      cpu: 10m
      memory: 64Mi
    limits:
      cpu: 200m
      memory: 128Mi
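After editing the values file, apply it with a Helm upgrade. The release name, chart reference, and values file name below are placeholders for your installation.

```shell
# Apply the resource overrides to an existing installation.
# RELEASE_NAME and CHART_REFERENCE are placeholders for your deployment.
helm upgrade RELEASE_NAME CHART_REFERENCE \
  -n mission-control \
  -f values.yaml
```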
Configure database cluster resources
Configure Medusa, Reaper, Vector sidecar, and optional Stargate resources in your cluster definition to apply the recommendations for database management components. The following example shows how to configure resources for a database cluster.
apiVersion: missioncontrol.datastax.com/v1beta2
kind: MissionControlCluster
metadata:
  name: my-cluster
spec:
  k8ssandra:
    cassandra:
      datacenters:
        - metadata:
            name: dc1
          k8sContext: my-context
          size: 3
      # Vector sidecar configuration (per node)
      telemetry:
        vector:
          enabled: true
          resources:
            requests:
              cpu: 50m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
    # Medusa configuration (per node)
    medusa:
      resources:
        requests:
          cpu: 100m
          memory: 256Mi
        limits:
          cpu: 1000m
          memory: 512Mi
    # Optional Stargate configuration
    stargate:
      size: 2
      resources:
        requests:
          cpu: 500m
          memory: 1Gi
        limits:
          cpu: 2000m
          memory: 2Gi
    # Reaper configuration (per cluster)
    reaper:
      resources:
        requests:
          cpu: 200m
          memory: 512Mi
        limits:
          cpu: 1000m
          memory: 1Gi
Monitor and adjust resources
Monitor resource usage to validate and adjust your configuration based on actual workload patterns. Use the following commands to check current resource usage and identify resource constraints.
Check operator resource usage:
kubectl top pod -n mission-control -l app.kubernetes.io/name=mission-control
Check kube-state-metrics resource usage:
kubectl top pod -n mission-control -l app.kubernetes.io/name=kube-state-metrics
Check Medusa resource usage in the database namespace:
kubectl top pod -n DATABASE_NAMESPACE -l app.kubernetes.io/name=medusa
Check Reaper resource usage in the database namespace:
kubectl top pod -n DATABASE_NAMESPACE -l app.kubernetes.io/name=reaper
Check Stargate resource usage in the database namespace:
kubectl top pod -n DATABASE_NAMESPACE -l app.kubernetes.io/name=stargate
Replace DATABASE_NAMESPACE with the namespace of the database you want to inspect.
Check for pods hitting resource limits:
kubectl get events -n mission-control --field-selector reason=OOMKilled
Check a pod's configured CPU and memory limits to compare against observed usage:
kubectl describe pod -n mission-control POD_NAME | grep -A 5 "Limits"
Replace POD_NAME with the name of the pod you want to inspect.
Adjust resources based on observed patterns. Increase CPU limits when you observe high CPU usage, increase memory limits when you observe memory pressure, increase both CPU and memory when you observe slow operations, and significantly increase memory limits when you observe OOMKilled events.
Best practices
Follow these best practices when configuring resources for Mission Control components:
- Start with the recommended baseline values in this guide and adjust based on monitoring data.
- Always configure both requests and limits to ensure predictable scheduling and resource allocation.
- Monitor resource usage continuously through the Mission Control dashboards to identify trends and potential issues.
- Plan for expected growth over the next 6-12 months when sizing resources.
- Validate sizing with realistic workloads before deploying to production.
- Document resource adjustments and their impact for future reference.
- Review and reassess sizing quarterly or after significant changes to your deployment.
Troubleshoot resource issues
Use the following guidance to diagnose and resolve common resource-related problems in your Mission Control deployment.
Pods stuck in pending state
Pods remain in the Pending state when the scheduler cannot find nodes with sufficient resources to place them.
Check node resources to verify available capacity, verify that resource requests are not too high for available nodes, ensure sufficient nodes are available in the cluster, and check for resource quotas in the namespace that might prevent scheduling.
Check node resources:
kubectl describe node NODE_NAME
Replace NODE_NAME with the name of the node you’re investigating.
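To narrow down the scheduling checks described above, you can also list Pending pods and any namespace quotas directly:

```shell
# List pods stuck in Pending; describe one to see the scheduler's reason.
kubectl get pods -n mission-control --field-selector=status.phase=Pending
# Check for namespace resource quotas that might block scheduling.
kubectl get resourcequota -n mission-control
```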
Performance issues
Components become slow or unresponsive when resource limits are too low or when experiencing resource contention. Check current resource usage to identify components approaching their limits, review pod logs for errors or warnings related to resource constraints, check for CPU throttling that might slow down operations, and increase resource limits if usage is consistently high.
Check current resource usage:
kubectl top pod -n mission-control
Review pod logs:
kubectl logs -n mission-control POD_NAME
Replace POD_NAME with the name of the pod you’re investigating.
Check a pod's configured limits and recent events for signs of CPU throttling:
kubectl describe pod -n mission-control POD_NAME
Replace POD_NAME with the name of the pod you’re investigating.
Out of memory errors
Kubernetes terminates pods with OOMKilled status when they exceed their memory limits. Check memory usage patterns to identify components with high memory consumption, review pod events to confirm OOMKilled status, increase memory limits by 50-100%, and monitor for memory leaks if OOMKilled events persist after increasing limits.
Check memory usage:
kubectl top pod -n mission-control --containers
Review pod events:
kubectl get events -n mission-control --field-selector involvedObject.name=POD_NAME
Replace POD_NAME with the name of the pod you’re investigating.
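Once you've confirmed OOMKilled events, the 50-100% limit increase described above can be applied in place with kubectl. The deployment name and new limit below are placeholders; the 1Gi value assumes a 512Mi baseline.

```shell
# Double the memory limit on a component that keeps getting OOMKilled.
# DEPLOYMENT_NAME is a placeholder for the affected component's Deployment.
kubectl set resources deployment/DEPLOYMENT_NAME -n mission-control \
  --limits=memory=1Gi
```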