Add nodes to a cluster
Adding nodes increases the cluster's capacity to serve queries against its data.
This task focuses on adding nodes to a single existing datacenter. To scale up the number of datacenters, follow the add a datacenter task.
Prerequisites
- A prepared environment on either bare-metal/VM or an existing Kubernetes cluster.
Workflow of user and operators
- The user submits a modified datacenter size parameter in the MissionControlCluster object to the control plane Kubernetes cluster.
- The cluster-level operator detects the DC-level change in the cluster object and modifies the DC-level resources.
- The DC-level operator detects the size change in the DC-level resource and provisions Kubernetes resources representing the new nodes.
- The DC-level operator bootstraps database nodes on the new pods.
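For example, here is a minimal sketch of submitting the size change directly with a JSON patch instead of reapplying the full manifest. It assumes the MissionControlCluster is named demo (as in the sample manifest later in this topic) and that the target datacenter is the first entry in spec.k8ssandra.cassandra.datacenters; adjust the resource name and index for your environment:

# Hypothetical shortcut: set the first datacenter's size to 6.
kubectl patch missioncontrolcluster demo --type=json \
  -p '[{"op": "replace", "path": "/spec/k8ssandra/cassandra/datacenters/0/size", "value": 6}]'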
When commissioning nodes, Mission Control:
- targets the rack with the lowest number of active nodes.
- uses a bootstrap (self-starting) process that adds nodes without external input.
- commissions multiple nodes in a single rack only after adjusting the other racks in the datacenter to reflect the desired node count.
- identifies the number of nodes being added.
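To see how the existing pods are distributed across racks before scaling, one option is to list them with the rack label shown as a column. This assumes the cassandra.datastax.com/cluster and cassandra.datastax.com/rack pod labels that the monitoring examples later in this topic also rely on:

# Show each database pod in the demo cluster together with its rack label.
kubectl get pods -l cassandra.datastax.com/cluster=demo -L cassandra.datastax.com/rack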
Limitations
- You must increase the datacenter size by a multiple of the number of racks in the target datacenter. For example, with 3 racks you can scale up by 3, 6, or 9 nodes, and so on. Invalid size parameters are ignored.
Add nodes to a datacenter in a cluster
Ensure that all database pods can route to each other. Pod-to-pod routing is a critical requirement for proper operation and data consistency; without it, the new nodes cannot operate correctly.
You start with an existing Kubernetes cluster that has one datacenter with three nodes distributed equally across three racks.
The goal is to modify the MissionControlCluster manifest (object) specification and submit that change with the kubectl command to add one or more nodes to a datacenter in a Kubernetes cluster.
- Here is a sample MissionControlCluster manifest named demo.missioncontrolcluster.yaml that was used to initially create the datacenter (dc1):

apiVersion: missioncontrol.datastax.com/v1beta2
kind: MissionControlCluster
metadata:
  name: demo
spec:
  k8ssandra:
    cassandra:
      serverVersion: 6.8.26
      serverType: dse
      storageConfig:
        cassandraDataVolumeClaimSpec:
          storageClassName: premium-rwo
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 5Gi
      datacenters:
        - metadata:
            name: dc1
          k8sContext: east
          size: 3
          racks:
            - name: rack1
              nodeAffinityLabels:
                topology.kubernetes.io/zone: us-east1-c
            - name: rack2
              nodeAffinityLabels:
                topology.kubernetes.io/zone: us-east1-b
            - name: rack3
              nodeAffinityLabels:
                topology.kubernetes.io/zone: us-east1-d
- Modify the datacenters size specification from 3 (one node per rack) to 6 (two nodes per rack):

apiVersion: missioncontrol.datastax.com/v1beta2
kind: MissionControlCluster
metadata:
  name: demo
spec:
  ...
  datacenters:
    - metadata:
        name: dc1
      k8sContext: east
      size: 6
      racks:
        ...
- Submit this change in the control plane cluster:

kubectl apply -f demo.missioncontrolcluster.yaml

Three additional nodes (pods) deploy in parallel as the MissionControlCluster object increases in size from three to six nodes. Each node, however, starts serially in the order specified by the rack definitions. At any given time, the number of started nodes in a rack cannot differ from the number of started nodes in any other rack by more than one. By default, Mission Control configures the database pods so that Kubernetes is blocked from scheduling multiple database pods on the same worker node. An attempt to increase the cluster size beyond the number of available worker nodes might leave the additional pods undeployed.
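Because each database pod needs its own worker node by default, it can help to confirm that enough schedulable worker nodes exist before scaling. This check uses only standard kubectl and the well-known zone label:

# List worker nodes and their zones; you need at least as many schedulable
# worker nodes as the new datacenter size.
kubectl get nodes -L topology.kubernetes.io/zone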
- Monitor the status of the nodes being created:

kubectl get pods -l "cassandra.datastax.com/cluster"=demo

Result:

NAME                   READY   STATUS    RESTARTS   AGE
demo-dc1-rack1-sts-0   2/2     Running   0          67m
demo-dc1-rack1-sts-1   1/2     Running   0          110s
demo-dc1-rack2-sts-0   2/2     Running   0          67m
demo-dc1-rack2-sts-1   1/2     Running   0          110s
demo-dc1-rack3-sts-0   2/2     Running   0          67m
demo-dc1-rack3-sts-1   1/2     Running   0          110s

The -l flag adds a label selector to filter the results. Every database pod has the cassandra.datastax.com/cluster label. There are six pods, but only the initial three are fully ready. This is expected as the results were captured in mid-operation.
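To follow the rollout without re-running the command, the same selector can be combined with kubectl's standard watch flag:

# Stream pod status changes until interrupted with Ctrl+C.
kubectl get pods -l "cassandra.datastax.com/cluster"=demo -w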
- Monitor the status of the CassandraDatacenter with this command:

kubectl get cassandradatacenter dc1 -o yaml

Result:

status:
  cassandraOperatorProgress: Updating
  conditions:
  - lastTransitionTime: "2022-10-19T20:24:40Z"
    message: ""
    reason: ""
    status: "True"
    type: Healthy
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: Stopped
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: ReplacingNodes
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: Updating
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: RollingRestart
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: Resuming
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "False"
    type: ScalingDown
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "True"
    type: Valid
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "True"
    type: Initialized
  - lastTransitionTime: "2022-10-19T20:24:41Z"
    message: ""
    reason: ""
    status: "True"
    type: Ready
  - lastTransitionTime: "2022-10-19T21:24:34Z"
    message: ""
    reason: ""
    status: "True"
    type: ScalingUp
  lastServerNodeStarted: "2022-10-19T21:28:51Z"
  nodeStatuses:
    demo-dc1-rack1-sts-0:
      hostID: 2025d318-3fcc-4753-990b-3f9c388ba18a
    demo-dc1-rack1-sts-1:
      hostID: 33a0fc01-5947-471f-97a2-61237767d583
    demo-dc1-rack2-sts-0:
      hostID: 50748fb8-da1f-4add-b635-e80e282dc09b
    demo-dc1-rack2-sts-1:
      hostID: eb899ffd-0726-4fb4-bea7-c9d84d555339
    demo-dc1-rack3-sts-0:
      hostID: db86cba7-b014-40a2-b3f2-6eea21919a25
  observedGeneration: 1
  quietPeriod: "2022-10-19T20:24:47Z"
  superUserUpserted: "2022-10-19T20:24:42Z"
  usersUpserted: "2022-10-19T20:24:42Z"

The ScalingUp condition has status True, indicating that the scaling up operation is in progress. Mission Control updates it to False when the operation is complete.
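If you prefer to block until the scale-up finishes rather than polling, kubectl wait can watch the same condition. This is standard kubectl behavior and assumes the CassandraDatacenter object is named dc1:

# Return once the ScalingUp condition transitions back to False.
kubectl wait cassandradatacenter/dc1 --for=condition=ScalingUp=False --timeout=60m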
- If the results show a pod with Pending status, issue this command to get more details about the pod:

kubectl describe pod POD_NAME

Replace POD_NAME with the name of the pod that is in the Pending status.
- The results might indicate a FailedScheduling event. This typically occurs when there are not enough infrastructure resources available, such as too few schedulable worker nodes for the additional pods.
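To surface scheduling failures across all pods at once, you can filter recent events by reason. This is standard kubectl; add -n NAMESPACE if the cluster does not run in your current namespace:

# List only FailedScheduling events, newest last.
kubectl get events --field-selector reason=FailedScheduling --sort-by=.lastTimestamp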
- Run the following command to check the status of the CassandraDatacenter object. In the output, look for a ScalingUp condition with its status set to True:

kubectl get cassandradatacenter cluster-name-dc-name -o yaml

Result:

...
status:
  cassandraOperatorProgress: Updating
  conditions:
  - lastTransitionTime: "2021-03-30T22:01:48Z"
    message: ""
    reason: ""
    status: "True"
    type: ScalingUp
...

After the new nodes are deployed and running, Mission Control automatically runs nodetool cleanup only on the original nodes, not on the new nodes. This removes keys and data that are no longer associated with those original nodes.
Upon completion of the cleanup operation on each node, the ScalingUp condition status is set to False.
Next steps
Run Cleanup operation to recover disk space from previously provisioned nodes.
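If you need to trigger a cleanup yourself instead of following the linked procedure, the following is a minimal sketch of a CassandraTask. It assumes the cass-operator control.k8ssandra.io/v1alpha1 API and the demo/dc1 names used above; verify the exact fields against your Mission Control version:

apiVersion: control.k8ssandra.io/v1alpha1
kind: CassandraTask
metadata:
  name: cleanup-dc1
spec:
  datacenter:
    name: dc1
    namespace: default   # assumption: dc1 runs in the default namespace
  jobs:
    - name: cleanup-dc1-job
      command: cleanup

Submit the task with kubectl apply in the Kubernetes cluster where dc1 runs, then check the task's status conditions to confirm that the cleanup job completed.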