Add nodes to a cluster
Adding nodes increases the capacity to service customer queries against the data.
This task focuses on adding nodes to a single existing datacenter. To scale up the number of datacenters, follow the add a DSE datacenter task.
Prerequisites
- A prepared environment on either bare-metal/VM or an existing Kubernetes cluster.
Workflow of user and operators
- User submits a modified datacenter size parameter in MissionControlCluster to the Control Plane Kubernetes cluster.
- Cluster-level operator detects the dc-level change in the cluster object and modifies dc-level resources.
- DC-level operator detects the size change in the dc-level resource and provisions Kubernetes resources representing the new nodes.
- DC-level operator bootstraps DSE nodes on new pods.
When commissioning nodes, Mission Control:
- targets the rack with the lowest number of active nodes.
- uses a bootstrap (self-starting process) that adds nodes without external input.
- commissions multiple nodes in a single rack only after adjusting other racks in the datacenter to reflect the desired node count.
- identifies the number of nodes being added.
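The rack-targeting rule can be illustrated with a small simulation. This is only a sketch of the stated behavior, not Mission Control's implementation: place_nodes is a hypothetical helper that takes the number of nodes to add followed by the current node count of each rack, and assigns each new node to the rack that currently has the fewest nodes. Breaking ties by rack order is an assumption.

```shell
#!/bin/sh
# Hypothetical helper (not part of Mission Control or kubectl).
# Usage: place_nodes NODES_TO_ADD RACK1_COUNT RACK2_COUNT ...
# Prints the per-rack node counts after placing every new node on the
# rack that has the fewest nodes at that moment.
place_nodes() {
  to_add=$1
  shift
  printf '%s\n' "$*" | awk -v to_add="$to_add" '{
    for (k = 0; k < to_add; k++) {
      min = 1                              # index of the smallest rack so far
      for (i = 2; i <= NF; i++)
        if ($i + 0 < $min + 0) min = i     # strict "<" keeps rack order on ties
      $min = $min + 1                      # commission one node on that rack
    }
    print
  }'
}
```

For example, `place_nodes 3 1 1 1` prints `2 2 2`, which matches a datacenter growing from three to six nodes across three racks.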
Limitations
- You must increase the datacenter size by a multiple of the number of racks in the target datacenter. For example, with 3 racks you may scale up by 3, 6, or 9 nodes, and so on. Invalid size parameters are ignored.
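Because invalid size values are ignored rather than rejected, it can help to check the arithmetic before editing the manifest. The following is a minimal sketch; valid_scale_up is a hypothetical helper name, and its three arguments (current size, target size, rack count) are values you supply yourself.

```shell
#!/bin/sh
# Hypothetical pre-check (not part of Mission Control or kubectl).
# Usage: valid_scale_up CURRENT_SIZE TARGET_SIZE RACK_COUNT
# Succeeds (exit status 0) only when the increase is positive and is a
# multiple of the number of racks.
valid_scale_up() {
  current=$1
  target=$2
  racks=$3
  increase=$((target - current))
  [ "$racks" -gt 0 ] && [ "$increase" -gt 0 ] && [ $((increase % racks)) -eq 0 ]
}
```

With 3 racks, `valid_scale_up 3 6 3` succeeds, while `valid_scale_up 3 5 3` fails because an increase of 2 is not a multiple of 3.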
Add nodes to a datacenter in a cluster
You start with an existing Kubernetes cluster that has one datacenter with three DSE nodes distributed equally across three racks.
The goal is to modify the MissionControlCluster manifest (object) specification and submit that change with the kubectl command to add one or more nodes to a datacenter in a Kubernetes cluster.
- Here is a sample MissionControlCluster manifest named demo.missioncontrolcluster.yaml that was used to initially create the datacenter (dc1):
    apiVersion: missioncontrol.datastax.com/v1beta2
    kind: MissionControlCluster
    metadata:
      name: demo
    spec:
      k8ssandra:
        cassandra:
          serverVersion: 6.8.26
          serverType: dse
          storageConfig:
            cassandraDataVolumeClaimSpec:
              storageClassName: premium-rwo
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 5Gi
          datacenters:
            - metadata:
                name: dc1
              k8sContext: east
              size: 3
              racks:
                - name: rack1
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-c
                - name: rack2
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-b
                - name: rack3
                  nodeAffinityLabels:
                    topology.kubernetes.io/zone: us-east1-d
- Modify the datacenters.size specification from 3 (1 node per rack) to 6 (2 nodes per rack):
    apiVersion: missioncontrol.datastax.com/v1beta2
    kind: MissionControlCluster
    metadata:
      name: demo
    spec:
      ...
          datacenters:
            - metadata:
                name: dc1
              k8sContext: east
              size: 6
              racks:
              ...
- Submit this change in the Control Plane cluster:
    kubectl apply -f demo.missioncontrolcluster.yaml
  Three additional nodes (pods) deploy in parallel as the MissionControlCluster object increases in size from three to six nodes. Each node, however, starts serially as specified by the order of the rack definitions. At any given time, the number of started nodes in a rack cannot differ from the number of started nodes in any other rack by more than one.
  By default, Mission Control configures the Cassandra pods so that Kubernetes cannot schedule multiple DSE pods on the same worker node. If you increase the cluster size beyond the number of available worker nodes, the additional pods might not deploy.
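Since each DSE pod needs its own worker node by default, a coarse capacity check before scaling can catch an obvious shortfall. The sketch below assumes that kubectl is pointed at the Kubernetes cluster that hosts the DSE pods and that every node listed can run a DSE pod; it ignores taints, zone affinity for racks, and other workloads, so treat a passing result as a hint only. has_capacity is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical pre-check (not part of Mission Control).
# Usage: has_capacity TARGET_DATACENTER_SIZE
# Succeeds when the cluster lists at least as many nodes as the target size.
has_capacity() {
  target_size=$1
  # One line per node; awk prints the number of lines read.
  workers=$(kubectl get nodes --no-headers | awk 'END { print NR }')
  [ "$workers" -ge "$target_size" ]
}
```

For example, `has_capacity 6 || echo "fewer than 6 worker nodes"` warns before you submit a size of 6.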
- Monitor the status of the nodes being created:
    kubectl get pods -l "cassandra.datastax.com/cluster"=demo
  Sample results
    NAME                   READY   STATUS    RESTARTS   AGE
    demo-dc1-rack1-sts-0   2/2     Running   0          67m
    demo-dc1-rack1-sts-1   1/2     Running   0          110s
    demo-dc1-rack2-sts-0   2/2     Running   0          67m
    demo-dc1-rack2-sts-1   1/2     Running   0          110s
    demo-dc1-rack3-sts-0   2/2     Running   0          67m
    demo-dc1-rack3-sts-1   1/2     Running   0          110s
  The -l flag adds a label selector to filter the results. Every DSE pod has the cassandra.datastax.com/cluster label. There are six pods but only the initial three are fully ready. This is expected as the results were captured in mid-operation.
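To see at a glance how many pods are still starting, the READY column can be summarized with awk. This is a minimal sketch; count_not_ready is a hypothetical helper that reads the default table output of kubectl get pods on standard input and assumes READY is the second column.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubectl).
# Reads "kubectl get pods" table output on stdin and prints how many pods
# have fewer ready containers than total containers (for example, 1/2).
count_not_ready() {
  awk 'NR > 1 {                      # skip the header row
         split($2, r, "/")           # READY column looks like "1/2"
         if (r[1] != r[2]) n++
       }
       END { print n + 0 }'          # print 0 when every pod is ready
}
```

A possible invocation is `kubectl get pods -l "cassandra.datastax.com/cluster"=demo | count_not_ready`.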
- Monitor the status of the CassandraDatacenter with this command:
    kubectl get cassandradatacenter dc1 -o yaml
  Sample results
    status:
      cassandraOperatorProgress: Updating
      conditions:
      - lastTransitionTime: "2022-10-19T20:24:40Z"
        message: ""
        reason: ""
        status: "True"
        type: Healthy
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: Stopped
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: ReplacingNodes
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: Updating
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: RollingRestart
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: Resuming
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "False"
        type: ScalingDown
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "True"
        type: Valid
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "True"
        type: Initialized
      - lastTransitionTime: "2022-10-19T20:24:41Z"
        message: ""
        reason: ""
        status: "True"
        type: Ready
      - lastTransitionTime: "2022-10-19T21:24:34Z"
        message: ""
        reason: ""
        status: "True"
        type: ScalingUp
      lastServerNodeStarted: "2022-10-19T21:28:51Z"
      nodeStatuses:
        demo-dc1-rack1-sts-0:
          hostID: 2025d318-3fcc-4753-990b-3f9c388ba18a
        demo-dc1-rack1-sts-1:
          hostID: 33a0fc01-5947-471f-97a2-61237767d583
        demo-dc1-rack2-sts-0:
          hostID: 50748fb8-da1f-4add-b635-e80e282dc09b
        demo-dc1-rack2-sts-1:
          hostID: eb899ffd-0726-4fb4-bea7-c9d84d555339
        demo-dc1-rack3-sts-0:
          hostID: db86cba7-b014-40a2-b3f2-6eea21919a25
      observedGeneration: 1
      quietPeriod: "2022-10-19T20:24:47Z"
      superUserUpserted: "2022-10-19T20:24:42Z"
      usersUpserted: "2022-10-19T20:24:42Z"
  The ScalingUp condition has status: True, indicating that the scaling up operation is in progress. Mission Control updates it to False when the operation is complete.
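Instead of reading the full YAML, the ScalingUp condition can be extracted with a kubectl JSONPath filter and polled until it changes. The sketch below is one possible approach; scaling_up_status and wait_for_scale_up are hypothetical helper names, the 30-second default interval is arbitrary, and the loop has no timeout.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Mission Control).

# Print the status ("True" or "False") of the ScalingUp condition
# on the named CassandraDatacenter.
scaling_up_status() {
  kubectl get cassandradatacenter "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="ScalingUp")].status}'
}

# Usage: wait_for_scale_up DATACENTER [INTERVAL_SECONDS]
# Block while the ScalingUp condition is still "True".
wait_for_scale_up() {
  dc=$1
  interval=${2:-30}
  while [ "$(scaling_up_status "$dc")" = "True" ]; do
    sleep "$interval"
  done
}
```

For the sample datacenter, `wait_for_scale_up dc1 10` returns once the condition is no longer True.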
- If the results show a pod with Pending status, issue this command to get more details about the pod:
    kubectl describe pod POD_NAME
  Replace POD_NAME with the name of the pod that is in the Pending status.
- The results may indicate a FailedScheduling event. This might occur when there are not enough infrastructure resources available.
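When several pods are Pending, looking up each one by hand becomes tedious. The following sketch lists the Pending pods for a cluster label value and then queries their FailedScheduling events using kubectl field selectors. The helper names pending_pods and scheduling_events are hypothetical, and the commands assume the current namespace is the one that holds the DSE pods.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Mission Control).

# Usage: pending_pods CLUSTER_NAME
# Print the names of pods in the Pending phase, one per line.
pending_pods() {
  kubectl get pods -l "cassandra.datastax.com/cluster=$1" \
    --field-selector=status.phase=Pending \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}'
}

# Usage: scheduling_events CLUSTER_NAME
# Show FailedScheduling events for every Pending pod of the cluster.
scheduling_events() {
  for pod in $(pending_pods "$1"); do
    kubectl get events \
      --field-selector "involvedObject.name=$pod,reason=FailedScheduling"
  done
}
```

For the sample cluster, `scheduling_events demo` prints the scheduler's messages for each Pending pod, if any exist.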
- Run the following command to check the status of the CassandraDatacenter object. In the output, look for a ScalingUp condition with its status set to True.
    kubectl get cassandradatacenter cluster-name-dc-name -o yaml
  Sample results
    ...
    status:
      cassandraOperatorProgress: Updating
      conditions:
      - lastTransitionTime: "2021-03-30T22:01:48Z"
        message: ""
        reason: ""
        status: "True"
        type: ScalingUp
    ...
After the new nodes are deployed and running, Mission Control automatically runs nodetool cleanup only on the original nodes and not the new nodes. This removes keys and data that are no longer associated with those original nodes.
Upon completion of the cleanup operation, the ScalingUp condition status is set to False for each node.
Next steps
Run Cleanup operation to recover disk space from previously provisioned nodes.