Create a Multi-token DSE Cluster
Availability requirements might dictate creating a cluster with a low number of virtual nodes (vnodes) per node, or even a single token per node. Use Mission Control to assist in assigning tokens as you create a cluster and to help ensure a more balanced cluster. A large number of tokens spreads the effects of a single node going down across multiple nodes in the cluster; however, it also increases operational overhead for repairs, analytics workloads, and search. Setting the appropriate number of tokens during cluster creation is imperative, because changing this value after the cluster is created is difficult.
Create a multi-token cluster
This example steps you through creating a cluster with four (4) vnodes (tokens) per DSE host, which is less than the default of 16. It uses two datacenters, three racks per datacenter, and 18 nodes total (nine per datacenter).
Modify the MissionControlCluster manifest to explicitly override the default of 16 in the config.cassandraYaml.num_tokens section. From this definition, Mission Control automatically generates the initial tokens.
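The override itself is a single setting. Shown in isolation, the relevant fragment of the manifest is:

    spec:
      k8ssandra:
        cassandra:
          config:
            cassandraYaml:
              num_tokens: 4

The complete manifest in the following step shows this fragment in context.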
- Required: Modify the MissionControlCluster manifest, explicitly specifying config.cassandraYaml.num_tokens: 4, as follows:

    apiVersion: missioncontrol.datastax.com/v1beta2
    kind: MissionControlCluster
    metadata:
      name: demo
    spec:
      k8ssandra:
        cassandra:
          serverVersion: 6.8.25
          serverType: dse
          config:
            cassandraYaml:
              num_tokens: 4
          datacenters:
            - metadata:
                name: dc1
              k8sContext: data-plane-1
              size: 9
              racks:
                - name: rack1
                - name: rack2
                - name: rack3
            - metadata:
                name: dc2
              k8sContext: data-plane-2
              size: 9
              racks:
                - name: rack1
                - name: rack2
                - name: rack3
  With single-token or few-token clusters, always size each datacenter with a node count that is an exact multiple of the number of racks. This facilitates token allocation and results in better token balance. For example, with three racks per datacenter, a size of 9 places exactly three nodes in each rack.
  When using multiple racks, each rack is expected to replicate 100% of the data. Therefore, the number of racks must be equal to or greater than the replication factor (RF) in the datacenter (RF=3 by default). Failure to comply with this requirement prevents some hosts from deploying. A keyspace sketch after the sample results illustrates matching RF to the rack count.
- Issue the following command from a pod running DSE in the cluster and review the resulting token ring:

    kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- nodetool -u demo-superuser -pw PASSWORD ring
  Replace PASSWORD with the password for the superuser.

  Sample results:
    Datacenter: dc1
    ==========
    Address       Rack   Status  State   Load        Owns    Token
                                                             8710962479251732601
    10.100.7.11   rack1  Up      Normal  195.28 KiB  37.04%  -9223372036854775808
    10.100.10.16  rack3  Up      Normal  224.16 KiB  33.33%  -8710962479251732915
    10.100.16.17  rack2  Up      Normal  235.68 KiB  29.63%  -8540159293384051146
    [...]
    10.100.4.11   rack2  Up      Normal  227.99 KiB  33.33%  8198552921648689399
    10.100.6.11   rack1  Up      Normal  237.61 KiB  29.63%  8369356107516371168
    10.100.0.12   rack3  Up      Normal  232.3 KiB   29.63%  8710962479251732601

    Datacenter: dc2
    ==========
    Address       Rack   Status  State   Load        Owns    Token
                                                             9144875253562394737
    10.100.15.9   rack2  Up      Normal  208.28 KiB  33.33%  -8789459262544113986
    10.100.14.7   rack1  Up      Normal  203.83 KiB  29.63%  -8618656076676432217
    10.100.9.6    rack3  Up      Normal  209.17 KiB  29.63%  -8277049704941070781
    [...]
    10.100.9.6    rack3  Up      Normal  209.17 KiB  29.63%  8290859324223990098
    10.100.17.7   rack2  Up      Normal  201.02 KiB  29.63%  8632465695959351530
    10.100.11.12  rack3  Up      Normal  206.97 KiB  37.04%  9144875253562394737
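To see how the rack count interacts with replication, the following sketch creates a keyspace whose RF matches the three racks in each datacenter. This is an illustration only: the keyspace name app_ks is hypothetical, and it assumes cqlsh can reach the node from inside the container.

    # Hypothetical keyspace: RF=3 per datacenter matches the three racks,
    # so each rack can hold a full replica of the data.
    kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- \
      cqlsh -u demo-superuser -p PASSWORD -e \
      "CREATE KEYSPACE IF NOT EXISTS app_ks WITH replication =
        {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};"

As before, replace PASSWORD with the superuser password.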
When using racks and vnodes, interpret the Owns column as follows: although each line corresponds to a vnode, this column reports the effective data ownership for the entire host (not just the vnode), within its rack (not within the datacenter). For example, in dc1, rack1 replicates 100% of the data; within this rack, host 10.100.7.11 owns 37.04% of that total data.
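If scanning one line per vnode is cumbersome, nodetool status with a keyspace argument reports a single row per host together with its effective ownership. A minimal sketch, assuming the hypothetical app_ks keyspace from the earlier example:

    # One row per host; the "Owns (effective)" column reflects app_ks replication.
    kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- \
      nodetool -u demo-superuser -pw PASSWORD status app_ks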
In this example of three (3) physical hosts per rack, a perfectly balanced cluster, where each rack is also well balanced, would give each physical host ownership of about 33% of the data within its rack. Instead, note that the ownership distribution varies slightly. In complex cases such as this, the token allocation algorithm may not achieve the ideal balance, but it keeps the ownership variance within an acceptable range. In this case, per-host data ownership ranges from 29.63% minimum to 37.04% maximum.
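To check the variance without reading the ring output by eye, you can post-process it. The following is a minimal sketch; it assumes the output format shown above, where Load occupies two fields and Owns is the seventh field, ending in %:

    # Report the minimum and maximum per-host ownership from nodetool ring.
    # Hosts repeat once per vnode with the same Owns value, so dedupe by address.
    kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- \
      nodetool -u demo-superuser -pw PASSWORD ring \
      | awk '$7 ~ /%$/ && !seen[$1]++ {
          gsub("%", "", $7)
          if (min == "" || $7 + 0 < min + 0) min = $7
          if ($7 + 0 > max + 0) max = $7
        } END { print "min owns: " min "%  max owns: " max "%" }'

With the sample output above, this prints min owns: 29.63%  max owns: 37.04%.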