Create a multi-token cluster

Availability requirements might call for a cluster with a low number of virtual nodes (vnodes) per node, with one token per vnode. Use Mission Control to assign tokens as you create the cluster and to help ensure a more balanced cluster. A large number of tokens spreads the effects of a single node going down across multiple nodes in the cluster. However, it also increases operational overhead for repairs, analytics workloads, and search. Set the appropriate number of tokens when you create the cluster, because changing this value afterward is difficult.

Create a cluster with multi-tokens

This example walks through creating a cluster in which each DSE host has four vnodes (four tokens), fewer than the default of 16.

The example cluster has two datacenters, four vnodes per DSE node, three racks per datacenter, and 18 nodes in total (nine per datacenter). Modify the MissionControlCluster manifest to explicitly override the default of 16 in the config.cassandraYaml.num_tokens field. From this definition, Mission Control automatically generates the initial tokens.

Required: Modify the MissionControlCluster manifest, explicitly specifying config.cassandraYaml.num_tokens: 4, as follows:

apiVersion: missioncontrol.datastax.com/v1beta2
kind: MissionControlCluster
metadata:
  name: demo
spec:
  k8ssandra:
    cassandra:
      serverVersion: 6.8.25
      serverType: dse
      config:
        cassandraYaml:
          num_tokens: 4
      datacenters:
        - metadata:
            name: dc1
          k8sContext: data-plane-1
          size: 9
          racks:
            - name: rack1
            - name: rack2
            - name: rack3
        - metadata:
            name: dc2
          k8sContext: data-plane-2
          size: 9
          racks:
            - name: rack1
            - name: rack2
            - name: rack3

With single-token or few-token clusters, always try to size each datacenter so that its node count is an exact multiple of the number of racks (for example, nine nodes across three racks). This simplifies token allocation and results in a better token balance.

When using multiple racks, each rack is expected to replicate 100% of the data. Therefore, the number of racks must be equal to or greater than the replication factor (RF) in the datacenter (RF=3 by default). If this requirement is not met, some hosts fail to deploy.
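
The two sizing guidelines above can be checked before a manifest is applied. The following is a minimal Python sketch and is not part of Mission Control: the `check_datacenter` helper and its parameter names are hypothetical, and the values passed in mirror the `dc1` definition from the example manifest (9 nodes, 3 racks, `num_tokens: 4`, RF=3).

```python
def check_datacenter(size: int, racks: int, num_tokens: int, rf: int = 3) -> dict:
    """Hypothetical pre-flight check for one datacenter definition."""
    problems = []
    # The racks >= RF rule is stated for multi-rack datacenters only,
    # so a single-rack datacenter is not flagged here.
    if racks > 1 and racks < rf:
        problems.append(f"racks ({racks}) is less than the replication factor ({rf})")
    # Guideline: the node count should be an exact multiple of the rack count.
    if size % racks != 0:
        problems.append(f"size ({size}) is not an exact multiple of racks ({racks})")
    return {
        # Floor division; exact only when size is a multiple of racks.
        "nodes_per_rack": size // racks,
        # One token per vnode, so this is also the number of ring entries.
        "total_tokens": size * num_tokens,
        "problems": problems,
    }


dc1 = check_datacenter(size=9, racks=3, num_tokens=4)
print(dc1)
```

For the example manifest, the check reports three nodes per rack, 36 tokens per datacenter, and no problems. A definition such as 10 nodes across 3 racks would be flagged because 10 is not a multiple of 3.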

Run the following command, which executes nodetool ring in a pod running DSE in the cluster, and review the resulting token ring:

kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- nodetool -u demo-superuser -pw PASSWORD ring

Replace PASSWORD with the password for the superuser.

Result

Datacenter: dc1
==========
Address       Rack   Status State   Load        Owns    Token
                                                        8710962479251732601
10.100.7.11   rack1  Up     Normal  195.28 KiB  37.04%  -9223372036854775808
10.100.10.16  rack3  Up     Normal  224.16 KiB  33.33%  -8710962479251732915
10.100.16.17  rack2  Up     Normal  235.68 KiB  29.63%  -8540159293384051146
[...]
10.100.4.11   rack2  Up     Normal  227.99 KiB  33.33%  8198552921648689399
10.100.6.11   rack1  Up     Normal  237.61 KiB  29.63%  8369356107516371168
10.100.0.12   rack3  Up     Normal  232.3 KiB   29.63%  8710962479251732601

Datacenter: dc2
==========
Address       Rack   Status State   Load        Owns    Token
                                                        9144875253562394737
10.100.15.9   rack2  Up     Normal  208.28 KiB  33.33%  -8789459262544113986
10.100.14.7   rack1  Up     Normal  203.83 KiB  29.63%  -8618656076676432217
10.100.9.6    rack3  Up     Normal  209.17 KiB  29.63%  -8277049704941070781
[...]
10.100.9.6    rack3  Up     Normal  209.17 KiB  29.63%  8290859324223990098
10.100.17.7   rack2  Up     Normal  201.02 KiB  29.63%  8632465695959351530
10.100.11.12  rack3  Up     Normal  206.97 KiB  37.04%  9144875253562394737

When using racks and vnodes, interpret the Owns column as follows:

Although each line corresponds to a vnode, this column reports the effective data ownership of the entire host, not of the individual vnode. Ownership is measured within the host's rack, not within the datacenter. For example, rack1 in dc1 holds a full replica (100%) of the data, and host 10.100.7.11 owns 37.04% of it.

This example has three physical hosts per rack, so in a perfectly balanced cluster each host would own 33.33% of the data within its rack. Instead, ownership varies slightly. In complex cases such as this one, the token allocation algorithm might not achieve the ideal balance, but it keeps the ownership variance within an acceptable range. Here, per-host ownership ranges from a minimum of 29.63% to a maximum of 37.04%.
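
To make the comparison against the ideal share concrete, the following Python sketch groups the Owns values by rack and host and prints each host's deviation from 100/3. It is an illustration only, not an official tool: the names `RING_ROWS`, `ownership_by_host`, and `HOSTS_PER_RACK` are hypothetical, the rows are copied from the dc1 output shown earlier (elided rows are not included), and the parser assumes the eight-column layout of that sample.

```python
from collections import defaultdict

# Rows copied from the dc1 sample output; the elided rows are not included.
RING_ROWS = """\
10.100.7.11   rack1  Up     Normal  195.28 KiB  37.04%  -9223372036854775808
10.100.10.16  rack3  Up     Normal  224.16 KiB  33.33%  -8710962479251732915
10.100.16.17  rack2  Up     Normal  235.68 KiB  29.63%  -8540159293384051146
10.100.4.11   rack2  Up     Normal  227.99 KiB  33.33%  8198552921648689399
10.100.6.11   rack1  Up     Normal  237.61 KiB  29.63%  8369356107516371168
10.100.0.12   rack3  Up     Normal  232.3 KiB   29.63%  8710962479251732601
"""

HOSTS_PER_RACK = 3  # from the example: 9 nodes across 3 racks


def ownership_by_host(ring_text: str) -> dict:
    """Return {rack: {address: owns_percent}} with one entry per host."""
    racks = defaultdict(dict)
    for line in ring_text.splitlines():
        fields = line.split()
        # Data rows have 8 whitespace-separated fields with Owns in position 7;
        # headers, separators, and the lone wrap-around token line are skipped.
        if len(fields) != 8 or not fields[6].endswith("%"):
            continue
        address, rack = fields[0], fields[1]
        # Owns is reported per host, so repeated vnode rows carry the same value.
        racks[rack][address] = float(fields[6].rstrip("%"))
    return {rack: dict(hosts) for rack, hosts in racks.items()}


ideal = 100 / HOSTS_PER_RACK
by_rack = ownership_by_host(RING_ROWS)
for rack in sorted(by_rack):
    for address, owns in sorted(by_rack[rack].items()):
        print(f"{rack} {address:<13} owns {owns:5.2f}%  deviation {owns - ideal:+.2f}")
```

Because only six of the ring rows appear in the sample, each rack shows two of its three hosts here. Running the same function over the full nodetool ring output would list every host once per rack.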
