Create a Multi-token DSE Cluster

Availability requirements might call for a cluster with a low number of tokens, and therefore virtual nodes (vnodes), per node. Use Mission Control to assign tokens as you create the cluster and to help ensure a more balanced cluster. A large number of tokens spreads the effect of a single node going down across multiple nodes in the cluster. However, it also increases the operational overhead of repairs, analytics workloads, and search. Set the appropriate number of tokens when you create the cluster, because changing this value afterward is difficult.

Create a cluster with multi-tokens

This example walks through creating a cluster with four (4) tokens, and therefore four vnodes, per DSE node, which is fewer than the default of 16.

This example uses a two-datacenter cluster with four vnodes per DSE node, three racks per datacenter, and 18 nodes in total. Modify the MissionControlCluster manifest to explicitly override the default of 16 in the config.cassandraYaml.num_tokens field. From this definition, Mission Control automatically generates the initial tokens.
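The topology arithmetic behind this example can be checked with a short script. The following is an illustrative Python sketch, not part of Mission Control: the Datacenter class and the check_layout helper are hypothetical names, and the replication factor of 3 is the default stated in the note under step 1.

```python
from dataclasses import dataclass


@dataclass
class Datacenter:
    """Hypothetical stand-in for one entry in the manifest's datacenters list."""
    name: str
    size: int   # number of DSE nodes in the datacenter
    racks: int  # number of racks in the datacenter


def check_layout(dc, num_tokens, rf=3):
    """Summarize one datacenter; raise if it has fewer racks than the RF."""
    if dc.racks < rf:
        raise ValueError(f"{dc.name}: {dc.racks} racks is fewer than RF={rf}")
    return {
        "nodes_per_rack": dc.size // dc.racks,
        # Recommended, not required: size should be a multiple of the rack count.
        "even_racks": dc.size % dc.racks == 0,
        "tokens_in_dc": dc.size * num_tokens,
    }


# Values taken from the example: two datacenters, 9 nodes and 3 racks each.
datacenters = [Datacenter("dc1", size=9, racks=3), Datacenter("dc2", size=9, racks=3)]
for dc in datacenters:
    print(dc.name, check_layout(dc, num_tokens=4))
print("total nodes:", sum(dc.size for dc in datacenters))
```

With the values from this example, each rack holds three nodes and each datacenter holds 36 tokens. The sketch only mirrors the sizing guidance in this topic; it does not reproduce how Mission Control allocates tokens.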

  1. Required: Modify the MissionControlCluster manifest, explicitly specifying config.cassandraYaml.num_tokens: 4, as follows:

    apiVersion: missioncontrol.datastax.com/v1beta1
    kind: MissionControlCluster
    metadata:
      name: demo
    spec:
      k8ssandra:
        cassandra:
          serverVersion: 6.8.25
          serverType: dse
          config:
            cassandraYaml:
              num_tokens: 4
          datacenters:
            - metadata:
                name: dc1
              k8sContext: data-plane-1
              size: 9
              racks:
                - name: rack1
                - name: rack2
                - name: rack3
            - metadata:
                name: dc2
              k8sContext: data-plane-2
              size: 9
              racks:
                - name: rack1
                - name: rack2
                - name: rack3

    With single-token or few-token clusters, size each datacenter so that its node count is an exact multiple of the number of racks whenever possible. This simplifies token allocation and results in a better token balance.

    When using multiple racks, each rack is expected to replicate 100% of the data. Therefore, the number of racks must be equal to or greater than the replication factor (RF) of the datacenter (RF=3 by default). If this requirement is not met, some hosts fail to deploy.

  2. Issue the following command from a pod running DSE in the cluster and review the resulting cluster:

    kubectl exec demo-dc1-rack1-sts-0 -c cassandra -- nodetool -u demo-superuser -pw <omitted> ring

    Sample results:

    Datacenter: dc1
    ==========
    Address       Rack   Status State   Load        Owns    Token
                                                            8710962479251732601
    10.100.7.11   rack1  Up     Normal  195.28 KiB  37.04%  -9223372036854775808
    10.100.10.16  rack3  Up     Normal  224.16 KiB  33.33%  -8710962479251732915
    10.100.16.17  rack2  Up     Normal  235.68 KiB  29.63%  -8540159293384051146
    [...]
    10.100.4.11   rack2  Up     Normal  227.99 KiB  33.33%  8198552921648689399
    10.100.6.11   rack1  Up     Normal  237.61 KiB  29.63%  8369356107516371168
    10.100.0.12   rack3  Up     Normal  232.3 KiB   29.63%  8710962479251732601
    
    Datacenter: dc2
    ==========
    Address       Rack   Status State   Load        Owns    Token
                                                            9144875253562394737
    10.100.15.9   rack2  Up     Normal  208.28 KiB  33.33%  -8789459262544113986
    10.100.14.7   rack1  Up     Normal  203.83 KiB  29.63%  -8618656076676432217
    10.100.9.6    rack3  Up     Normal  209.17 KiB  29.63%  -8277049704941070781
    [...]
    10.100.9.6    rack3  Up     Normal  209.17 KiB  29.63%  8290859324223990098
    10.100.17.7   rack2  Up     Normal  201.02 KiB  29.63%  8632465695959351530
    10.100.11.12  rack3  Up     Normal  206.97 KiB  37.04%  9144875253562394737

    When using racks and vnodes, read the Owns column as follows:

    Although each line corresponds to a vnode, this column reports the effective data ownership for the entire host (and not just the vnode), within its rack (and not within the datacenter). For example, in dc1, rack1, 100% of the data is replicated; in this rack, host 10.100.7.11 contains 37.04% of that total data.
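This reading of the Owns column can be sketched by collapsing the vnode rows into one entry per host and grouping the hosts by rack. The following Python is a minimal illustration, not a DataStax tool: the owns_by_rack helper is a hypothetical name, and the parsing assumes the eight-column row layout shown in the sample results. The rows are copied from the dc1 sample; because that sample is abbreviated with [...], only two hosts per rack appear and the percentages do not sum to 100.

```python
from collections import defaultdict

# Rows copied from the dc1 sample results; the elided rows are not reproduced.
SAMPLE = """\
Address       Rack   Status State   Load        Owns    Token
10.100.7.11   rack1  Up     Normal  195.28 KiB  37.04%  -9223372036854775808
10.100.10.16  rack3  Up     Normal  224.16 KiB  33.33%  -8710962479251732915
10.100.16.17  rack2  Up     Normal  235.68 KiB  29.63%  -8540159293384051146
10.100.4.11   rack2  Up     Normal  227.99 KiB  33.33%  8198552921648689399
10.100.6.11   rack1  Up     Normal  237.61 KiB  29.63%  8369356107516371168
10.100.0.12   rack3  Up     Normal  232.3 KiB   29.63%  8710962479251732601
"""


def owns_by_rack(ring_text):
    """Map rack -> {host address: Owns percentage}, one entry per host."""
    racks = defaultdict(dict)
    for line in ring_text.splitlines():
        fields = line.split()
        # Data rows split into 8 fields; header and blank lines are skipped.
        if len(fields) != 8 or not fields[6].endswith("%"):
            continue
        address, rack, owns = fields[0], fields[1], fields[6]
        # Owns describes the whole host, so repeated vnode rows collapse to one value.
        racks[rack][address] = float(owns.rstrip("%"))
    return dict(racks)


for rack, hosts in sorted(owns_by_rack(SAMPLE).items()):
    print(rack, hosts)
```

Fed the complete ring output for a datacenter, the per-rack values should add up to roughly 100%, which matches the statement that each rack replicates all of the data.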

This example has three (3) physical hosts per rack. In a perfectly balanced cluster, where each rack is also well balanced, each physical host would own about 33% of the data within its rack. Instead, the ownership distribution varies a little. In complex cases such as this one, the token allocation algorithm might not achieve the ideal balance, but it assigns tokens with an acceptable ownership variance. Here, the data ownership for a host ranges from a minimum of 29.63% to a maximum of 37.04%.
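The size of that variance can be worked out directly. The following Python sketch compares the observed minimum and maximum with the ideal share for a rack of three hosts; the ownership_spread helper is a hypothetical name used only for this illustration.

```python
def ownership_spread(hosts_per_rack, observed_min, observed_max):
    """Compare observed Owns extremes with the ideal per-host share in a rack."""
    ideal = 100.0 / hosts_per_rack  # each rack holds 100% of the data
    return {
        "ideal": round(ideal, 2),
        "below_ideal": round(ideal - observed_min, 2),
        "above_ideal": round(observed_max - ideal, 2),
    }


# Three hosts per rack, with the minimum and maximum from the sample results.
print(ownership_spread(3, observed_min=29.63, observed_max=37.04))
```

In this example, both extremes sit about 3.7 percentage points from the ideal share of 33.33%.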


© 2024 DataStax | Privacy policy | Terms of use
