Virtual node (vnode) configuration

Virtual nodes simplify many tasks in DSE, such as eliminating the need to determine the partition range (calculate and assign tokens), rebalancing the cluster when adding or removing nodes, and replacing dead nodes. For a complete description of virtual nodes and how they work, see Virtual nodes.

Guidelines for using virtual nodes

  • DSE requires the same token architecture on all nodes in a datacenter.

    The nodes must all be vnode-enabled or all use single-token architecture. Across the entire cluster, datacenter architecture can vary.

    For example, a single cluster can contain:

    • A transaction-only datacenter running OLTP.

    • A single-token architecture search datacenter (no vnodes).

    • An analytics datacenter with vnodes.

  • DataStax recommends using 8 vnodes (tokens).

    Restriction: DataStax recommends not using vnodes with DSE Search. However, if you decide to use vnodes with DSE Search, do not use more than 8 vnodes, and ensure that the allocate_tokens_for_local_replication_factor option in cassandra.yaml is correctly configured for your environment.

    Using 8 vnodes distributes the workload between systems with a ~10% variance and has minimal impact on performance.

  • Ensure correct vnode configuration with cassandra.yaml settings:

    • When adding a vnode-enabled node to an existing cluster or setting up nodes in a new datacenter, set the allocate_tokens_for_local_replication_factor option to the target replication factor (RF) of the keyspaces in the datacenter.

    • The allocation algorithm distributes the token ranges proportionately using the num_tokens settings.

      All systems in the datacenter should have the same num_tokens setting unless performance varies between systems. To distribute more of the workload to higher-performance hardware, increase the number of tokens for those systems.

      The allocation algorithm balances the workload efficiently with fewer tokens and maintains that balance when systems are added to a datacenter. A higher number of tokens distributes the workload more evenly, but also significantly increases token management overhead.

      Set the number of vnode tokens based on the workload distribution requirements of the datacenter. Each cell shows the approximate workload variance between nodes:

      Replication factor   4 vnodes   8 vnodes   64 vnodes   128 vnodes
      2                    ~17.5%     ~12.5%     ~3%         ~1%
      3                    ~14%       ~10%       ~2%         ~1%
      5                    ~11%       ~7%        ~1%         ~1%

  • Add nodes to the cluster one at a time.

    When adding multiple nodes to the cluster using the allocation algorithm, ensure that nodes are added one at a time. If nodes are added concurrently, the algorithm assigns the same tokens to different nodes.
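The proportional weighting described in the guidelines above can be sketched as a cassandra.yaml fragment for one higher-performance node in a datacenter whose other nodes use the recommended 8 tokens. This is a minimal sketch, assuming the keyspaces in the datacenter use RF 3; the value 16 is a hypothetical choice for illustration, not a DataStax recommendation.

```yaml
# Higher-performance node (hypothetical example).
# Other nodes in this datacenter are assumed to use num_tokens: 8.
# The allocation algorithm distributes token ranges in proportion to
# num_tokens, so this node should receive roughly twice the share of data.
num_tokens: 16

# Same target replication factor on every node in the datacenter
# (RF 3 is an assumption for this example).
allocate_tokens_for_local_replication_factor: 3
```

Per the restriction above, do not apply this kind of weighting in a DSE Search datacenter, where no more than 8 vnodes should be used.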

Enabling vnodes

In the cassandra.yaml file:

  1. Uncomment num_tokens and set the required number of tokens.

  2. (Recommended) To use the allocation algorithm, uncomment allocate_tokens_for_local_replication_factor and set it to the target replication factor for the keyspaces in the datacenter. If the replication factor varies across keyspaces, alternate between the replication factor (RF) settings.

  3. Comment out initial_token or leave it unset.
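For illustration, the steps above might produce a cassandra.yaml fragment like the following. This sketch assumes the recommended 8 tokens and keyspaces in the datacenter that use RF 3; substitute the values for your own environment.

```yaml
# Step 1: number of vnode tokens for this node
num_tokens: 8

# Step 2 (recommended): target replication factor for keyspaces in this
# datacenter, used by the token allocation algorithm (RF 3 assumed)
allocate_tokens_for_local_replication_factor: 3

# Step 3: leave initial_token commented out when using vnodes
# initial_token:
```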

To upgrade existing clusters to vnodes, see Enabling virtual nodes on an existing production cluster.

Disabling vnodes

If you do not use vnodes, you must make sure that each node is responsible for roughly an equal amount of data. To do this, assign each node an initial_token value, calculating the tokens for each datacenter as described in Generating tokens.

  1. In the cassandra.yaml file:

    1. Comment out num_tokens and allocate_tokens_for_local_replication_factor.

    2. Uncomment initial_token and set it to 1, or to the value of a generated token for a multi-node cluster.
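As an illustration only, here is a cassandra.yaml fragment for the first node of a hypothetical four-node, single-token datacenter. It assumes the Murmur3Partitioner, whose token range runs from -2^63 to 2^63 - 1; the evenly spaced values are computed as i * (2^64 / 4) - 2^63 for i = 0 to 3. Generate the actual tokens for your cluster as described in Generating tokens.

```yaml
# Vnode settings commented out for single-token architecture
# num_tokens: 8
# allocate_tokens_for_local_replication_factor: 3

# Node 1 of a hypothetical 4-node datacenter (Murmur3Partitioner assumed).
# Evenly spaced tokens for all four nodes:
#   node 1: -9223372036854775808
#   node 2: -4611686018427387904
#   node 3: 0
#   node 4: 4611686018427387904
initial_token: -9223372036854775808
```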

© 2024 DataStax