Virtual node (vnode) configuration 

Describes virtual nodes (vnodes), how to use them in different types of datacenters, and how to enable or disable them.

Virtual nodes simplify many tasks in DataStax Enterprise. They eliminate the need to determine the partition range (calculate and assign tokens), and they simplify rebalancing the cluster when nodes are added or removed and replacing dead nodes. For a complete description of virtual nodes and how they work, see Virtual nodes.

DataStax Enterprise requires all nodes in a datacenter to use the same token architecture: the nodes must be either all vnode-enabled or all single-token. The architecture can vary between datacenters in the same cluster. For example, a single cluster can contain:
  • A transaction-only datacenter running OLTP.
  • A single-token architecture analytics datacenter (no vnodes).
  • A search datacenter with vnodes.

Guidelines for using virtual nodes 

Whether virtual nodes (vnodes) are enabled or disabled depends on the initial cassandra.yaml settings. There are two methods of distributing token ranges. Add all systems in the datacenter using the same method:
  • Allocation algorithm: Optimizes token range distribution between nodes and racks in the datacenter based on the keyspace replication factor of the datacenter (allocate_tokens_for_local_replication_factor). Distributes the token ranges proportionately using the num_tokens settings. All systems in the datacenter should have the same num_tokens setting unless performance varies between systems. To distribute more of the workload to higher-performance hardware, increase the number of tokens for those systems.

    The allocation algorithm efficiently balances the workload using fewer tokens; when systems are added to a datacenter, the algorithm maintains the balance. Using a higher number of tokens more evenly distributes the workload, but also significantly increases token management overhead.

    DataStax recommends using 8 vnodes (tokens). This setting distributes the workload between systems with a variance of about 10% and has minimal impact on performance. Set the number of vnode tokens (num_tokens) based on the workload distribution requirements of the datacenter:
    Allocation algorithm workload distribution variance
    Replication factor   4 vnodes (tokens)   8 vnodes (tokens)   64 vnodes (tokens)   128 vnodes (tokens)
    2                    ~17.5%              ~12.5%              ~3%                  ~1%
    3                    ~14%                ~10%                ~2%                  ~1%
    5                    ~11%                ~7%                 ~1%                  ~1%
  • Random selection algorithm: Distributes the token ranges randomly to nodes within the datacenter. Enabled when only the num_tokens setting is specified and the allocate_tokens_for_local_replication_factor is commented out. Distribution is proportionate based on the number of tokens assigned to other nodes in the datacenter. Assign the number of tokens based on the type of system as follows:
    Datacenter type                        Number of vnodes (tokens)
    Transactional-only                     128
    DSE Analytics-only                     128
    DSE Search-only                        8
    DSE Graph                              128
    DSE Graph when used with DSE Search    16 or 32

    For SearchAnalytics workloads, use the DSE Search recommendation of 8 vnodes.

    Note: When the datacenter is first created, the load is evenly distributed. The workload might become unbalanced as the topology of the datacenter changes when nodes are added or removed.
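
The two methods differ only in whether allocate_tokens_for_local_replication_factor is set alongside num_tokens. As a rough sketch, the relevant cassandra.yaml lines might look like the following. The values 8, 3, and 128 are illustrative choices taken from the recommendations above, not required settings, and the replication factor of 3 is an assumption about the datacenter's keyspaces.

```yaml
# Option A: allocation algorithm (both settings present).
num_tokens: 8
allocate_tokens_for_local_replication_factor: 3   # assumed keyspace replication factor

# Option B: random selection algorithm (use instead of Option A, not together).
# Set only num_tokens, for example:
#   num_tokens: 128
# and leave allocate_tokens_for_local_replication_factor commented out.
```

Whichever option is chosen, every system in the datacenter should use the same method.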

Enabling vnodes 

In the cassandra.yaml file:

  1. Uncomment num_tokens and set the required number of tokens.
  2. If necessary, comment out the initial_token parameter or leave it unset.

To upgrade existing clusters to vnodes, see Enabling virtual nodes on an existing production cluster.
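
For a new node, the two enabling steps might leave the file looking something like this. This is a minimal sketch: 128 is only an example value, and the token count should follow the guidelines for the datacenter type.

```yaml
# Step 1: num_tokens is uncommented and set (128 is an example value).
num_tokens: 128

# Step 2: initial_token stays commented out (or unset) on a vnode-enabled node.
# initial_token:
```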

The location of the cassandra.yaml file depends on the type of installation:
Installer-Services        /etc/dse/cassandra/cassandra.yaml
Package installations     /etc/dse/cassandra/cassandra.yaml
Installer-No Services     install_location/resources/cassandra/conf/cassandra.yaml
Tarball installations     install_location/resources/cassandra/conf/cassandra.yaml

Disabling vnodes 

Important: If you do not use vnodes, you must make sure that each node is responsible for roughly an equal amount of data. To do so, assign each node an initial_token value and calculate the tokens for each datacenter as described in Generating tokens.

Procedure

In the cassandra.yaml file:

  1. Comment out the allocate_tokens_for_local_replication_factor and num_tokens options.
  2. Uncomment the initial_token option and set it to 1 or, for a multi-node cluster, to the value of a generated token.
  3. In the cassandra.yaml file:
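
For illustration, after steps 1 and 2 a single-token node's cassandra.yaml might contain lines like the following. The commented-out values are placeholders for whatever the file held before, and 1 is the value step 2 gives as the alternative to a generated token. In a multi-node cluster, each node would instead use its own generated token.

```yaml
# Step 1: both vnode options are commented out (previous values shown are placeholders).
# num_tokens: 8
# allocate_tokens_for_local_replication_factor: 3

# Step 2: initial_token is uncommented and set.
# In a multi-node cluster, replace 1 with the token generated for this node.
initial_token: 1
```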