Add single-token nodes to a cluster

Follow these steps to add nodes to a cluster that uses the single-token architecture. They do not apply to clusters that use virtual nodes (vnodes).

To add capacity to a cluster, introduce new nodes in stages or by adding an entire datacenter. Use one of the following methods:

Add capacity by doubling the cluster size: Adding capacity by doubling (or tripling or quadrupling) the number of nodes is less complicated when assigning tokens. Using this method, existing nodes keep their existing token assignments, and the new nodes are assigned tokens that bisect (or trisect) the existing token ranges.
Add capacity for a non-uniform number of nodes: When increasing capacity with this method, you must recalculate tokens for the entire cluster, and assign the new tokens to the existing nodes.
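
The token arithmetic behind both methods can be sketched as follows. This is a minimal illustration, not an official token generator. It assumes the cluster uses Murmur3Partitioner, whose tokens span -2^63 to 2^63 - 1. The function names `balanced_tokens` and `bisecting_tokens` are made up for this example.

```python
# Assumption: Murmur3Partitioner, with tokens from -2**63 to 2**63 - 1.
RING_SIZE = 2**64
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def balanced_tokens(node_count):
    """Evenly spaced tokens for a whole cluster of node_count nodes.

    Use this when the node count changes by a non-uniform amount and
    every node needs a recalculated token.
    """
    return [MIN_TOKEN + (RING_SIZE * i) // node_count for i in range(node_count)]


def bisecting_tokens(existing_tokens):
    """Tokens for new nodes that split each existing range in half.

    Use this when doubling the cluster: existing nodes keep their tokens
    and each new node takes the midpoint of one existing range.
    """
    ring = sorted(existing_tokens)
    new_tokens = []
    for i, start in enumerate(ring):
        # The range after the last token wraps around to the first token.
        end = ring[(i + 1) % len(ring)]
        width = (end - start) % RING_SIZE
        if width == 0:
            width = RING_SIZE  # a single node owns the whole ring
        midpoint = start + width // 2
        if midpoint > MAX_TOKEN:
            midpoint -= RING_SIZE  # wrap back into the valid token range
        new_tokens.append(midpoint)
    return new_tokens


if __name__ == "__main__":
    current = balanced_tokens(3)
    print("existing nodes keep:", current)
    print("new nodes receive:  ", bisecting_tokens(current))
```

For an evenly balanced ring, the existing tokens plus the bisecting tokens match the result of `balanced_tokens` for the doubled node count, which is why doubling needs no token changes on existing nodes.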

Only add new nodes to the cluster. A new node is a system that HCD has never started.

The node must have absolutely NO PREVIOUS DATA in the data/, saved_caches/, commitlog/, and hints/ subdirectories.

Adding nodes previously used for testing, or that have been removed from another cluster, merges the older data and its incompatible schema into the cluster and may cause data loss or corruption.
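
A quick pre-flight check for leftover data might look like the sketch below. It assumes the four subdirectories sit under one base directory; the path `/var/lib/cassandra` in the usage comment is only a placeholder, so use the directories configured for your installation. The helper name `leftover_entries` is made up for this example.

```python
from pathlib import Path

# Subdirectories that must contain no previous data on a new node.
SUBDIRS = ("data", "saved_caches", "commitlog", "hints")


def leftover_entries(base_dir):
    """Return every entry found directly inside the four subdirectories.

    An empty list means no previous data was found. Subdirectories that
    do not exist yet are treated as empty.
    """
    found = []
    for name in SUBDIRS:
        subdir = Path(base_dir) / name
        if subdir.is_dir():
            found.extend(sorted(str(entry) for entry in subdir.iterdir()))
    return found


# Example usage (placeholder path, adjust to your installation):
#   leftovers = leftover_entries("/var/lib/cassandra")
#   if leftovers:
#       raise SystemExit("Previous data found:\n" + "\n".join(leftovers))
```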

Procedure

Install and configure HCD on each new node.
If HCD starts automatically, stop HCD and clear the data.

Configure cassandra.yaml on each new node:

auto_bootstrap: If false, set it to true.

This option is not listed in the default cassandra.yaml configuration file and defaults to true.
cluster_name: Set to the same cluster name as existing nodes.
listen_address / broadcast_address: Usually leave blank. Otherwise, use the IP address or host name that other nodes use to connect to the new node.
endpoint_snitch: Set to the same snitch as existing nodes. GossipingPropertyFileSnitch is almost always the correct choice.

initial_token: Set according to your token calculations.

If this property has no value, the database assigns the node a random token range, which results in an unbalanced ring: nodes do not own roughly the same amount of data, so load is not shared equally among them.

seeds: Set to the same seeds list as existing nodes.

Seed nodes cannot bootstrap. Do not include the new node in its own seeds list.

Do not make all nodes seed nodes. See Internode communications (gossip).
Change any other non-default settings in the new nodes to match the existing nodes. Use the diff command to find and merge any differences between the nodes.
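
Where the diff command is not convenient, the same comparison can be approximated in Python. The sketch below ignores blank lines and full-line comments, so only active settings are compared. It assumes you have already copied the text of both cassandra.yaml files to one machine. The function names are made up for this example.

```python
import difflib


def effective_lines(text):
    """Keep only active settings: drop blank lines and full-line comments."""
    return [
        line.rstrip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]


def settings_diff(existing_text, new_text):
    """Unified diff of the active settings in two cassandra.yaml files."""
    return list(
        difflib.unified_diff(
            effective_lines(existing_text),
            effective_lines(new_text),
            fromfile="existing-node",
            tofile="new-node",
            lineterm="",
        )
    )


# Example usage with two files read from disk:
#   with open("existing/cassandra.yaml") as a, open("new/cassandra.yaml") as b:
#       print("\n".join(settings_diff(a.read(), b.read())))
```

Per-node settings such as listen_address and initial_token are expected to differ. Review every other difference before starting the new node.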

Depending on the snitch, assign the datacenter and rack names in the cassandra-topology.properties or cassandra-rackdc.properties file for each node.
Start HCD on each new node in two-minute intervals with the consistent.rangemovement system property disabled:
- Package installations: On each node being bootstrapped, add the following option to the jvm11-server.options file, and then start HCD:
  -Dcassandra.consistent.rangemovement=false
- Tarball installations:
  bin/cassandra -Dcassandra.consistent.rangemovement=false

The following operations are resource intensive and should be done during low-usage times.
After the new nodes are fully bootstrapped, use nodetool move to assign the new initial_token value to each node that requires one, one node at a time.
After all nodes have their new tokens assigned, run nodetool cleanup on each node in the cluster, one node at a time. Wait for cleanup to complete on a node before starting the next.

This step removes the keys that no longer belong to the previously existing nodes.

Failure to run nodetool cleanup after adding a node may result in data inconsistencies including resurrection of previously deleted data.
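
The ordering of the last two steps (move tokens one node at a time, then clean up one node at a time) can be sketched as a small driver script. This is an illustration under assumptions, not a supported tool: it assumes nodetool is on the PATH and can reach each node with the `-h` option, and the host addresses in the usage comment are placeholders. The `--` separator before the token keeps a negative token from being parsed as an option.

```python
import subprocess


def move_command(host, token):
    """nodetool invocation that assigns a new token to one node."""
    return ["nodetool", "-h", host, "move", "--", str(token)]


def cleanup_command(host):
    """nodetool invocation that removes keys a node no longer owns."""
    return ["nodetool", "-h", host, "cleanup"]


def rebalance(moves, all_hosts, run=subprocess.run):
    """Run all moves sequentially, then cleanup sequentially.

    moves:     list of (host, new_token) pairs for nodes that need a new token
    all_hosts: every node in the cluster, since each one needs cleanup
    run:       command runner; replaceable for dry runs or tests
    """
    for host, token in moves:
        # check=True stops the sequence if a move fails.
        run(move_command(host, token), check=True)
    for host in all_hosts:
        # Each call returns before the next node starts its cleanup.
        run(cleanup_command(host), check=True)


# Example usage (placeholder addresses and token):
#   rebalance(
#       moves=[("10.0.0.2", -3074457345618258603)],
#       all_hosts=["10.0.0.1", "10.0.0.2", "10.0.0.3"],
#   )
```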
