Adding nodes to an existing cluster

Steps to add nodes when using virtual nodes.

Virtual nodes (vnodes) greatly simplify adding nodes to an existing cluster:

  • Calculating tokens and assigning them to each node is no longer required.
  • Rebalancing a cluster is no longer necessary because a node joining the cluster assumes responsibility for an even portion of the data.

For a detailed explanation about how this works, see Virtual nodes.

Note: If you do not use vnodes, follow the instructions in the 1.1 topic Adding Capacity to an Existing Cluster.

Procedure

  1. Install Cassandra on the new nodes, but do not start Cassandra.

    If you used a packaged install, Cassandra starts automatically, so you must stop the node and clear the data.

  2. Set the following properties in the cassandra.yaml and cassandra-topology.properties configuration files:
    • auto_bootstrap - If this option has been set to false, you must set it to true. This option is not listed in the default cassandra.yaml file and is set to true by default.
    • cluster_name - the name of the cluster the new node is joining.
    • listen_address/broadcast_address - the IP address or host name that other Cassandra nodes use to connect to the new node.
    • endpoint_snitch - the snitch Cassandra uses for locating nodes and routing requests.
    • num_tokens - the number of vnodes to assign to the node. If the hardware capabilities vary among the nodes in your cluster, you can assign a proportional number of vnodes to the larger machines.
    • seed_provider - the seeds list in this setting determines which nodes the new node should contact to learn about the cluster and establish the gossip process.

    Change any other non-default settings you have made to your existing cluster in the cassandra.yaml and cassandra-topology.properties files. Use the diff command to find and merge (by hand) any differences between existing and new nodes.
  3. Start Cassandra on each new node. Allow two minutes between node initializations. You can monitor the startup and data streaming process using nodetool netstats.
  4. After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove the keys no longer belonging to those nodes. Wait for cleanup to complete on one node before doing the next.

    Cleanup may be safely postponed for low-usage hours.
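
The num_tokens guidance in step 2 (a proportional number of vnodes for larger machines) comes down to simple arithmetic. The following Python sketch is only an illustration: the function name proportional_num_tokens, the relative_capacity ratio, and the host names are hypothetical and not part of Cassandra, and the baseline of 256 is an assumed value that you should replace with whatever num_tokens your existing nodes use.

```python
def proportional_num_tokens(baseline_tokens, relative_capacity):
    """Scale a baseline num_tokens value by a node's relative capacity.

    baseline_tokens: num_tokens used by a "standard" node in the cluster.
    relative_capacity: hypothetical ratio of this node's capacity to a
        standard node's (2.0 means roughly twice as capable).
    """
    if baseline_tokens < 1:
        raise ValueError("baseline_tokens must be at least 1")
    if relative_capacity <= 0:
        raise ValueError("relative_capacity must be positive")
    # Round to a whole number of vnodes, but never go below one.
    return max(1, round(baseline_tokens * relative_capacity))


if __name__ == "__main__":
    # Assumed example values: a baseline of 256 and made-up host names.
    capacities = {"node-standard": 1.0, "node-large": 2.0}
    for host, capacity in capacities.items():
        tokens = proportional_num_tokens(256, capacity)
        print(f"{host}: num_tokens: {tokens}")
```

How you estimate relative capacity (disk, CPU, memory) is a judgment call for your own cluster; the sketch only turns that ratio into a num_tokens value for cassandra.yaml.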
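
Steps 3 and 4 can be scripted along the following lines. This Python sketch is a rough outline under stated assumptions, not a supported tool: it assumes nodetool is on the PATH of the machine running the script, that nodetool can reach each node with its -h option, and that nodetool cleanup returns only once cleanup has finished on that node. The function names and the start_node callable are hypothetical; how Cassandra is actually started depends on your install, so that part is left to the caller.

```python
import subprocess
import time


def run_nodetool(host, *args):
    """Run nodetool against one host and return its standard output."""
    # -h selects the node that nodetool connects to.
    result = subprocess.run(
        ["nodetool", "-h", host, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def start_staggered(new_hosts, start_node, pause_seconds=120, sleep=time.sleep):
    """Start new nodes one at a time, pausing between initializations.

    start_node is a caller-supplied callable (hypothetical) that starts
    Cassandra on the given host. The default pause is two minutes.
    """
    for index, host in enumerate(new_hosts):
        if index > 0:
            sleep(pause_seconds)
        start_node(host)


def cleanup_sequentially(existing_hosts, run=run_nodetool):
    """Run cleanup on one previously existing node at a time."""
    finished = []
    for host in existing_hosts:
        # Assumed to block until cleanup is done on this host, so the
        # next node is not touched before this one completes.
        run(host, "cleanup")
        finished.append(host)
    return finished
```

While new nodes are joining, calling run_nodetool with a host and "netstats" returns the same streaming status you would see on the command line. Because cleanup can be postponed, cleanup_sequentially can be run later, during low-usage hours.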