Adding nodes to an existing cluster

Steps to add nodes when using virtual nodes.

Virtual nodes (vnodes) greatly simplify adding nodes to an existing cluster:

  • Calculating tokens and assigning them to each node is no longer required.
  • Rebalancing a cluster is no longer necessary because a node joining the cluster assumes responsibility for an even portion of the data.

For a detailed explanation about how vnodes work, see Virtual nodes.
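
To see why a joining node ends up with an even portion of the data, the following Python sketch models token ownership on a Murmur3 token ring (-2^63 to 2^63-1). It is a simplified model, not Cassandra's implementation. It assumes each vnode token is chosen at random (the original vnode allocation scheme; newer releases can use a smarter allocator) and it ignores replication. The function name ownership is hypothetical.

```python
import random

# Murmur3Partitioner token range: -2**63 .. 2**63 - 1
RING_MIN = -(2**63)
RING_SIZE = 2**64


def ownership(tokens_per_node, seed=0):
    """Return the fraction of the ring owned by each node.

    Simplified model (assumption): every vnode token is drawn uniformly at
    random and replication is ignored.
    """
    rng = random.Random(seed)
    tokens = sorted(
        (rng.randrange(RING_MIN, RING_MIN + RING_SIZE), node)
        for node, count in enumerate(tokens_per_node)
        for _ in range(count)
    )
    owned = [0] * len(tokens_per_node)
    for i, (token, node) in enumerate(tokens):
        # Each token owns the range (previous token, token]. For i == 0 the
        # previous token is the last one, so the range wraps around the ring.
        previous = tokens[i - 1][0]
        owned[node] += (token - previous) % RING_SIZE
    return [amount / RING_SIZE for amount in owned]


if __name__ == "__main__":
    # Five nodes with 256 vnodes each: every share lands close to 0.2.
    print([round(share, 3) for share in ownership([256] * 5)])
    # A larger machine given twice the vnodes owns roughly twice the data.
    print([round(share, 3) for share in ownership([256, 256, 512])])
```

Running the same model with ownership([1] * 5) shows much more uneven shares, which is the imbalance that manual token calculation used to correct.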

Important: Simultaneously adding more than one new node violates LOCAL_QUORUM constraints. Data may stream from any replica in order to put data onto the new nodes, including other new nodes. Adding two or more nodes at the same time is possible but not recommended; it may introduce consistency issues. To assess the risks to your environment, see JIRA issues CASSANDRA-2434 and CASSANDRA-7069.
Note: If you do not use vnodes, see Adding or replacing single-token nodes.

Procedure

Be sure to install the same version of Cassandra as is installed on the other nodes in the cluster.
  1. Install Cassandra on the new nodes, but do not start Cassandra.

    If you used the Debian install, Cassandra starts automatically. You must stop the node and clear the data.

  2. Set the following properties in the cassandra.yaml and, depending on the snitch, the cassandra-topology.properties or cassandra-rackdc.properties configuration files:
    • auto_bootstrap - This property is not listed in the default cassandra.yaml configuration file, but it might have been added and set to false by other operations. If it is not defined in cassandra.yaml, Cassandra uses true as the default value. For this operation, search for this property in the cassandra.yaml file. If it is present, set it to true or delete it.
    • cluster_name - The name of the cluster the new node is joining.
    • listen_address/broadcast_address - Can usually be left blank. Otherwise, use the IP address or host name that other Cassandra nodes use to connect to the new node.
    • endpoint_snitch - The snitch Cassandra uses for locating nodes and routing requests.
    • num_tokens - The number of vnodes to assign to the node. If the hardware capabilities vary among the nodes in your cluster, you can assign a proportional number of vnodes to the larger machines.
    • seeds - Determines which nodes the new node contacts to learn about the cluster and establish the gossip process. Make sure that the seeds list includes the address of at least one node in the existing cluster.
      Note: The new node will not bootstrap if it is listed as a seed node. Make sure the new node's address is not in the seeds list. For more information about seed nodes, see Internode communications (gossip).

      To make the new node a seed node, complete these steps, then go on to Promoting a node as a seed node.

    • Check the cassandra.yaml file and the cassandra-topology.properties or cassandra-rackdc.properties files on the other nodes in the cluster for any non-default settings, and make sure to replicate these settings on the new node.
      Note: Use the diff command to find any differences between the existing and new nodes, then merge them by hand.
    The location of the cassandra-topology.properties file depends on the type of installation:
      Package installations: /etc/cassandra/cassandra-topology.properties
      Tarball installations: install_location/conf/cassandra-topology.properties
    The location of the cassandra-rackdc.properties file depends on the type of installation:
      Package installations: /etc/cassandra/cassandra-rackdc.properties
      Tarball installations: install_location/conf/cassandra-rackdc.properties
    The location of the cassandra.yaml file depends on the type of installation:
      Package installations: /etc/cassandra/cassandra.yaml
      Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml
    Warning: Simultaneously bootstrapping more than one new node from the same rack violates LOCAL_QUORUM constraints. Data may stream from any replica in order to put data onto the new nodes, including other new nodes. Adding two or more nodes at the same time is possible but not recommended; it may introduce consistency issues. To assess the risks to your environment, see JIRA issues CASSANDRA-2434 and CASSANDRA-7069.

    If you are adding two or more nodes, configure each node as in the previous steps. Then go to Starting multiple new nodes for additional steps you must take.

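  3. Start the new node:
    Package installations: start Cassandra as a service, for example $ sudo service cassandra start
    Tarball installations: $ bin/cassandra
  4. Use nodetool status to verify that the node is fully bootstrapped and that all nodes are Up and Normal (UN), not in any other state.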
  5. After all new nodes are running, run nodetool cleanup on each of the previously existing nodes to remove the keys that no longer belong to those nodes. Wait for cleanup to complete on one node before running nodetool cleanup on the next node.

    Cleanup can be safely postponed for low-usage hours.
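
The checks in step 2 can be sketched in Python. This is a minimal sketch, not a supported tool. It assumes cassandra.yaml has already been parsed into a dictionary (for example with a YAML library) and that the cluster uses the default SimpleSeedProvider layout, where seeds is a comma-separated string under seed_provider. The function names are hypothetical, and the set of node-specific keys ignored when comparing files is an assumption to adjust for your environment.

```python
# Keys expected to differ from node to node (assumption; adjust as needed).
NODE_SPECIFIC_KEYS = {
    "listen_address",
    "broadcast_address",
    "rpc_address",
    "broadcast_rpc_address",
}


def seed_addresses(cfg):
    """Return the addresses in the SimpleSeedProvider seeds string."""
    seeds = cfg["seed_provider"][0]["parameters"][0]["seeds"]
    return {s.strip() for s in seeds.split(",") if s.strip()}


def check_new_node_config(cfg, node_address, cluster_name):
    """Return a list of problems that would stop the node joining as intended."""
    problems = []
    # auto_bootstrap defaults to true when absent; any other value must be fixed.
    if cfg.get("auto_bootstrap", True) is not True:
        problems.append("auto_bootstrap must be true or removed")
    if cfg.get("cluster_name") != cluster_name:
        problems.append("cluster_name does not match the existing cluster")
    seeds = seed_addresses(cfg)
    if node_address in seeds:
        problems.append("new node is listed as a seed, so it will not bootstrap")
    if not seeds - {node_address}:
        problems.append("seeds must include at least one existing node")
    return problems


def differing_settings(existing_cfg, new_cfg):
    """Return the sorted keys whose values differ, ignoring node-specific keys."""
    keys = (set(existing_cfg) | set(new_cfg)) - NODE_SPECIFIC_KEYS
    return sorted(k for k in keys if existing_cfg.get(k) != new_cfg.get(k))
```

An empty list from check_new_node_config and an empty list from differing_settings (comparing against an existing node's parsed file) suggest the new node is ready to start.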


Starting multiple new nodes

If you have added more than one node:

  • Make sure you start each node with the consistent.rangemovement property turned off:
    Package installations
    On each of the nodes you are bootstrapping, add the following option to the /usr/share/cassandra/cassandra-env.sh file:
    JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"

    Then start Cassandra as a service.

    Tarball installations
    Start Cassandra on each of the nodes you are bootstrapping with this option:
    $ bin/cassandra -Dcassandra.consistent.rangemovement=false
  • Allow two minutes between node startups.
  • After each new node has bootstrapped, turn consistent range movement back on for each one:
    Package installations
    Stop Cassandra and remove the line you added to /usr/share/cassandra/cassandra-env.sh in the previous step:
    JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"

    Then restart Cassandra.

    Tarball installations
    Stop Cassandra, then restart it with consistent range movement turned back on:
    $ bin/cassandra -Dcassandra.consistent.rangemovement=true
  • After restarting the nodes, go back to step 4 above to verify the new nodes.
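
For package installations, adding and later removing the JVM option, and spacing out the startups, can be sketched in Python as below. This is a minimal illustration: it edits the cassandra-env.sh text in memory rather than the file itself, and start_node stands in for however you start Cassandra on each host (a hypothetical callback, for example a service start over SSH). The function names are hypothetical.

```python
import time

# The line the procedure adds to cassandra-env.sh, including its closing quote.
FLAG_LINE = 'JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=false"'


def add_rangemovement_flag(env_sh):
    """Return cassandra-env.sh text with FLAG_LINE appended (no duplicates)."""
    lines = env_sh.splitlines()
    if FLAG_LINE not in lines:
        lines.append(FLAG_LINE)
    return "\n".join(lines) + "\n"


def remove_rangemovement_flag(env_sh):
    """Return cassandra-env.sh text with FLAG_LINE removed."""
    lines = [line for line in env_sh.splitlines() if line != FLAG_LINE]
    return "\n".join(lines) + "\n"


def start_staggered(nodes, start_node, delay_seconds=120, sleep=time.sleep):
    """Start each node in turn, waiting delay_seconds between startups."""
    for index, node in enumerate(nodes):
        if index:
            sleep(delay_seconds)  # allow two minutes between node startups
        start_node(node)
```

Injecting sleep and start_node keeps the sequencing logic separate from how each host is actually reached, so it can be dry-run before touching real nodes.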


Promoting a node as a seed node

A seed node does not bootstrap, so a new node can't be configured as one immediately. After you have bootstrapped new nodes in the cluster, follow these steps for each one you want to promote as a seed node.

Note: Do not promote every node in a cluster as a seed node.
  1. Stop Cassandra on the node you want to promote.
  2. Open the node's cassandra.yaml file and add the node's address to the seeds list under seed_provider.
  3. Make this change on all other nodes in the cluster.
  4. Start Cassandra as a service or a stand-alone process.
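
Steps 2 and 3 apply the same edit on every node. A minimal Python sketch of that edit follows, assuming cassandra.yaml is already parsed into a dictionary with the default SimpleSeedProvider layout. add_seed is a hypothetical helper; writing the result back to disk and restarting are left out. Following the note above, it refuses to make every node in the cluster a seed.

```python
def add_seed(cfg, address, cluster_nodes):
    """Add address to the seeds list of a parsed cassandra.yaml dictionary.

    Assumes the SimpleSeedProvider layout, where seeds is a comma-separated
    string. Raises ValueError, leaving cfg unchanged, if the change would make
    every node in cluster_nodes a seed.
    """
    params = cfg["seed_provider"][0]["parameters"][0]
    seeds = [s.strip() for s in params["seeds"].split(",") if s.strip()]
    if address not in seeds:
        seeds.append(address)
    if set(cluster_nodes) <= set(seeds):
        raise ValueError("do not promote every node in the cluster to a seed node")
    params["seeds"] = ",".join(seeds)
    return cfg
```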