Adding a datacenter to a cluster

Steps for adding a datacenter to an existing cluster.

The location of the cassandra.yaml file depends on the type of installation:
  • Package installations: /etc/cassandra/cassandra.yaml
  • Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml

Procedure

Be sure to install the same version of Cassandra as is installed on the other nodes in the cluster.
  1. Ensure that you are using NetworkTopologyStrategy for all of your keyspaces.
  2. For each node, set the following properties in the cassandra.yaml file:
    1. Add (or edit) auto_bootstrap: false.

      By default, this setting is true and not listed in the cassandra.yaml file. Setting this parameter to false prevents the new nodes from attempting to get all the data from the other nodes in the datacenter. When you run nodetool rebuild later in this procedure, each node is properly mapped.

    2. Set other properties, such as -seeds and endpoint_snitch, to match the cluster settings.

      For more guidance, see Initializing a multiple node cluster (multiple datacenters).

      Note: Do not make all nodes seeds. See Internode communications (gossip).
    3. If you want to enable vnodes, set num_tokens.
      Important: Do not set the initial_token parameter.
  3. On all servers, update the relevant property file for the snitch in use to include the new nodes. You do not need to restart.
  4. Ensure that your clients are configured correctly for the new cluster:
    • If your client uses the DataStax Java, C#, or Python driver, set the load-balancing policy to DCAwareRoundRobinPolicy. See the API documentation for the relevant driver.
    • If you are using another client such as Hector, make sure it does not auto-detect the new nodes, so that the client does not contact them until explicitly directed. For example, if you are using Hector, call hostConfig.setAutoDiscoverHosts(false);. If you are using Astyanax, use ConnectionPoolConfigurationImpl.setLocalDatacenter("<datacenter name>") to ensure you are connecting to the specified datacenter.
    • If you are using Astyanax 2.x integrated with the DataStax Java Driver 2.0, you can set the load-balancing policy to DCAwareRoundRobinPolicy by calling JavaDriverConfigBuilder.withLoadBalancingPolicy():
      AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
          ...
          .withConnectionPoolConfiguration(new JavaDriverConfigBuilder()
              .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
              .build())
          ...
  5. If you use the QUORUM consistency level for reads or writes, check whether the LOCAL_QUORUM or EACH_QUORUM consistency level better meets your requirements for multiple datacenters.
  6. Start Cassandra on the new nodes.
  7. After all nodes are running in the cluster:
    1. Change the keyspace properties to specify the desired replication factor for the new datacenter.

      For example, set strategy options to DC1:2, DC2:2.

      For more information, see ALTER KEYSPACE.

    2. On every node in the new datacenter, run nodetool rebuild, specifying the existing datacenter:
      nodetool rebuild -- name_of_existing_data_center

      If you skip the rebuild, requests to the new datacenter with LOCAL_ONE or ONE consistency levels may fail if the existing datacenters are not completely in sync.

      You can run rebuild on one or more nodes at the same time. The choice depends on whether your cluster can handle the extra I/O and network pressure of running on multiple nodes. Running on one node at a time has the least impact on the existing cluster.

      Attention: If you don't specify the existing datacenter in the command line, the new nodes will appear to rebuild successfully, but will not contain any data.
  8. In the cassandra.yaml file, change auto_bootstrap to true or remove the auto_bootstrap: false line.

    This returns the parameter to its normal setting, so that a node can get all the data from the other nodes in the datacenter if it is restarted.
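
The replication change in step 7 and the consistency-level check in step 5 are related, because the per-datacenter replication factors determine how many replicas each quorum level requires. The following is a minimal sketch, not part of the original procedure, that builds an ALTER KEYSPACE statement for the DC1:2, DC2:2 example and computes the replica counts using the usual quorum formula (sum of replication factors divided by 2, rounded down, plus 1). The class name ReplicationPlan, its method names, and the keyspace name my_keyspace are hypothetical; the datacenter names must match the names reported by your snitch.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of Cassandra or any driver.
public class ReplicationPlan {

    /** Builds an ALTER KEYSPACE statement that sets one replication factor per datacenter. */
    public static String alterKeyspaceCql(String keyspace, Map<String, Integer> rfByDatacenter) {
        StringJoiner options = new StringJoiner(", ", "{", "}");
        options.add("'class': 'NetworkTopologyStrategy'");
        for (Map.Entry<String, Integer> entry : rfByDatacenter.entrySet()) {
            options.add("'" + entry.getKey() + "': " + entry.getValue());
        }
        return "ALTER KEYSPACE " + keyspace + " WITH replication = " + options + ";";
    }

    /** Replicas that must respond for LOCAL_QUORUM in a datacenter with this replication factor. */
    public static int localQuorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Replicas that must respond for QUORUM: a majority of all replicas in all datacenters. */
    public static int quorum(Map<String, Integer> rfByDatacenter) {
        int total = 0;
        for (int rf : rfByDatacenter.values()) {
            total += rf;
        }
        return total / 2 + 1;
    }

    public static void main(String[] args) {
        // Assumed example values: two datacenters with two replicas each.
        Map<String, Integer> rf = new LinkedHashMap<>();
        rf.put("DC1", 2);
        rf.put("DC2", 2);

        System.out.println(alterKeyspaceCql("my_keyspace", rf));
        System.out.println("QUORUM replicas: " + quorum(rf));
        System.out.println("LOCAL_QUORUM replicas in DC1: " + localQuorum(rf.get("DC1")));
    }
}
```

With these example values, QUORUM needs 3 of the 4 replicas, so it depends on at least one replica in the remote datacenter, while LOCAL_QUORUM needs both replicas in the local datacenter. This is the kind of trade-off to weigh when choosing between QUORUM, LOCAL_QUORUM, and EACH_QUORUM.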