Adding a datacenter to a cluster

Steps for adding a datacenter to an existing cluster.

The location of the cassandra.yaml file depends on the type of installation:
  • Cassandra package installations: /etc/cassandra/cassandra.yaml
  • Cassandra tarball installations: install_location/cassandra/conf/cassandra.yaml

Prerequisites

If the new datacenter will use existing nodes from another datacenter or cluster, ensure that old data won't interfere with the new cluster:
  1. Decommission each node that will be added to the new datacenter. See Removing a node.
  2. Completely remove the application directories. See Clearing the data as a stand-alone process or Clearing the data as a service.
  3. After removal, install Cassandra from scratch.
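
  For example, on a package installation the repurposing steps above might look like the following sketch (the service name and the /var/lib/cassandra paths are the package defaults; adjust for your environment):

    nodetool decommission              # run on the node while it is still part of its old cluster
    sudo service cassandra stop        # stop Cassandra after the decommission completes
    sudo rm -rf /var/lib/cassandra/*   # clear data, commitlog, and saved caches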

Procedure

  1. To prevent the client from prematurely connecting to the new datacenter and to ensure that the consistency level for reads or writes does not query the new datacenter:
    1. Make sure that the clients are configured to use the DCAwareRoundRobinPolicy.
    2. Make sure that the clients point to an existing datacenter, so they don't try to access the new datacenter, which may not have any data.
    3. If using a QUORUM consistency level, change to LOCAL_QUORUM.
    4. If using the ONE consistency level, set to LOCAL_ONE.

    See the programming instructions for your driver.

    Warning: If client applications are not properly configured, they may connect to the new datacenter before it is ready. This results in connection exceptions, timeouts, and/or inconsistent data.
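
    For example, when testing reads against the existing datacenter from cqlsh, you can set the session consistency level before issuing queries (sample_ks.users is a hypothetical table):

      CONSISTENCY LOCAL_QUORUM;
      SELECT * FROM sample_ks.users LIMIT 10;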
  2. Configure the keyspace and create the new datacenter:
    1. Use ALTER KEYSPACE to change the following keyspaces to NetworkTopologyStrategy:
      • All user-created keyspaces
      • The system keyspaces system_distributed and system_traces

      This step is required for multiple-datacenter clusters because nodetool rebuild (step 10) requires a replica of these keyspaces in the specified source datacenter.

    2. Ensure that the defined class for all datacenters is NetworkTopologyStrategy.

      You can use cqlsh to create or alter a keyspace:

      ALTER KEYSPACE "sample_ks" WITH REPLICATION =
        { 'class' : 'NetworkTopologyStrategy', 'ExistingDC' : 3 };
      Note: Datacenter names are case-sensitive. Verify the case of the datacenter name using a utility such as nodetool status.
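
      For example, you can check the exact datacenter name reported by the cluster (the output below is illustrative):

        $ nodetool status
        Datacenter: ExistingDC
        ======================
        Status=Up/Down
        |/ State=Normal/Leaving/Joining/Moving
        --  Address   Load      Tokens  Owns  Host ID                               Rack
        UN  10.1.0.1  1.21 GiB  256     ?     8d5ed9f4-7764-4dbd-bad8-43fddce94b7c  RAC1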
  3. In the new datacenter, install Cassandra on each new node. Do not start the service or restart the node.
    Be sure to use the same version of Cassandra on all nodes in the cluster.
  4. Configure cassandra.yaml on each new node following the configuration of the other nodes in the cluster:
    1. Set other cassandra.yaml properties, such as -seeds and endpoint_snitch, to match the settings in the cassandra.yaml files on other nodes in the cluster.
      Properties to set:
      • cluster_name:
      • num_tokens: recommended value: 256
      • -seeds: internal IP address of each seed node

        In new clusters, seed nodes don't perform bootstrap (the process of a new node joining an existing cluster).

      • listen_address:

        If the node is a seed node, this address must match an IP address in the seeds list. Otherwise, gossip communication fails because the node doesn't know that it is a seed.

        If not set, Cassandra asks the system for the local address, the one associated with its hostname. In some cases Cassandra doesn't produce the correct address and you must specify the listen_address.

      • rpc_address: listen address for client connections
      • endpoint_snitch: name of snitch (See endpoint_snitch.) If you are changing snitches, see Switching snitches.
      Note: Do not make all nodes seeds; see Internode communications (gossip).
    2. Configure vnode token allocation to match the settings on the other nodes in the cluster.
      Note: If using single-token architecture, see Generating tokens and Adding or replacing single-token nodes.
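
    A minimal cassandra.yaml sketch of these settings for a node in the new datacenter follows; the cluster name, addresses, and snitch are illustrative and must match your cluster:

      cluster_name: 'MyCluster'
      num_tokens: 256
      seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
            - seeds: "10.1.0.1,10.1.0.2,10.2.0.1"
      listen_address: 10.2.0.5
      rpc_address: 10.2.0.5
      endpoint_snitch: GossipingPropertyFileSnitch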
  5. On each new node, add the new datacenter definition to the properties file for the type of snitch used in the cluster:
    Note: Do not use the SimpleSnitch. The SimpleSnitch (default) does not recognize datacenter or rack information, so it is suitable only for single-datacenter deployments or a single zone in public clouds.
    Table 1. Configuration file per snitch
    Snitch                         Configuration file
    PropertyFileSnitch             cassandra-topology.properties
    GossipingPropertyFileSnitch    cassandra-rackdc.properties
    Ec2Snitch                      cassandra-rackdc.properties
    Ec2MultiRegionSnitch           cassandra-rackdc.properties
    GoogleCloudSnitch              cassandra-rackdc.properties
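
    For example, with the GossipingPropertyFileSnitch each node in the new datacenter declares its own location in cassandra-rackdc.properties; the names below are illustrative and must exactly match the datacenter names used in the keyspace definitions:

      dc=NewDC
      rack=RACK1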
  6. In the existing datacenters:
    1. On some nodes, update the seeds property in the cassandra.yaml file to include the seed nodes in the new datacenter and restart those nodes. (Changes to the cassandra.yaml file require restarting to take effect.)
    2. Add the new datacenter definition to the properties file for the type of snitch used in the cluster (step 5). If changing snitches, see Switching snitches.
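
    For example, on a package installation, a rolling restart of each updated node might look like this (drain first so the node flushes data and stops gossiping cleanly):

      nodetool drain
      sudo service cassandra restart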
  7. Start Cassandra on one node on each rack.
  8. Rotate starting Cassandra through the racks until all the nodes are up.
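
    For example, on a package installation, start each node and then verify from any node that it has joined with state UN (Up/Normal); the commands are illustrative:

      sudo service cassandra start
      nodetool status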
  9. After all nodes are running in the cluster and the client applications are datacenter aware (step 1), use cqlsh to alter the keyspaces:
    ALTER KEYSPACE "sample_ks" WITH REPLICATION =
      {'class': 'NetworkTopologyStrategy', 'ExistingDC': 3, 'NewDC': 3};
    Warning: If client applications are not properly configured, they may connect to the new datacenter before it is ready. This results in connection exceptions, timeouts, and/or inconsistent data.
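
    As a sketch, the system keyspaces from step 2 can be extended to the new datacenter in the same way, and DESCRIBE confirms the change (the replication factors are illustrative):

      ALTER KEYSPACE system_distributed WITH REPLICATION =
        {'class': 'NetworkTopologyStrategy', 'ExistingDC': 3, 'NewDC': 3};
      ALTER KEYSPACE system_traces WITH REPLICATION =
        {'class': 'NetworkTopologyStrategy', 'ExistingDC': 2, 'NewDC': 2};
      DESCRIBE KEYSPACE sample_ks;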
  10. Run nodetool rebuild on each node in the new datacenter.
    nodetool rebuild -- name_of_existing_data_center
    CAUTION: If you don't specify the existing datacenter in the command line, the new nodes will appear to rebuild successfully, but will not contain any data.

    If you miss this step, requests to the new datacenter with LOCAL_ONE or ONE consistency levels may fail if the existing datacenters are not completely in-sync.

    This step ensures that the new nodes recognize the existing datacenters in the cluster.

    You can run rebuild on one or more nodes at the same time. Run on one node at a time to reduce the impact on the existing cluster. Run on multiple nodes when the cluster can handle the extra I/O and network pressure.
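
    For example, using the datacenter name from the earlier examples, rebuild from the existing datacenter and watch the streaming progress with nodetool netstats (the name is illustrative):

      nodetool rebuild -- ExistingDC
      nodetool netstats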

Results

The datacenters in the cluster are now replicating with each other.