Adding a datacenter to a cluster
Steps for adding a datacenter to an existing cluster.
The location of the cassandra.yaml file depends on the type of installation:
Package installations: /etc/cassandra/cassandra.yaml
Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml
Procedure
-
Ensure that old data files won't interfere with the new cluster:
- If any of the nodes were previously running in a different cluster or datacenter, remove them from that cluster first, because they contain old data.
- Properly clean these nodes.
- Completely removing the application directories is the recommended way to clean a node.
- After removal, reinstall Cassandra or DataStax Enterprise from scratch.
-
Configure the keyspace and create the new datacenter:
-
Use ALTER KEYSPACE to set NetworkTopologyStrategy as the replication strategy for the following keyspaces:
- All user-created keyspaces
- System keyspaces: system_distributed, system_auth, system_traces
- DataStax Enterprise defined: dse_perf, dse_security, dse_leases
- OpsCenter (if installed)
This step is required for multiple-datacenter clusters because nodetool rebuild (run in a later step of this procedure) requires a replica of these keyspaces in the specified source datacenter.
-
Add the new datacenter to the replication settings with a replication factor of zero, so that no data is replicated to it yet.
You can use cqlsh to create or alter a keyspace:
CREATE KEYSPACE "sample-ks" WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'ExistingDC' : 3 , 'NewDC' : 0};
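The two-phase replication change used in this procedure (new datacenter at RF 0 now, raised to 3 after rebuild) can be sketched as follows. This is an illustrative sketch only; the keyspace name "sample-ks" and datacenter names "ExistingDC" and "NewDC" are the placeholders from the example above.

```python
# Sketch: build the NetworkTopologyStrategy replication clauses for the two
# phases of this procedure. Names are placeholders from the doc's example.

def replication_clause(dcs: dict[str, int]) -> str:
    """Render a NetworkTopologyStrategy replication map as a CQL clause."""
    opts = ", ".join(f"'{dc}': {rf}" for dc, rf in dcs.items())
    return f"{{'class': 'NetworkTopologyStrategy', {opts}}}"

# Phase 1: the new datacenter starts at replication factor 0 (no data yet).
phase1 = f'CREATE KEYSPACE "sample-ks" WITH REPLICATION = {replication_clause({"ExistingDC": 3, "NewDC": 0})};'

# Phase 2 (a later step, after the new nodes are running and clients are
# datacenter aware): raise the new datacenter's replication factor.
phase2 = f'ALTER KEYSPACE "sample-ks" WITH REPLICATION = {replication_clause({"ExistingDC": 3, "NewDC": 3})};'

print(phase1)
print(phase2)
```

Keeping the new datacenter at RF 0 until rebuild prevents writes from being routed to empty nodes.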
-
In the new datacenter, install Cassandra on each new node. Do not start the
service or restart the node.
Be sure to use the same version of Cassandra on all nodes in the cluster. See Installing earlier releases.
-
Configure cassandra.yaml on each new node
following the configuration of the other nodes in the cluster:
-
Set other cassandra.yaml properties, such as
-seeds and endpoint_snitch, to
match the settings in the cassandra.yaml files on
other nodes in the cluster. See Initializing a multiple node cluster (multiple datacenters).
Note: Do not make all nodes seeds; see Internode communications (gossip).
-
If using vnodes, set num_tokens on each node.
The recommended value is 256. Do not set the initial_token. DataStax Enterprise uses other token values.
- If using single-node-per-token architecture, generate the initial token for each node. Add this as the value for each node's initial_token property, and make sure that num_tokens is commented out.
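For the single-token case, the usual arithmetic spaces one token per node evenly across the partitioner's range. A minimal sketch, assuming the Murmur3Partitioner (token range -2**63 to 2**63 - 1) and a hypothetical node count:

```python
# Hedged sketch of initial token generation for the Murmur3Partitioner.
# node_count is the number of nodes in the new datacenter. If a generated
# token collides with one already in use in another datacenter, offset it
# slightly (e.g. by 1) rather than reusing the exact value.

def murmur3_initial_tokens(node_count: int) -> list[int]:
    """Evenly space one token per node across the Murmur3 token range."""
    return [(i * 2**64) // node_count - 2**63 for i in range(node_count)]

tokens = murmur3_initial_tokens(4)
print(tokens)
# Each value becomes that node's initial_token in cassandra.yaml,
# with num_tokens commented out.
```

For example, four nodes yield tokens at -2**63, -2**62, 0, and 2**62.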
-
On each new node, add the new datacenter definition to the properties file for
the type of snitches used in the
cluster:
Note: Do not use the SimpleSnitch. The SimpleSnitch (default) does not recognize datacenter or rack information; it is suitable only for single-datacenter deployments or a single zone in public clouds.
Configuration file per snitch:
Snitch | Configuration file
PropertyFileSnitch | cassandra-topology.properties
GossipingPropertyFileSnitch | cassandra-rackdc.properties
Ec2Snitch | cassandra-rackdc.properties
Ec2MultiRegionSnitch | cassandra-rackdc.properties
GoogleCloudSnitch | cassandra-rackdc.properties
-
In the existing datacenters:
- On some nodes, update the seeds property in the cassandra.yaml file to include the seed nodes in the new datacenter and restart those nodes. (Changes to the cassandra.yaml file require restarting to take effect.)
- Add the new datacenter definition to the properties file for the type of snitch used in the cluster (see the table in the previous step). If changing snitches, see Switching snitches.
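As an illustration of the snitch definition itself, the cassandra-rackdc.properties file used by the GossipingPropertyFileSnitch is a two-line properties file. The datacenter and rack names below ("NewDC", "RAC1") are placeholders; they must match the names used in the keyspace replication settings:

```python
# Illustrative sketch (not the official tooling): render the
# cassandra-rackdc.properties content for a node in the new datacenter.

def rackdc_properties(dc: str, rack: str) -> str:
    """Build GossipingPropertyFileSnitch properties-file content."""
    return f"dc={dc}\nrack={rack}\n"

content = rackdc_properties("NewDC", "RAC1")
print(content)
# Write this to cassandra-rackdc.properties on each node in the new
# datacenter; existing nodes keep their current dc/rack values.
```

Because the datacenter name in this file is what replication settings refer to, a typo here silently places the node in an unintended datacenter.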
-
To avoid the client prematurely connecting to the new datacenter:
- Make sure the clients are configured to use the DCAwareRoundRobinPolicy.
- Be sure that your clients point to the existing datacenter, so they do not try to access the new datacenter, which may not have any data.
See the programming instructions for your driver.
-
To ensure you are not using a consistency level for reads or writes that
queries the new datacenter, review the consistency level for global or per-operation level for multiple
datacenter operation:
- If using the QUORUM consistency level, change to LOCAL_QUORUM.
- If using the ONE consistency level, change to LOCAL_ONE.
Warning: If client applications, including DSE Search and DSE Analytics, are not properly configured, they may connect to the new datacenter before the datacenter is ready. This results in connection exceptions, timeouts, and/or inconsistent data.
- Start Cassandra on the new nodes.
-
After all nodes are running in the cluster and the client applications are datacenter aware (see the client configuration steps above), use cqlsh to alter the keyspaces:
ALTER KEYSPACE "sample-ks" WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'ExistingDC': 3, 'NewDC': 3};
-
Run nodetool
rebuild on each node in the new datacenter.
$ nodetool rebuild -- name_of_existing_data_center
CAUTION: If you do not specify the existing datacenter on the command line, the new nodes appear to rebuild successfully but will not contain any data. If you miss this step, requests to the new datacenter with LOCAL_ONE or ONE consistency levels may fail if the existing datacenters are not completely in sync.
This step ensures that the new nodes recognize the existing datacenters in the cluster.
You can run rebuild on one or more nodes at the same time. Run on one node at a time to reduce the impact on the existing cluster. Run on multiple nodes when the cluster can handle the extra I/O and network pressure.
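Driving the rebuild serially across the new datacenter's nodes can be sketched as below. The host addresses and source datacenter name are hypothetical; nodetool's -h flag targets a specific node, and running one rebuild at a time limits I/O and network pressure on the existing cluster.

```python
# Sketch: build the per-node rebuild commands for the new datacenter.
# Addresses and datacenter name are placeholder assumptions.

new_dc_nodes = ["10.1.0.1", "10.1.0.2", "10.1.0.3"]  # hypothetical node IPs
source_dc = "ExistingDC"  # must name an existing datacenter with replicas

commands = [
    ["nodetool", "-h", host, "rebuild", "--", source_dc]
    for host in new_dc_nodes
]

for cmd in commands:
    print(" ".join(cmd))
    # To run for real, execute each command serially (e.g. via
    # subprocess.run(cmd, check=True)) and wait for it to finish
    # before starting the next node's rebuild.
```

Run the commands on more nodes in parallel only when the cluster can absorb the extra streaming load, as noted above.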