Initialize a single datacenter per workload type
In this scenario, a mixed-workload cluster has only one datacenter for each type of workload. For example, an eight-node cluster with the following nodes would use three datacenters, one for each workload type:

- DC1 = 3 DSE Analytics nodes
- DC2 = 3 Transactional nodes
- DC3 = 2 DSE Search nodes
In contrast, a multiple datacenter cluster has more than one datacenter for each type of workload.
The eight-node cluster spans two racks across three datacenters.
Applications in each datacenter will use a default consistency level of LOCAL_QUORUM.
One node per rack will serve as a seed node.
| Node | IP address | Type | Seed | Rack |
|---|---|---|---|---|
| node0 | 110.82.155.0 | Transactional | ✓ | RAC1 |
| node1 | 110.82.155.1 | Transactional | | RAC1 |
| node2 | 110.54.125.1 | Transactional | | RAC2 |
| node3 | 110.54.125.2 | Analytics | | RAC1 |
| node4 | 110.54.155.2 | Analytics | ✓ | RAC2 |
| node5 | 110.82.155.3 | Analytics | | RAC1 |
| node6 | 110.54.125.3 | Search | | RAC1 |
| node7 | 110.82.155.4 | Search | | RAC2 |
Prerequisites

- Complete the prerequisite tasks outlined in Initialize a DataStax Enterprise cluster to prepare the environment.
- If the nodes are behind a firewall, open the required ports for internal and external communication.
- If the new datacenter uses existing nodes from another datacenter or cluster, ensure that old data won't interfere with the new cluster:
  - Decommission each node that will be added to the new datacenter.
  - Clear the data from DSE to completely remove application directories.
  - Install DSE on each node, but don't start the DSE service.
Procedure

- Configure client applications so they don't prematurely connect to the new datacenter, and ensure that the consistency level for reads or writes doesn't query the new datacenter.
  If client applications, including DSE Search and DSE Analytics, aren't properly configured, they might connect to the new datacenter before it is online. Incorrect configuration results in connection exceptions, timeouts, and/or inconsistent data.
  - Configure client applications to use the `DCAwareRoundRobinPolicy` (see the sketch after this list).
  - Direct clients to an existing datacenter. Otherwise, clients might try to access the new datacenter, which might not have any data.
  - If using the `QUORUM` consistency level, change it to `LOCAL_QUORUM`.
  - If using the `ONE` consistency level, change it to `LOCAL_ONE`.

  For more information, see the language-specific documentation for your DataStax-compatible driver.
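  For example, with the DataStax Java driver 3.x, which provides `DCAwareRoundRobinPolicy`, a datacenter-pinned client might look like the following sketch. The contact point and local datacenter name (`DC_Transactional`) are assumptions drawn from this example's topology:

  ```java
  import com.datastax.driver.core.Cluster;
  import com.datastax.driver.core.ConsistencyLevel;
  import com.datastax.driver.core.QueryOptions;
  import com.datastax.driver.core.Session;
  import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

  public class DcAwareClient {
      public static void main(String[] args) {
          Cluster cluster = Cluster.builder()
                  // Contact point in the existing datacenter, never the new one.
                  .addContactPoint("110.82.155.0")
                  // Route requests only to nodes in the local datacenter.
                  .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
                          .withLocalDc("DC_Transactional")
                          .build())
                  // Use a datacenter-local consistency level instead of QUORUM or ONE.
                  .withQueryOptions(new QueryOptions()
                          .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
                  .build();
          try (Session session = cluster.connect()) {
              // ... run application queries here ...
          } finally {
              cluster.close();
          }
      }
  }
  ```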
- If your existing datacenters use the `SimpleStrategy` replication strategy, change it to the `NetworkTopologyStrategy` replication strategy:
  - Use `ALTER KEYSPACE` to change the keyspace replication strategy to `NetworkTopologyStrategy` for the following keyspaces:
    - DSE security: `system_auth`, `dse_security`
    - DSE performance: `dse_perf`
    - DSE analytics: `dse_leases`, `dsefs`
    - System resources: `system_traces`, `system_distributed`
    - OpsCenter keyspace (if installed)
    - All keyspaces created by users

    For example:

    ```sql
    ALTER KEYSPACE keyspace_name WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'DC1' : 3};
    ```
  - Use `DESCRIBE SCHEMA` to check the replication strategy of keyspaces in the cluster. Ensure that any existing keyspaces use the `NetworkTopologyStrategy` replication strategy.

    ```sql
    DESCRIBE SCHEMA;
    ```

    Result:

    ```sql
    CREATE KEYSPACE dse_perf WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
    ...
    CREATE KEYSPACE dse_leases WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
    ...
    CREATE KEYSPACE dsefs WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
    ...
    CREATE KEYSPACE dse_security WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
    ```
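    As a quicker check than scanning the full schema dump, you can query the schema tables directly. This sketch assumes DSE 5.0 or later, where the `system_schema` keyspace is available:

    ```sql
    -- List the replication settings of every keyspace;
    -- look for any that still report 'SimpleStrategy'.
    SELECT keyspace_name, replication FROM system_schema.keyspaces;
    ```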
- Install DSE on each node in the new datacenter. Don't start the service or restart the node.
  Use the same version of DataStax Enterprise (DSE) on all nodes in the cluster.
- Configure properties in `cassandra.yaml` on each new node, following the configuration of the other nodes in the cluster. A sketch of the relevant settings follows this step.
  If you used Lifecycle Manager to provision the nodes, configuration is performed automatically.
  For manual configuration, use the `yaml_diff` tool to review and make appropriate changes to the `cassandra.yaml` and `dse.yaml` configuration files.
  - Configure node properties:
    - `-seeds`: The internal IP address of each seed node. Include at least one seed node from each datacenter. DataStax recommends more than one seed node per datacenter, in more than one rack; 3 is the most common number of seed nodes per datacenter. Don't make all nodes seed nodes.
    - `auto_bootstrap`: This setting has been removed from the default configuration but, if present, should be set to `true`.
    - `cluster_name`: On the new datacenter nodes, the `cluster_name` key in the `cassandra.yaml` configuration file must be set to the existing cluster's `cluster_name`; otherwise, the new datacenter nodes can't join the existing cluster.
    - `listen_address`: Typically, you can leave this empty (not set). If not set, DSE asks the system for the local address, which is associated with its host name. In some cases, DSE doesn't produce the correct address, which requires specifying the `listen_address`.
    - `endpoint_snitch`: Provide the snitch configuration. Don't use the default `DseSimpleSnitch`, which is used only for single-datacenter deployments (or single-zone deployments in public clouds) and doesn't recognize datacenter or rack information.
      For the `GossipingPropertyFileSnitch`, the Amazon EC2 single-region snitch, the Amazon EC2 multi-region snitch, and the Google Cloud Platform snitch, configure the datacenter and rack information in the `cassandra-rackdc.properties` file. For the `PropertyFileSnitch`, configure the datacenter and rack information in the `cassandra-topology.properties` file.
  - If using a `cassandra.yaml` or `dse.yaml` file from a previous version, check the upgrade guides for your previous and current versions for removed settings.
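  A minimal sketch of the relevant `cassandra.yaml` settings on a new node. The cluster name, seed addresses, and snitch below are assumptions based on this example's topology:

  ```yaml
  # cassandra.yaml (excerpt) -- illustrative values only
  cluster_name: 'MyCluster'        # must match the existing cluster's name exactly
  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        # At least one seed node from each datacenter.
        - seeds: "110.82.155.0,110.54.155.2"
  # listen_address left unset: DSE derives it from the host name.
  endpoint_snitch: GossipingPropertyFileSnitch
  ```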
- Configure the node architecture. All nodes in the datacenter must use the same type.

  Virtual node (vnode) allocation algorithm settings (see the sketch after this step):
  - Set `num_tokens` to 8 (recommended).
  - Set `allocate_tokens_for_local_replication_factor` to the target replication factor for keyspaces in the new datacenter. If the keyspace replication factor varies, alternate the settings to use all the replication factors.
  - Comment out the `initial_token` property.

  See Virtual node (vnode) configuration for more details.

  Single-token architecture settings:
  - Generate the initial token for each node, and then set that value in the `initial_token` property. See Adding or replacing single-token nodes for more information.
  - Comment out `num_tokens` and `allocate_tokens_for_local_replication_factor`.
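  For example, the vnode settings in `cassandra.yaml` might look like the following, assuming a target replication factor of 3 for keyspaces in the new datacenter:

  ```yaml
  # Virtual node (vnode) allocation algorithm settings
  num_tokens: 8
  allocate_tokens_for_local_replication_factor: 3
  # initial_token:    # leave commented out when using vnodes
  ```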
- Depending on your snitch type, edit the appropriate configuration file to assign datacenter and rack names to the IP addresses of each node, and assign a default datacenter name and rack name for unknown nodes.
  For the `PropertyFileSnitch`, these are set in `cassandra-topology.properties`. For the `GossipingPropertyFileSnitch`, these are set in `cassandra-rackdc.properties`.

  ```
  # Node IP=Datacenter:Rack

  # Transactional
  110.82.155.0=DC_Transactional:RAC1
  110.82.155.1=DC_Transactional:RAC1
  110.54.125.1=DC_Transactional:RAC2

  # Analytics
  110.54.125.2=DC_Analytics:RAC1
  110.54.155.2=DC_Analytics:RAC2
  110.82.155.3=DC_Analytics:RAC1

  # Search
  110.54.125.3=DC_Search:RAC1
  110.82.155.4=DC_Search:RAC2

  # default for unknown nodes
  default=DC1:RAC1
  ```

  - The `GossipingPropertyFileSnitch` always loads `cassandra-topology.properties` when the file is present. Remove the file from each node in any new datacenter and in any datacenter migrated from the `PropertyFileSnitch` (see the example after this list).
  - After making any changes in the configuration files, you must restart the node for the changes to take effect.
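  For example, on a package installation the file typically lives under `/etc/dse/cassandra/`; the path is an assumption, so adjust it for tarball installations:

  ```bash
  # Run on each node in the new datacenter, and on any datacenter
  # migrated from the PropertyFileSnitch.
  sudo rm /etc/dse/cassandra/cassandra-topology.properties
  ```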
- Make the following changes in the existing datacenters:
  - On nodes in the existing datacenters, update the `-seeds` property in `cassandra.yaml` to include the seed nodes in the new datacenter (see the sketch after this list).
  - Add the new datacenter definition to the properties file for the type of snitch used in the cluster. If changing snitches, see Switching snitches.
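  For instance, assuming the transactional datacenter already exists and the analytics datacenter is the one being added, the seed list on the existing nodes would grow to include the new datacenter's seed node:

  ```yaml
  # cassandra.yaml on nodes in the existing datacenter
  seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
        # Existing seed (node0) plus the seed in the new datacenter (node4).
        - seeds: "110.82.155.0,110.54.155.2"
  ```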
- After you have installed and configured DataStax Enterprise (DSE) on all nodes, start the nodes sequentially, beginning with the seed nodes. After starting each node, allow a delay of at least the duration of `ring_delay_ms` before starting the next node, to prevent cluster imbalance.
  Before starting a node, ensure that the previous node is up and running by verifying that `nodetool status` returns `UN` (Up and Normal) for it. Failing to do so can result in cluster imbalance that cannot be fixed later.
  Cluster imbalance can be visualized by running `nodetool status KEYSPACE_NAME` and checking the `Owns` column in the response. A properly configured cluster reports ownership values that are similar to each other (within 1 percent) for keyspaces where the replication factor per datacenter equals `allocate_tokens_for_local_replication_factor`.
  - Package installations: Start DataStax Enterprise as a service
  - Tarball installations: Start DataStax Enterprise as a standalone process
- Continue starting DSE, rack by rack, until all the nodes are up (see the sketch after this step).
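  A sketch of the start-and-verify loop on a package installation; the service command applies to package installs only, and tarball installations start DSE from the installation directory instead:

  ```bash
  # On each node, starting with the seeds and proceeding rack by rack:
  sudo service dse start

  # Wait at least ring_delay_ms, then confirm this node reports UN (Up/Normal)
  # before moving on to the next node.
  nodetool status

  # Optionally, check ownership balance for a specific keyspace.
  nodetool status my_keyspace
  ```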
- After all nodes are running in the cluster and the client applications are datacenter-aware, use `cqlsh` to alter the keyspaces and set the desired replication factor in the new datacenter:

  ```sql
  ALTER KEYSPACE keyspace_name WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3, 'NewDC2' : 2};
  ```
- Run `nodetool rebuild` on each node in the new datacenter, specifying the datacenter to rebuild from. This step replicates the data to the new datacenter in the cluster.

  ```bash
  nodetool rebuild -- <datacenter_name>
  ```

  Make sure the datacenter name is spelled correctly and the datacenter exists in the cluster. Nodes appear to rebuild successfully even if the datacenter doesn't exist, but they might not contain all the expected data.
  Requests to the new datacenter with `LOCAL_ONE` or `ONE` consistency levels can fail if the existing datacenters are not completely in sync.
  You can run `nodetool rebuild` on one or more nodes at the same time. Run the command on one node at a time to reduce the impact on the existing cluster, or on multiple nodes simultaneously if the cluster can handle the extra I/O and network pressure.
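  For the example topology above, rebuilding the new nodes from the transactional datacenter might look like this (the datacenter name matches this example's topology file):

  ```bash
  # Run on each node in the new datacenter, one node at a time:
  nodetool rebuild -- DC_Transactional
  ```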
- Check that the new cluster is up and running:

  ```bash
  dsetool status
  ```

  If DSE has problems starting, visit DataStax Support for troubleshooting articles on starting DSE.
- To add the third datacenter (DC3) to the cluster, repeat the steps, starting from installing DSE through checking that the cluster is running.
Results

The datacenters in the cluster are now replicating with each other.

```
DC: Cassandra      Workload: Cassandra      Graph: no
==============================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Tokens  Owns   Host ID            Rack
UN  110.82.155.0  21.33 KB  256     33.3%  a9fa31c7-f3c0-...  RAC1
UN  110.82.155.1  21.33 KB  256     33.3%  f5bb416c-db51-...  RAC1
UN  110.54.125.1  21.33 KB  256     16.7%  b836748f-c94f-...  RAC2

DC: Analytics
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Owns   Host ID            Tokens       Rack
UN  110.54.125.2  28.44 KB  13.0%  e2451cdf-f070-...  -922337...   RAC1
UN  110.54.155.2  44.47 KB  16.7%  f9fa427c-a2c5-...  30745512...  RAC2
UN  110.82.155.3  54.33 KB  23.6%  b9fc31c7-3bc0-...  45674488...  RAC1

DC: Solr
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load      Owns   Host ID            Tokens      Rack
UN  110.54.125.3  15.44 KB  50.2%  e2451cdf-f070-...  9243578...  RAC1
UN  110.82.155.4  18.78 KB  49.8%  e2451cdf-f070-...  10000       RAC2
```