Multiple data center deployment per workload type
Steps for configuring nodes in a deployment scenario in a mixed workload cluster that has more than one data center for each type of node.
In this scenario, a mixed workload cluster has more than one data center for each type of node. For example, if the cluster has 4 analytics nodes, 4 Cassandra nodes, and 2 DSE Search nodes, the cluster could have 5 data centers: 2 data centers for the analytics nodes, 2 data centers for the Cassandra nodes, and 1 data center for the DSE Search nodes. By contrast, a single data-center cluster has only one data center for each type of node.
In Cassandra, a data center can be a physical data center or a virtual data center. Different workloads must always use separate data centers, either physical or virtual. Uses for multiple data centers include:
- Isolating replicas from external infrastructure failures, such as networking failures between data centers and power outages.
- Distributed data replication across multiple, geographically dispersed nodes:
  - Between different physical racks in a physical data center.
  - Between public cloud providers and on-premises managed data centers.
- To prevent a development cluster running analytics jobs on live data from slowing down a real-time analytics cluster.
- To ensure that reads from a specific data center are local to the requests, use virtual data centers within the physical data center, especially when using a consistency level greater than ONE. This strategy lowers latency because it avoids, for example, one read from a node in New York and another from a node in Los Angeles.
- Choosing keyspace replication options
- Configuring replication
- Single-token architecture deployment
- Data replication (Applies only to the single-token-per-node architecture.)
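As the keyspace replication links above discuss, the data centers in this scenario are named explicitly in each keyspace definition. A minimal sketch with NetworkTopologyStrategy; the keyspace name and replication factors here are illustrative assumptions, and the data center names must match those reported by the snitch:

```cql
-- Hypothetical keyspace: replicas are placed per data center by name.
-- 'DC1' and 'DC2' must match the data center names defined by the snitch.
CREATE KEYSPACE sales_analytics
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 3
  };
```

Because replication is set per data center, a workload-specific data center can be excluded from a keyspace simply by omitting its name from the replication map.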
Prerequisites
To configure a multi-node cluster with multiple data centers:
- A good understanding of how Cassandra works. Be sure to read at least Understanding the architecture, Data Replication, and Cassandra's rack feature.
- Ensure DataStax Enterprise is installed on each node.
- Choose a name for the cluster.
- For a mixed-workload cluster, determine the purpose of each node.
- Determine the snitch and replication strategy. The GossipingPropertyFileSnitch and NetworkTopologyStrategy are recommended for production environments.
- Get the IP address of each node.
- Determine which nodes are seed nodes. Do not make all nodes seed nodes. Seed nodes are not required for DSE Search data centers. Read Internode communications (gossip).
- Develop a naming convention for each data center and rack, for example: DC1, DC2 or 100, 200 and RAC1, RAC2 or R101, R102.
- Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml configuration file. The location of the cassandra.yaml file depends on the type of installation:
  - Package installations: /etc/dse/cassandra/cassandra.yaml
  - Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml
- Set virtual nodes correctly for the type of data center. DataStax does not recommend using virtual nodes on data centers running BYOH or DSE Hadoop. See Virtual nodes.
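The prerequisites above map onto a handful of per-node settings in cassandra.yaml. A minimal sketch, assuming the recommended GossipingPropertyFileSnitch and using two of the node addresses from the example cluster as seeds; the cluster name is an illustrative assumption:

```yaml
# Illustrative cassandra.yaml fragment; values are assumptions, not defaults.
cluster_name: 'MyDSECluster'          # must be identical on every node
num_tokens: 256                       # virtual nodes; disable for BYOH or DSE Hadoop data centers
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # One or more seeds per data center, not every node in the cluster.
      - seeds: "10.168.66.41,10.176.43.66"
listen_address: 10.168.66.41          # this node's own IP address
```

The seeds list should be the same on every node, while listen_address is unique to each node.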
Procedure
This configuration example describes installing a six-node cluster spanning two data centers. The default consistency level is QUORUM.
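QUORUM is derived from the sum of the replication factors across all data centers: quorum = (sum of replication factors / 2) + 1, rounded down. A quick sketch of the arithmetic; the per-data-center replication factors used here are assumptions for illustration:

```python
def quorum(replication_factors):
    """Number of replicas that must respond for QUORUM consistency.

    replication_factors: per-data-center replication factors,
    e.g. [3, 3] for replication factor 3 in each of two data centers.
    """
    return sum(replication_factors) // 2 + 1

# Assuming replication factor 3 in each of the two data centers,
# 4 of the 6 replicas must respond:
print(quorum([3, 3]))  # -> 4

# A single data center with replication factor 3 needs 2 replicas:
print(quorum([3]))  # -> 2
```

This is why a QUORUM read or write in a two-data-center cluster can involve nodes in both data centers; LOCAL_QUORUM restricts the count to the local data center's replication factor.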
Results

Running nodetool status shows the nodes distributed across the two data centers:
Datacenter: DC1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.168.66.41 45.96 KB 256 27.4% c885aac7-f2c0-... RAC1
UN 10.168.247.41 66.34 KB 256 36.6% fa31416c-db22-... RAC1
UN 10.169.61.170 55.72 KB 256 33.0% f488367f-c14f-... RAC1
Datacenter: DC2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.176.43.66 45.96 KB 256 27.4% f9fa31c7-f3c0-... RAC1
UN 10.176.170.59 66.34 KB 256 36.6% a5bb526c-db51-... RAC1
UN 10.169.30.138 55.72 KB 256 33.0% b836478f-c49f-... RAC1
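The data center and rack names in the output above come from each node's snitch configuration. With the GossipingPropertyFileSnitch, each node declares its own location in cassandra-rackdc.properties; a sketch for a node in DC2, using the names from this example:

```properties
# cassandra-rackdc.properties for a node in data center DC2, rack RAC1.
# Each node declares only its own data center and rack.
dc=DC2
rack=RAC1
```

After changing this file, restart the node and confirm with nodetool status that it reports the intended data center and rack.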