Multiple datacenter deployment per workload type
Steps for configuring nodes in a deployment scenario in a mixed workload cluster that has more than one datacenter for each type of node.
In this scenario, a mixed workload cluster has more than one datacenter for each type of node. For example, if the cluster has 4 analytics nodes, 4 Cassandra nodes, and 2 DSE Search nodes, the cluster could have 5 datacenters: 2 datacenters for the analytics nodes, 2 datacenters for the Cassandra nodes, and 1 datacenter for the DSE Search nodes. By contrast, a single datacenter cluster has only one datacenter for each type of node.
In Cassandra, a datacenter can be a physical or a virtual datacenter. Different workloads must always use separate datacenters, either physical or virtual. Separate datacenters are typically used for:
- Isolating replicas from external infrastructure failures, such as networking between datacenters and power outages.
- Distributing data replication across multiple, geographically dispersed nodes, for example:
  - Between different physical racks in a physical datacenter.
  - Between public cloud providers and on-premises managed datacenters.
- Preventing a development cluster that runs analytics jobs on live data from slowing down a real-time analytics cluster.
- Ensuring that reads from a specific datacenter are local to the requests, especially when using a consistency level greater than ONE. Using virtual datacenters within a physical datacenter lowers latency by avoiding, for example, one read served by a node in New York and another by a node in Los Angeles.
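As a sketch of the workload-separation idea, a keyspace can be replicated only to the datacenters that should serve it. The keyspace name and replica counts below are illustrative; the datacenter names must match the names the snitch reports:

```cql
-- Hypothetical keyspace: three replicas in each of two Cassandra
-- datacenters, and no replicas in the analytics or search datacenters.
CREATE KEYSPACE sales_data
  WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 3
  };
```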
- Choosing keyspace replication options
- Configuring replication
- Single-token architecture deployment
- Data replication (Applies only to the single-token-per-node architecture.)
Prerequisites
To configure a multi-node cluster with multiple datacenters:
- A good understanding of how Cassandra works. Be sure to read at least Understanding the architecture, Data Replication, and Cassandra's rack feature.
- Ensure DataStax Enterprise is installed on each node.
- Choose a name for the cluster.
- For a mixed-workload cluster, determine the purpose of each node.
- Determine the snitch and replication strategy. The GossipingPropertyFileSnitch and NetworkTopologyStrategy are recommended for production environments.
- Get the IP address of each node.
- Determine which nodes are seed nodes. Do not make all nodes seed nodes. Seed nodes are not required for DSE Search datacenters. Read Internode communications (gossip).
- Develop a naming convention for each datacenter and rack, for example: DC1, DC2 or 100, 200 and RAC1, RAC2 or R101, R102.
- Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml configuration file. The location of the cassandra.yaml file depends on the type of installation:
  - Package installations: /etc/dse/cassandra/cassandra.yaml
  - Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml
- Set virtual nodes correctly for the type of datacenter. DataStax does not recommend using virtual nodes on datacenters running BYOH or DSE Hadoop. See Virtual nodes.
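With the recommended GossipingPropertyFileSnitch, each node declares its own datacenter and rack in the cassandra-rackdc.properties file, using the naming convention chosen above. A minimal sketch for one node (values vary per node):

```
# cassandra-rackdc.properties on a node in datacenter DC1, rack RAC1
dc=DC1
rack=RAC1
```

Gossip propagates this information to the rest of the cluster, so the file only describes the local node.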
Procedure
This configuration example describes installing a six-node cluster spanning two datacenters. The default consistency level is QUORUM.
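The cassandra.yaml settings that typically vary in such a deployment might look like the following sketch. The cluster name is illustrative, the seed list uses one node from each datacenter in the example output below, and listen_address differs on every node:

```yaml
# Sketch of per-deployment cassandra.yaml settings (values illustrative)
cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.168.66.41,10.176.43.66"
listen_address: 10.168.66.41          # this node's own IP address
endpoint_snitch: GossipingPropertyFileSnitch
```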
Results
Datacenter: DC1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns   Host ID            Rack
UN  10.168.66.41   45.96 KB  256     27.4%  c885aac7-f2c0-...  RAC1
UN  10.168.247.41  66.34 KB  256     36.6%  fa31416c-db22-...  RAC1
UN  10.169.61.170  55.72 KB  256     33.0%  f488367f-c14f-...  RAC1

Datacenter: DC2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load      Tokens  Owns   Host ID            Rack
UN  10.176.43.66   45.96 KB  256     27.4%  f9fa31c7-f3c0-...  RAC1
UN  10.176.170.59  66.34 KB  256     36.6%  a5bb526c-db51-...  RAC1
UN  10.169.30.138  55.72 KB  256     33.0%  b836478f-c49f-...  RAC1