Initializing a multiple node cluster (multiple data centers)
A deployment scenario for a Cassandra cluster with multiple data centers.
This topic contains information for deploying a Cassandra cluster with multiple data centers.
Data replicates across the data centers automatically and transparently; no ETL work is necessary to move data between different systems or servers. You can configure the number of copies of the data in each data center and Cassandra handles the rest, replicating the data for you.
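As a sketch of how per-data-center copy counts are expressed: when a keyspace is created, `NetworkTopologyStrategy` takes a replica count for each data center (the keyspace name and counts here are hypothetical; the `DC1`/`DC2` names must match the data center names defined in your snitch configuration, as in the examples below):

```cql
-- Hypothetical keyspace: ask for 3 copies in DC1 and 2 in DC2.
-- Data center names must match those defined for the snitch.
CREATE KEYSPACE demo_ks
  WITH REPLICATION = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 2
  };
```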
In Cassandra, a data center is a grouping of nodes. The term is synonymous with replication group, that is, a set of nodes configured together for replication purposes.

Prerequisites
- Install Cassandra on each node.
- Choose a name for the cluster.
- Get the IP address of each node.
- Determine which nodes will be seed nodes. (Cassandra nodes use the seed node list for finding each other and learning the topology of the ring.)
- Determine the snitch.
- If using multiple data centers, determine a naming convention for each data center and rack, for example: DC1, DC2 or 100, 200 and RAC1, RAC2 or R101, R102.
- Other possible configuration settings are described in The cassandra.yaml configuration file.
Suppose you install Cassandra on these nodes:
node0 10.168.66.41 (seed1)
node1 10.176.43.66
node2 10.168.247.41
node3 10.176.170.59 (seed2)
node4 10.169.61.170
node5 10.169.30.138
- If you have a firewall running on the nodes in your cluster, you must open certain ports to allow communication between the nodes. See Configuring firewall port access.
- Stop Cassandra and clear the data.

  Packaged installs. If Cassandra is running, stop the server:

  $ sudo service cassandra stop

  Clear the data:

  $ sudo rm -rf /var/lib/cassandra/*

  Binary installs. If Cassandra is running, find the process ID and stop the server:

  $ ps auwx | grep cassandra
  $ sudo kill <pid>

  Clear the data:

  $ cd <install_location>
  $ sudo rm -rf /var/lib/cassandra/*
Modify the following property settings in the cassandra.yaml file for each node:
- num_tokens: <recommended value: 256>
- -seeds: <internal IP address of each seed node>
- listen_address: <IP address of the node>
- endpoint_snitch: <name of snitch> (See endpoint_snitch.)
- auto_bootstrap: false (Add this setting only when initializing a fresh cluster with no data.)
node0

cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.168.66.41,10.176.170.59"
listen_address: 10.168.66.41
endpoint_snitch: PropertyFileSnitch

Note: Include at least one seed node from each data center in the seeds list.
node1 to node5
The properties for these nodes are the same as node0 except for the listen_address.
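For example, node1's cassandra.yaml would carry the same values, with listen_address set to node1's IP from the example node list (a sketch, assuming the addresses above):

```yaml
cluster_name: 'MyDemoCluster'
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.168.66.41,10.176.170.59"
listen_address: 10.176.43.66    # node1's own IP; the only value that differs
endpoint_snitch: PropertyFileSnitch
```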
In the cassandra-topology.properties file, assign the data
center and rack names you determined in the Prerequisites to the IP addresses of
each node. For example:
# Cassandra Node IP=Data Center:Rack
10.168.66.41=DC1:RAC1
10.176.43.66=DC2:RAC1
10.168.247.41=DC1:RAC1
10.176.170.59=DC2:RAC1
10.169.61.170=DC1:RAC1
10.169.30.138=DC2:RAC1
Also, in the cassandra-topology.properties file, assign
a default data center name and rack name for unknown nodes.
# default for unknown nodes
default=DC1:RAC1
After you have installed and configured Cassandra on all nodes, start the seed
nodes one at a time, and then start the rest of the nodes.
If a node has restarted because of automatic restart, you must stop the node and clear the data directories, as described above.
For packaged installs, run the following command:
$ sudo service cassandra start
For binary installs, run the following commands:
$ cd <install_location>
$ bin/cassandra
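The start order above (seed nodes first, then the remaining nodes) can be sketched as follows; the IPs come from the example node list, and `echo` stands in for running the actual start command on each host (for example over ssh):

```shell
#!/bin/sh
# Sketch: derive a safe start order -- seed nodes first, then the rest.
# IPs are from the example node list; replace echo with the real
# start command run on each host.
ALL="10.168.66.41 10.176.43.66 10.168.247.41 10.176.170.59 10.169.61.170 10.169.30.138"
SEEDS="10.168.66.41 10.176.170.59"

start_order() {
    for ip in $SEEDS; do
        echo "seed $ip"
    done
    for ip in $ALL; do
        case " $SEEDS " in
            *" $ip "*) ;;            # already started as a seed
            *) echo "node $ip" ;;
        esac
    done
}

start_order
```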
To check that the ring is up and running, run the nodetool status command:

$ nodetool status