Initializing multiple datacenters per workload type

Steps for configuring nodes in a mixed-workload cluster that has more than one datacenter for each type of node.

In most circumstances, each workload type, such as search, analytics, and transactional, should be organized into separate virtual datacenters. Workload segregation avoids contention for resources. However, workloads can be combined in SearchAnalytics nodes when there is not a large demand for analytics. Combining transactional (OLTP) and analytics (OLAP) workloads generally results in decreased performance. You can enable DSE Graph only on the nodes you want to query.

When you create a keyspace using CQL, Cassandra creates a virtual datacenter for a cluster, even a one-node cluster, automatically. You assign nodes that run the same type of workload to the same datacenter. The separate, virtual datacenters for different types of nodes segregate workloads that run DSE Search from those nodes that run other workload types.

In a single-datacenter-per-workload scenario, a mixed-workload cluster has only one datacenter for each type of workload. For example, if the cluster has 3 analytics nodes, 3 Cassandra nodes, and 2 DSE Search nodes, the cluster would have 3 datacenters, one for each type of workload. In contrast, a multiple-datacenter cluster has more than one datacenter for each type of workload.

Prerequisites

Procedure

This configuration example describes installing a 6-node cluster spanning 2 datacenters. The default consistency level is QUORUM.

  1. Suppose you install DataStax Enterprise on these nodes:
    • node0 10.168.66.41 (seed1)
    • node1 10.176.43.66
    • node2 10.168.247.41
    • node3 10.176.170.59 (seed2)
    • node4 10.169.61.170
    • node5 10.169.30.138
  2. If the nodes are behind a firewall, open the required ports for internal/external communication.
  3. If DataStax Enterprise is running, stop the nodes and clear the data:
    • Installer-Services and Package installations:
      $ sudo service dse stop
      $ sudo rm -rf /var/lib/cassandra/*  # Clears the data from the default directories
    • Installer-No Services and Tarball installations:

      From the install directory:

      $ sudo bin/dse cassandra-stop
      $ sudo rm -rf /var/lib/cassandra/*  # Clears the data from the default directories
  4. Set the properties in the cassandra.yaml file for each node:

    If the nodes in the cluster are identical in terms of disk layout, shared libraries, and so on, you can use the same copy of the cassandra.yaml file on all of the nodes.

    The location of the cassandra.yaml file depends on the type of installation:
    Installer-Services /etc/dse/cassandra/cassandra.yaml
    Package installations /etc/dse/cassandra/cassandra.yaml
    Installer-No Services install_location/resources/cassandra/conf/cassandra.yaml
    Tarball installations install_location/resources/cassandra/conf/cassandra.yaml

    Properties to set:

    Note: Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml configuration files.
    • num_tokens: See vnode recommendations.
    • - seeds: internal_IP_address of each seed node
    • listen_address: empty

      If not set, Cassandra asks the system for the local address, the one associated with its host name. In some cases Cassandra doesn't produce the correct address and you must specify the listen_address.

    • endpoint_snitch: snitch

      See endpoint_snitch and About Snitches. If you are changing snitches, see Switching snitches.

    • auto_bootstrap: false

      Add the auto_bootstrap setting only when initializing a fresh cluster with no data.

    • If you are using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade guide for removed settings.

    You must include at least one seed node from each datacenter. DataStax recommends that you have more than one seed node per datacenter. Do not make all nodes seed nodes.

    cluster_name: 'MyDemoCluster'
    num_tokens: 256
    seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
             - seeds: "10.168.66.41,10.176.170.59"
    listen_address:
    endpoint_snitch: GossipingPropertyFileSnitch
  5. Set the properties in the dse.yaml file as required by your use case.
  6. In the cassandra-rackdc.properties (GossipingPropertyFileSnitch) or cassandra-topology.properties (PropertyFileSnitch) file, use your naming convention to assign datacenter and rack names to the IP addresses of each node, and assign a default datacenter name and rack name for unknown nodes.
    Note: The GossipingPropertyFileSnitch always loads cassandra-topology.properties when that file is present. Remove the file from each node on any new cluster, or any cluster migrated from the PropertyFileSnitch.
    The default location of the cassandra-topology.properties file depends on the type of installation:
    Installer-Services and Package installations /etc/dse/cassandra/cassandra-topology.properties
    Installer-No Services and Tarball installations install_location/resources/cassandra/conf/cassandra-topology.properties
    The default location of the cassandra-rackdc.properties file depends on the type of installation:
    Installer-Services and Package installations /etc/dse/cassandra/cassandra-rackdc.properties
    Installer-No Services and Tarball installations install_location/resources/cassandra/conf/cassandra-rackdc.properties

    Example (cassandra-topology.properties format, used by the PropertyFileSnitch):

    # Cassandra Node IP=Datacenter:Rack
    10.168.66.41=DC1:RAC1
    10.176.43.66=DC2:RAC1
    10.168.247.41=DC1:RAC1
    10.176.170.59=DC2:RAC1
    10.169.61.170=DC1:RAC1
    10.169.30.138=DC2:RAC1
    
    # default for unknown nodes
    default=DC1:RAC1
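  7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a time, and then start the rest of the nodes:
    • Installer-Services and Package installations: $ sudo service dse start
    • Installer-No Services and Tarball installations: $ install_location/bin/dse cassandra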
  8. Check that your cluster is up and running:
    • Installer-Services and Package installations: $ nodetool status
    • Installer-No Services and Tarball installations: $ install_location/bin/nodetool status
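
The seed and topology rules in steps 4 and 6 can be sanity-checked with a small script before the nodes are started. The following is a minimal sketch, not a DataStax tool. The helper names rackdc_for and dcs_without_seed are hypothetical, and the TOPOLOGY and SEEDS values are copied from the example in this procedure. The first helper prints the dc= and rack= lines that a node's cassandra-rackdc.properties file would hold when the GossipingPropertyFileSnitch is used. The second lists any datacenter that has no seed node.

```shell
#!/usr/bin/env bash
# Sketch only: values mirror the example cluster in this procedure.
# rackdc_for and dcs_without_seed are hypothetical helper names.

TOPOLOGY="10.168.66.41=DC1:RAC1
10.176.43.66=DC2:RAC1
10.168.247.41=DC1:RAC1
10.176.170.59=DC2:RAC1
10.169.61.170=DC1:RAC1
10.169.30.138=DC2:RAC1"
SEEDS="10.168.66.41,10.176.170.59"

# Print the dc= and rack= lines for one node IP.
# Returns a non-zero status when the IP is not in TOPOLOGY.
rackdc_for() {
  printf '%s\n' "$TOPOLOGY" | awk -F'[=:]' -v ip="$1" '
    $1 == ip { printf "dc=%s\nrack=%s\n", $2, $3; found = 1 }
    END { exit !found }'
}

# Print each datacenter that has no seed node.
# Prints nothing when every datacenter has at least one seed.
dcs_without_seed() {
  printf '%s\n' "$TOPOLOGY" | awk -F'[=:]' -v seeds="$SEEDS" '
    BEGIN { n = split(seeds, s, ","); for (i = 1; i <= n; i++) is_seed[s[i]] = 1 }
    { dcs[$2] = 1; if ($1 in is_seed) seeded[$2] = 1 }
    END { for (dc in dcs) if (!(dc in seeded)) print dc }'
}

# Example: the properties for node1 of the example cluster.
rackdc_for 10.176.43.66
```

With the example values, dcs_without_seed prints nothing, because each of the two datacenters contains one of the listed seeds.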

Results

Datacenter: DC1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address          Load        Tokens    Owns    Host ID             Rack
UN 10.168.66.41     45.96 KB    256       27.4%   c885aac7-f2c0-...   RAC1
UN 10.168.247.41    66.34 KB    256       36.6%   fa31416c-db22-...   RAC1
UN 10.169.61.170    55.72 KB    256       33.0%   f488367f-c14f-...   RAC1
Datacenter: DC2
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address          Load        Tokens    Owns    Host ID             Rack
UN 10.176.43.66     45.96 KB    256       27.4%   f9fa31c7-f3c0-...   RAC1
UN 10.176.170.59    66.34 KB    256       36.6%   a5bb526c-db51-...   RAC1
UN 10.169.30.138    55.72 KB    256       33.0%   b836478f-c49f-...   RAC1
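
Reading the status column by eye becomes error-prone as the cluster grows. As a minimal sketch, assuming the column layout shown above, a filter can list every node whose status and state are not UN (Up and Normal). The function name nodes_not_up_normal is a hypothetical helper, not part of nodetool.

```shell
#!/usr/bin/env bash
# Sketch only: nodes_not_up_normal is a hypothetical helper name.
# Reads `nodetool status` output on stdin and prints the address of every
# node row whose first column is a status/state code other than UN.
nodes_not_up_normal() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2 }'
}

# Typical use on a node where nodetool is on the PATH:
#   nodetool status | nodes_not_up_normal
```

An empty result means every listed node reported UN. Header lines such as "Datacenter: DC1" and the "--" column header are skipped because their first field is not a two-letter status code.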

What's next