Initializing a single datacenter per workload type

Steps for configuring nodes in a mixed-workload cluster that has only one datacenter for each type of workload.

In most circumstances, each workload type, such as search, analytics, and transactional, should be organized into separate virtual datacenters. Workload segregation avoids contention for resources. However, workloads can be combined in SearchAnalytics nodes when there is not a large demand for analytics. Keep in mind that combining transactional (OLTP) and analytics (OLAP) workloads results in decreased performance. You can enable DSE Graph only on the nodes you want to query.

When you create a keyspace using CQL, Cassandra automatically creates a virtual datacenter for the cluster, even a one-node cluster. You assign nodes that run the same type of workload to the same datacenter. The separate, virtual datacenters segregate nodes that run DSE Search from nodes that run other workload types.

In this scenario, a mixed-workload cluster has only one datacenter for each type of workload. For example, if the cluster has 3 analytics nodes, 3 Cassandra nodes, and 2 DSE Search nodes, the cluster would have 3 datacenters, one for each type of workload. In contrast, a multiple-datacenter cluster has more than one datacenter for each type of workload.

Prerequisites

Procedure

This configuration example describes installing an 8-node cluster with three datacenters, one for each workload type, and a default consistency level of QUORUM. In the sample topology, every node is assigned to rack RAC1.

  1. Suppose the nodes have the following IP addresses, and one node in each of the Cassandra and Analytics datacenters serves as a seed:
    • node0 110.82.155.0 (Cassandra seed)
    • node1 110.82.155.1 (Cassandra)
    • node2 110.82.155.2 (Cassandra)
    • node3 110.82.155.3 (Analytics seed)
    • node4 110.82.155.4 (Analytics)
    • node5 110.82.155.5 (Analytics)
    • node6 110.82.155.6 (Search - seed nodes not required for DSE Search.)
    • node7 110.82.155.7 (Search)
  2. If the nodes are behind a firewall, open the required ports for internal/external communication.
  3. If DataStax Enterprise is running, stop the nodes and clear the data:
    • Installer-Services and Package installations:
      $ sudo service dse stop
      $ sudo rm -rf /var/lib/cassandra/*  # Clears the data from the default directories
    • Installer-No Services and Tarball installations:

      From the install directory:

      $ sudo bin/dse cassandra-stop
      $ sudo rm -rf /var/lib/cassandra/*  # Clears the data from the default directories
  4. Set the properties in the cassandra.yaml configuration file for each node.

    If the nodes in the cluster are identical in terms of disk layout, shared libraries, and so on, you can use the same copy of the cassandra.yaml file on all of the nodes.

    Properties to set:

    Note: Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml configuration files.
    • num_tokens: See vnode recommendations.
    • - seeds: internal_IP_address of each seed node
    • listen_address: empty

      If not set, Cassandra asks the system for the local address, the one associated with its host name. In some cases Cassandra doesn't produce the correct address and you must specify the listen_address.

    • endpoint_snitch: snitch

      See endpoint_snitch and About Snitches. If you are changing snitches, see Switching snitches.

    • auto_bootstrap: false

      Add the bootstrap setting only when initializing a fresh cluster with no data.

    • If you are using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade guide for removed settings.

    You must include at least one seed node from each datacenter. DataStax recommends that you have more than one seed node per datacenter. Do not make all nodes seed nodes.

    cluster_name: 'MyDemoCluster'
    num_tokens: 256
    seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
             - seeds: "110.82.155.0,110.82.155.3"
    listen_address:
    endpoint_snitch: GossipingPropertyFileSnitch
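
As a quick sanity check on the seed list, the sketch below extracts the seeds value from a cassandra.yaml file and counts the entries. The helper name count_seeds is hypothetical, and the demo writes a small sample file to a temporary path rather than reading a real configuration.

```shell
# count_seeds FILE: print the number of comma-separated addresses on the
# "- seeds:" line of a cassandra.yaml file (hypothetical helper).
count_seeds() {
    sed -n 's/^[[:space:]]*-[[:space:]]*seeds:[[:space:]]*"\(.*\)".*$/\1/p' "$1" |
        tr ',' '\n' | grep -c .
}

# Demo against a throwaway copy of the seed_provider section shown above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
         - seeds: "110.82.155.0,110.82.155.3"
EOF

seed_count=$(count_seeds "$sample")
echo "seed nodes configured: $seed_count"
rm -f "$sample"
```

On a real node, the argument would be the cassandra.yaml path for your installation type. Comparing the count with the total number of nodes is one way to confirm that not every node is listed as a seed.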
  5. Set the properties in the dse.yaml file as required by your use case.
  6. In the cassandra-rackdc.properties (GossipingPropertyFileSnitch) or cassandra-topology.properties (PropertyFileSnitch) file, use your naming convention to assign datacenter and rack names to the IP addresses of each node, and assign a default datacenter name and rack name for unknown nodes.
    Note: The GossipingPropertyFileSnitch always loads cassandra-topology.properties when that file is present. Remove the file from each node on any new cluster, or any cluster migrated from the PropertyFileSnitch.
    The default location of the cassandra-topology.properties file depends on the type of installation:
    • Installer-Services and Package installations: /etc/dse/cassandra/cassandra-topology.properties
    • Installer-No Services and Tarball installations: install_location/resources/cassandra/conf/cassandra-topology.properties

    The default location of the cassandra-rackdc.properties file depends on the type of installation:
    • Installer-Services and Package installations: /etc/dse/cassandra/cassandra-rackdc.properties
    • Installer-No Services and Tarball installations: install_location/resources/cassandra/conf/cassandra-rackdc.properties

    The location of the dse.yaml file depends on the type of installation:
    • Installer-Services and Package installations: /etc/dse/dse.yaml
    • Installer-No Services and Tarball installations: install_location/resources/dse/conf/dse.yaml

    The location of the cassandra.yaml file depends on the type of installation:
    • Installer-Services and Package installations: /etc/dse/cassandra/cassandra.yaml
    • Installer-No Services and Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml

    Vnode recommendations:

    Type of datacenter or cluster            Number of vnodes (tokens)
    Cassandra-only                           128
    Spark                                    128
    DSE Search-only                          16 or 32
    DSE Graph                                128
    DSE Graph when used with DSE Search      16 or 32
    DSE Hadoop                               Not recommended.

    If you want to use vnodes, be aware that vnodes can cause a sharp increase in Hadoop task scheduling latency, because the number of Hadoop splits cannot be lower than the number of vnodes in the analytics datacenter. For tiny data, instead of N splits (N = number of physical nodes in the cluster), vnodes produce, for example, 256 * N splits, which can raise job latency from tens of seconds to tens of minutes. However, the added latency can be relatively insignificant when analyzing huge quantities of data that inherently have multiple splits and take hours to run anyway.
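
The arithmetic behind that warning can be worked through directly. The sketch below assumes the 3 analytics nodes from this example and num_tokens: 256 from the sample cassandra.yaml. Both numbers are illustrative, and the variable names are hypothetical.

```shell
# Lower bound on the number of Hadoop splits in the analytics datacenter.
analytics_nodes=3      # N: physical nodes (assumed, from the example cluster)
num_tokens=256         # vnodes per node (assumed, from the sample cassandra.yaml)

splits_without_vnodes=$analytics_nodes                   # N splits for tiny data
splits_with_vnodes=$((analytics_nodes * num_tokens))     # at least one split per vnode

echo "minimum splits without vnodes: $splits_without_vnodes"
echo "minimum splits with vnodes:    $splits_with_vnodes"
```

Under these assumptions, a job over tiny data is scheduled as hundreds of tasks instead of a handful, which is where the extra latency comes from.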

    BYOH
    # Cassandra Node IP=Datacenter:Rack
    110.82.155.0=DC_Cassandra:RAC1
    110.82.155.1=DC_Cassandra:RAC1
    110.82.155.2=DC_Cassandra:RAC1
    110.82.155.3=DC_Analytics:RAC1
    110.82.155.4=DC_Analytics:RAC1
    110.82.155.5=DC_Analytics:RAC1
    110.82.155.6=DC_Solr:RAC1
    110.82.155.7=DC_Solr:RAC1
    
    # default for unknown nodes
    default=DC1:RAC1
    Note: After making any changes in the configuration files, you must restart the node for the changes to take effect.
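
When the GossipingPropertyFileSnitch is used, as in the sample cassandra.yaml, each node declares only its own datacenter and rack in cassandra-rackdc.properties, using the dc and rack keys. A minimal sketch that writes such a file follows. The helper name write_rackdc and the temporary output path are assumptions for illustration; on a real node the target would be the cassandra-rackdc.properties location for your installation type.

```shell
# write_rackdc DC RACK FILE: write a two-line cassandra-rackdc.properties
# (hypothetical helper; dc= and rack= are the keys the snitch reads).
write_rackdc() {
    printf 'dc=%s\nrack=%s\n' "$1" "$2" > "$3"
}

# Demo for an analytics node such as node3; writes to a temp file, not /etc.
rackdc_file=$(mktemp)
write_rackdc DC_Analytics RAC1 "$rackdc_file"
rackdc_contents=$(cat "$rackdc_file")
echo "$rackdc_contents"
rm -f "$rackdc_file"
```

The datacenter and rack names reuse the naming from the topology example; any consistent naming convention works as long as every node in the same workload datacenter reports the same dc value.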
  7. After you have installed and configured DataStax Enterprise on all nodes, start the seed nodes one at a time, and then start the rest of the nodes.
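
For Installer-No Services and Tarball installations, the workload type is typically selected by a flag on the start command. The sketch below only prints the command for each node in the order described (seed nodes first) and runs nothing. The -k (Spark analytics) and -s (DSE Search) flags are stated here as assumptions to verify against your DSE version, and the helper name start_command is hypothetical.

```shell
# start_command WORKLOAD: print (do not run) the tarball start command.
start_command() {
    case "$1" in
        cassandra) echo "bin/dse cassandra" ;;
        analytics) echo "bin/dse cassandra -k" ;;   # assumed: Spark analytics node
        search)    echo "bin/dse cassandra -s" ;;   # assumed: DSE Search node
        *)         echo "unknown workload: $1" >&2; return 1 ;;
    esac
}

# Seed nodes (node0, node3) first, one at a time, then the remaining nodes.
start_plan=$(
    for entry in node0:cassandra node3:analytics \
                 node1:cassandra node2:cassandra \
                 node4:analytics node5:analytics \
                 node6:search node7:search; do
        node=${entry%%:*}
        workload=${entry#*:}
        echo "$node: $(start_command "$workload")"
    done
)
echo "$start_plan"
```

Each printed command would be run from the install directory on the named node. Installer-Services and Package installations start the service instead, mirroring the stop command in step 3.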
  8. Check that your cluster is up and running:
    • Installer-Services and Package installations: $ nodetool status
    • Installer-No Services and Tarball installations: $ install_location/bin/nodetool status

Results

Datacenter: Cassandra
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load        Tokens    Owns    Host ID             Rack
UN 110.82.155.0    21.33 KB    256       33.3%   a9fa31c7-f3c0-...   RAC1
UN 110.82.155.1    21.33 KB    256       33.3%   f5bb416c-db51-...   RAC1
UN 110.82.155.2    21.33 KB    256       16.7%   b836748f-c94f-...   RAC1
Datacenter: Analytics
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load        Owns      Host ID               Tokens         Rack
UN 110.82.155.3    28.44 KB    13.0%     e2451cdf-f070- ...    -922337....    RAC1
UN 110.82.155.4    44.47 KB    16.7%     f9fa427c-a2c5- ...    30745512...    RAC1
UN 110.82.155.5    54.33 KB    23.6%     b9fc31c7-3bc0- ...    45674488...    RAC1
Datacenter: Solr
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load        Owns      Host ID               Tokens         Rack
UN 110.82.155.6    15.44 KB    50.2%     e2451cdf-f070- ...    9243578....    RAC1
UN 110.82.155.7    18.78 KB    49.8%     e2451cdf-f070- ...    10000          RAC1
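
To check the result without reading every row, the status column can be filtered. The sketch below counts node rows and rows whose status is anything other than UN (Up/Normal). It reads sample text shaped like the output above, with one row deliberately changed to DN for the demonstration, and the helper names are hypothetical.

```shell
# count_nodes: count rows that begin with a two-letter status code
# (U or D, followed by N, L, J, or M).
count_nodes() {
    awk '$1 ~ /^[UD][NLJM]$/ { n++ } END { print n + 0 }'
}

# count_not_up_normal: count node rows whose status is not UN.
count_not_up_normal() {
    awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { n++ } END { print n + 0 }'
}

# Sample rows (illustrative; the last row is altered to DN on purpose).
sample_status='Datacenter: Cassandra
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load        Tokens    Owns    Host ID             Rack
UN 110.82.155.0    21.33 KB    256       33.3%   a9fa31c7-f3c0-...   RAC1
UN 110.82.155.1    21.33 KB    256       33.3%   f5bb416c-db51-...   RAC1
DN 110.82.155.2    21.33 KB    256       16.7%   b836748f-c94f-...   RAC1'

total=$(printf '%s\n' "$sample_status" | count_nodes)
not_up=$(printf '%s\n' "$sample_status" | count_not_up_normal)
echo "nodes: $total, not UN: $not_up"
```

Against a live cluster, the input would come from the nodetool status command in step 8 piped into count_not_up_normal; a result of 0 means every listed node is up and in the normal state.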