Initializing single-token architecture datacenters

Follow these steps only when not using virtual nodes (vnodes).

In most circumstances, each workload type, such as search, analytics, and transactional, should be organized into separate virtual datacenters. Workload segregation avoids contention for resources. However, workloads can be combined in SearchAnalytics nodes when there is not a large demand for analytics, or when analytics queries must use a DSE Search index. Generally, combining transactional (OLTP) and analytics (OLAP) workloads results in decreased performance.

When you create a keyspace using CQL, DataStax Enterprise automatically creates a virtual datacenter for the cluster, even for a one-node cluster. Assign nodes that run the same type of workload to the same datacenter. Separate virtual datacenters for different types of nodes keep the nodes that run DSE Search workloads apart from the nodes that run other workload types.

Prerequisites

Complete the tasks outlined in Initializing a DataStax Enterprise cluster to prepare the environment.

Procedure

These steps describe how to set up a cluster that has one or more datacenters.

  1. Suppose you install DataStax Enterprise on these nodes:

    • node0 10.168.66.41 (seed1)

    • node1 10.176.43.66

    • node2 10.168.247.41

    • node3 10.176.170.59 (seed2)

    • node4 10.169.61.170

    • node5 10.169.30.138

  2. Calculate the token assignments as described in Calculating tokens for single-token architecture nodes.

    The following tables list tokens for a 6-node cluster with a single datacenter or two datacenters.

    Single Datacenter

    Node     Token
    node0    0
    node1    28356863910078205288614550619314017621
    node2    56713727820156410577229101238628035242
    node3    85070591730234615865843651857942052864
    node4    113427455640312821154458202477256070485
    node5    141784319550391026443072753096570088106

    Multiple Datacenters

    Node     Token                                      Offset    Datacenter
    node0    0                                          NA        DC1
    node1    56713727820156410577229101238628035242     NA        DC1
    node2    113427455640312821154458202477256070485    NA        DC1
    node3    100                                        100       DC2
    node4    56713727820156410577229101238628035342     100       DC2
    node5    113427455640312821154458202477256070585    100       DC2
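The arithmetic behind evenly spaced single tokens can be sketched in Python. This is a minimal illustration, not a DSE tool: it assumes a ring of 2**127 tokens (the range that the magnitudes of the example tokens suggest), and the names RING_SIZE and single_token_assignments are hypothetical. Each datacenter is calculated as its own ring, and a small offset keeps tokens unique across datacenters.

```python
RING_SIZE = 2**127  # assumed token range; adjust for the partitioner in use


def single_token_assignments(node_count, offset=0):
    """Return evenly spaced tokens for node_count nodes, shifted by offset."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Integer arithmetic keeps the large token values exact.
    return [i * RING_SIZE // node_count + offset for i in range(node_count)]


# Two datacenters of three nodes each. The second datacenter is offset by 100
# so that no two nodes in the cluster share a token.
dc1 = single_token_assignments(3)
dc2 = single_token_assignments(3, offset=100)

for name, token in zip(["node0", "node1", "node2"], dc1):
    print(name, token, "DC1")
for name, token in zip(["node3", "node4", "node5"], dc2):
    print(name, token, "DC2")
```

Calling single_token_assignments(6) with no offset gives the spacing for six nodes in one datacenter.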

  3. If the nodes are behind a firewall, open the required ports for internal/external communication.

  4. If DataStax Enterprise is running, stop the node and clear the data:

    • Package installations:

      Stop DSE:

      sudo service dse stop

      Remove data from the default directories:

      sudo rm -rf /var/lib/cassandra/*

    • Tarball installations:

      From the installation location, stop the database:

      bin/dse cassandra-stop

      Remove all data. The command below uses the default parent directory, /var/lib/cassandra; substitute the directory that holds your data, commitlog, saved_caches, and hints directories if it differs:

      cd /var/lib/cassandra &&
      sudo rm -rf data/* commitlog/* saved_caches/* hints/*
  5. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in the cluster.

    Use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml configuration files.

    Configure the following node properties:

    • initial_token: token_value_from_calculation

    • num_tokens: 1

    • -seeds: <internal_IP_address> of each seed node

      Include at least one seed node from each datacenter. DataStax recommends more than one seed node per datacenter. Do not make all nodes seed nodes.

    • listen_address: <empty>

      If not set, DSE asks the system for the local address, which is associated with its host name. In some cases, DSE does not produce the correct address, which requires specifying the listen_address.

    • auto_bootstrap: <false>

      Add the auto_bootstrap: false setting only when initializing a new cluster with no data.

    • endpoint_snitch: <snitch>

      Do not use the DseSimpleSnitch. The DseSimpleSnitch (default) is used only for single-datacenter deployments (or single-zone deployments in public clouds), and does not recognize datacenter or rack information.

      Snitch                         Configuration file
      GossipingPropertyFileSnitch    cassandra-rackdc.properties file
      PropertyFileSnitch             cassandra-topology.properties file

      For cloud snitches, see Configuring the Amazon EC2 single-region snitch, Configuring Amazon EC2 multi-region snitch, and Configuring the Google Cloud Platform snitch.

    • If using a cassandra.yaml or dse.yaml file from a previous version, check the Upgrade Guide for removed settings.
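As a rough illustration of how the node properties above fit together, the following Python sketch renders them as a cassandra.yaml fragment for one node. The function name render_node_settings is hypothetical, and the seed_provider nesting is assumed to follow the layout of a stock cassandra.yaml file; compare the output with your own file rather than pasting it in blindly.

```python
def render_node_settings(initial_token, seeds, endpoint_snitch, listen_address=""):
    """Build the per-node cassandra.yaml properties as a text fragment."""
    seed_list = ",".join(seeds)
    lines = [
        f"initial_token: {initial_token}",
        # One token per node: single-token architecture, no vnodes.
        "num_tokens: 1",
        "seed_provider:",
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "      parameters:",
        f'          - seeds: "{seed_list}"',
        # An empty value lets DSE ask the system for the local address.
        f"listen_address: {listen_address}".rstrip(),
        # Only for a brand-new cluster that holds no data.
        "auto_bootstrap: false",
        f"endpoint_snitch: {endpoint_snitch}",
    ]
    return "\n".join(lines)


# node3 from the example cluster: token 100 in DC2; node0 and node3 are the seeds.
fragment = render_node_settings(
    initial_token=100,
    seeds=["10.168.66.41", "10.176.170.59"],
    endpoint_snitch="GossipingPropertyFileSnitch",
)
print(fragment)
```

Generating the fragment per node makes it harder to reuse the same initial_token on two nodes by accident, because the token is the only argument that changes within a datacenter.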

  6. Set the properties in the dse.yaml file as required by your use case.

  7. In the cassandra-rackdc.properties (GossipingPropertyFileSnitch) or cassandra-topology.properties (PropertyFileSnitch) file, assign datacenter and rack names to the IP addresses of each node, and assign a default datacenter name and rack name for unknown nodes.

    Migration information: The GossipingPropertyFileSnitch always loads cassandra-topology.properties when the file is present. Remove the file from each node on any new datacenter, or any datacenter migrated from the PropertyFileSnitch.

    # Transactional Node IP=Datacenter:Rack
    110.82.155.0=DC_Transactional:RAC1
    110.82.155.1=DC_Transactional:RAC1
    110.54.125.1=DC_Transactional:RAC2
    110.54.125.2=DC_Analytics:RAC1
    110.54.155.2=DC_Analytics:RAC2
    110.82.155.3=DC_Analytics:RAC1
    110.54.125.3=DC_Search:RAC1
    110.82.155.4=DC_Search:RAC2

    # default for unknown nodes
    default=DC1:RAC1

    After making any changes in the configuration files, you must restart the node for the changes to take effect.

  8. After you have installed and configured DataStax Enterprise on all nodes, start the nodes sequentially, beginning with the seed nodes. After starting each node, allow a delay of at least the value specified in ring_delay_ms before starting the next node, to prevent a cluster imbalance.

    Before starting a node, ensure that the previous node is up and running by verifying that nodetool status reports it as UN (Up/Normal). Failing to do so results in a cluster imbalance that cannot be fixed later. To check for imbalance, run nodetool status $keyspace and look at the ownership column. A properly set up cluster reports ownership values within about ±1% of each other for keyspaces where the RF per DC is equal to allocate_tokens_for_local_replication_factor.

  9. Check that the new cluster is up and running:

    dsetool status

    If DSE has problems starting, look for the Starting DSE troubleshooting article and related articles in the Support Knowledge Center.
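The IP=Datacenter:Rack mapping with a default entry, as used in the cassandra-topology.properties example in the procedure, can be illustrated with a short Python sketch. It only mimics the lookup semantics for illustration and is not the snitch's actual parser; the function names parse_topology and locate are hypothetical, and the sample uses a subset of the addresses from the example.

```python
def parse_topology(text):
    """Parse IP=Datacenter:Rack lines and return (mapping, default)."""
    mapping, default = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        datacenter, _, rack = value.partition(":")
        entry = (datacenter.strip(), rack.strip())
        if key.strip() == "default":
            default = entry
        else:
            mapping[key.strip()] = entry
    return mapping, default


def locate(ip, mapping, default):
    """Return (datacenter, rack) for ip, falling back to the default entry."""
    return mapping.get(ip, default)


SAMPLE = """\
# Transactional Node IP=Datacenter:Rack
110.82.155.0=DC_Transactional:RAC1
110.54.125.2=DC_Analytics:RAC1
110.54.125.3=DC_Search:RAC1

# default for unknown nodes
default=DC1:RAC1
"""

mapping, default = parse_topology(SAMPLE)
print(locate("110.54.125.2", mapping, default))  # a listed node
print(locate("10.0.0.9", mapping, default))      # an unlisted node uses the default
```

A lookup like this is a quick way to spot a node that would silently fall into the default datacenter because its address was left out of the file.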

Results

Datacenter: Cassandra
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load        Tokens    Owns    Host ID             Rack
UN 110.82.155.0    21.33 KB    256       33.3%   a9fa31c7-f3c0-...   RAC1
UN 110.82.155.1    21.33 KB    256       33.3%   f5bb416c-db51-...   RAC1
UN 110.82.155.2    21.33 KB    256       16.7%   b836748f-c94f-...   RAC1
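
Status output in this layout can also be checked programmatically. The Python sketch below assumes the column layout shown in the sample output above and is only an illustration; the function names parse_status, nodes_not_ready, and ownership_spread are hypothetical and not part of dsetool or nodetool.

```python
STATUS_SAMPLE = """\
Datacenter: Cassandra
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load        Tokens    Owns    Host ID             Rack
UN 110.82.155.0    21.33 KB    256       33.3%   a9fa31c7-f3c0-...   RAC1
UN 110.82.155.1    21.33 KB    256       33.3%   f5bb416c-db51-...   RAC1
UN 110.82.155.2    21.33 KB    256       16.7%   b836748f-c94f-...   RAC1
"""


def parse_status(text):
    """Extract (state, address, owns_percent) for each node row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Node rows begin with a two-letter code:
        # Up/Down followed by Normal/Leaving/Joining/Moving.
        if len(fields) < 2 or len(fields[0]) != 2:
            continue
        if fields[0][0] not in "UD" or fields[0][1] not in "NLJM":
            continue
        owns = next((float(f[:-1]) for f in fields if f.endswith("%")), None)
        rows.append((fields[0], fields[1], owns))
    return rows


def nodes_not_ready(rows):
    """Addresses of nodes whose state is anything other than UN."""
    return [address for state, address, _ in rows if state != "UN"]


def ownership_spread(rows):
    """Difference between the largest and smallest reported ownership."""
    values = [owns for _, _, owns in rows if owns is not None]
    return max(values) - min(values) if values else 0.0


rows = parse_status(STATUS_SAMPLE)
print(nodes_not_ready(rows))
print(ownership_spread(rows))
```

An empty list from nodes_not_ready means every listed node reports UN, which is the condition to confirm before starting the next node.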
