Initialize a single datacenter per workload type

In this scenario, a mixed workload cluster has only one datacenter for each type of workload. For example, an eight-node cluster with the following nodes would use three datacenters, one for each workload type:

  • DC1 = 3 DSE Analytics nodes

  • DC2 = 3 Transactional nodes

  • DC3 = 2 DSE Search nodes

In contrast, a multiple datacenter cluster has more than one datacenter for each type of workload.

The eight-node cluster spans two racks across three datacenters. Applications in each datacenter will use a default consistency level of LOCAL_QUORUM. One node per rack will serve as a seed node.

Node    IP address      Type           Rack
node0   110.82.155.0    Transactional  RAC1
node1   110.82.155.1    Transactional  RAC1
node2   110.54.125.1    Transactional  RAC2
node3   110.54.125.2    Analytics      RAC1
node4   110.54.155.2    Analytics      RAC2
node5   110.82.155.3    Analytics      RAC1
node6   110.54.125.3    Search         RAC1
node7   110.82.155.4    Search         RAC2

Procedure

  1. Configure client applications so that they don’t connect to the new datacenter before it is online, and ensure that the consistency level for reads and writes doesn’t require responses from the new datacenter.

    If client applications, including DSE Search and DSE Analytics, aren’t properly configured, they might connect to the new datacenter before it is online. Incorrect configuration results in connection exceptions, timeouts, and/or inconsistent data.

    1. Configure client applications to use the DCAwareRoundRobinPolicy, as shown in the driver example after this list.

    2. Direct clients to an existing datacenter. Otherwise, clients might try to access the new datacenter, which might not have any data.

    3. If using the QUORUM consistency level, change to LOCAL_QUORUM.

    4. If using the ONE consistency level, set to LOCAL_ONE.

    For more information, see the language-specific documentation for your DataStax-compatible driver.
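
    For example, a minimal sketch using the DataStax Java driver 3.x; the contact point, local datacenter name, and consistency level are illustrative and must match your existing cluster:

      import com.datastax.driver.core.Cluster;
      import com.datastax.driver.core.ConsistencyLevel;
      import com.datastax.driver.core.QueryOptions;
      import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

      // Pin clients to an existing datacenter and use a DC-local consistency level
      Cluster cluster = Cluster.builder()
          .addContactPoint("110.82.155.0")                  // node in an existing DC
          .withLoadBalancingPolicy(DCAwareRoundRobinPolicy.builder()
              .withLocalDc("DC_Transactional")              // existing DC, not the new one
              .build())
          .withQueryOptions(new QueryOptions()
              .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
          .build();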

  2. If your existing datacenters use the SimpleStrategy replication strategy, change it to the NetworkTopologyStrategy replication strategy:

    1. Use ALTER KEYSPACE to change the replication strategy to NetworkTopologyStrategy for every keyspace that still uses SimpleStrategy, including the DSE keyspaces (for example, dse_perf, dse_leases, dsefs, and dse_security, as shown in the output below).
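
      For example, a minimal statement, assuming the existing datacenter is named DC1 with a replication factor of 3:

        ALTER KEYSPACE dse_perf WITH replication =
        {'class': 'NetworkTopologyStrategy', 'DC1': '3'};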

    2. Use DESCRIBE SCHEMA to check the replication strategy of keyspaces in the cluster. Ensure that any existing keyspaces use the NetworkTopologyStrategy replication strategy.

      DESCRIBE SCHEMA;
      Result
      CREATE KEYSPACE dse_perf WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': '3'}  AND durable_writes = true;
      ...
      
      CREATE KEYSPACE dse_leases WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': '3'}  AND durable_writes = true;
      ...
      
      CREATE KEYSPACE dsefs WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': '3'}  AND durable_writes = true;
      ...
      
      CREATE KEYSPACE dse_security WITH replication =
      {'class': 'NetworkTopologyStrategy', 'DC1': '3'}  AND durable_writes = true;
  3. Install DSE on each node in the new datacenter.

    Don’t start the service or restart the node.

    Use the same version of DataStax Enterprise (DSE) on all nodes in the cluster.
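
    For example, on a Debian-based system with the DataStax package repository already configured, a package install might look like the following sketch; package names and repository setup vary by DSE version and platform:

      sudo apt-get update
      sudo apt-get install dse-full   # install DSE on this node
      # If your platform starts the service automatically, stop it before configuring:
      sudo service dse stop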

  4. Configure properties in cassandra.yaml on each new node, following the configuration of the other nodes in the cluster.

    If you used Lifecycle Manager to provision the nodes, configuration is performed automatically.

    For manual configuration, use the yaml_diff tool to review and make appropriate changes to the cassandra.yaml and dse.yaml configuration files.

    1. Configure node properties (a consolidated cassandra.yaml sketch appears after this step):

      • -seeds: The internal IP address of each seed node.

        Include at least one seed node from each datacenter. DataStax recommends more than one seed node per datacenter, spread across more than one rack; three seed nodes per datacenter is the most common choice. Do not make all nodes seed nodes.

      • auto_bootstrap: This setting has been removed from the default configuration, but, if present, should be set to true.

      • cluster_name: On the new datacenter nodes, set the cluster_name key in the cassandra.yaml configuration file to the existing cluster’s cluster_name. If the names don’t match, the new nodes cannot join the existing cluster.

      • listen_address: Typically, you can leave this empty (not set). If not set, DSE asks the system for the local address, which is associated with its host name. In some cases, DSE doesn’t produce the correct address, which requires specifying the listen_address.

      • endpoint_snitch: Provide the snitch configuration.

        Don’t use the default DseSimpleSnitch. The DseSimpleSnitch is used only for single-datacenter deployments (or single-zone deployments in public clouds). It doesn’t recognize datacenter or rack information.

        For the GossipingPropertyFileSnitch, Amazon EC2 single-region snitch, Amazon EC2 multi-region snitch, and Google Cloud Platform snitch, configure the datacenter and rack information in the cassandra-rackdc.properties file. For the PropertyFileSnitch, configure the datacenter and rack information in the cassandra-topology.properties file.

      • If using a cassandra.yaml or dse.yaml file from a previous version, check the upgrade guide for your previous and current version for removed settings.

    2. Configure the node architecture. All nodes in the datacenter must use the same token architecture:

      • Virtual node (vnode) allocation algorithm settings

      • Single-token architecture settings

      See Virtual node (vnode) configuration for more details.
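
    For example, a minimal cassandra.yaml excerpt for one of the new nodes; the cluster name is illustrative, and the seed list uses IP addresses from the table above:

      # cassandra.yaml (excerpt)
      cluster_name: 'MixedWorkloadCluster'    # must match the existing cluster's name
      endpoint_snitch: GossipingPropertyFileSnitch
      # listen_address is left unset so DSE derives it from the host name
      seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
            - seeds: "110.82.155.0,110.54.125.2,110.54.125.3"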

  5. Depending on your snitch type, edit the appropriate configuration file to assign datacenter and rack names to the IP addresses of each node, and assign a default datacenter name and rack name for unknown nodes.

    # Transactional Node IP=Datacenter:Rack
    110.82.155.0=DC_Transactional:RAC1
    110.82.155.1=DC_Transactional:RAC1
    110.54.125.1=DC_Transactional:RAC2
    110.54.125.2=DC_Analytics:RAC1
    110.54.155.2=DC_Analytics:RAC2
    110.82.155.3=DC_Analytics:RAC1
    110.54.125.3=DC_Search:RAC1
    110.82.155.4=DC_Search:RAC2
    
    # default for unknown nodes
    default=DC1:RAC1

    For the PropertyFileSnitch, these assignments are set in the cassandra-topology.properties file. For the GossipingPropertyFileSnitch, each node instead declares only its own datacenter and rack in its local cassandra-rackdc.properties file, as shown in the example after the notes below.

    • The GossipingPropertyFileSnitch always loads cassandra-topology.properties when the file is present. Remove the file from each node on any new datacenter and from any datacenter migrated from the PropertyFileSnitch.

    • After making any changes in the configuration files, you must restart the node for the changes to take effect.
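
    For example, a minimal cassandra-rackdc.properties for node6, using the datacenter and rack assignments from the mapping above:

      # cassandra-rackdc.properties on node6 (110.54.125.3)
      dc=DC_Search
      rack=RAC1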

  6. Make the following changes in the existing datacenters:

    1. On nodes in the existing datacenters, update the -seeds property in cassandra.yaml to include the seed nodes in the new datacenter.

    2. Add the new datacenter definition to the cassandra.yaml properties file for the type of snitch used in the cluster. If changing snitches, see Switching snitches.

  7. After you have installed and configured DataStax Enterprise (DSE) on all nodes, start the nodes sequentially, beginning with the seed nodes.

    After starting each node, allow a delay of at least the duration of ring_delay_ms before starting the next node to prevent cluster imbalance.

    Before starting a node, ensure that the previous node is up and running by verifying that nodetool status reports it as UN (Up and Normal). Failing to do so can result in cluster imbalance that cannot be fixed later.

    Cluster imbalance can be visualized by running nodetool status KEYSPACE_NAME and checking the Owns column in the response. A properly configured cluster reports ownership values within about 1 percent of each other for keyspaces where the replication factor per datacenter is equal to allocate_tokens_for_local_replication_factor.
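
    For example, a quick check between node starts; the IP address and keyspace name are illustrative:

      # Confirm the previous node reports UN before starting the next one
      nodetool status | grep 110.54.125.3

      # After all nodes are up, compare the Owns column across nodes
      nodetool status my_keyspace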

  8. Continue starting DSE, rack by rack, until all the nodes are up.

  9. After all nodes are running in the cluster and the client applications are datacenter-aware, use cqlsh to alter the keyspaces and set the desired replication factor in the new datacenter:

    ALTER KEYSPACE keyspace_name WITH REPLICATION =
    {'class' : 'NetworkTopologyStrategy', 'ExistingDC1' : 3, 'NewDC2' : 2};
  10. Run nodetool rebuild on each node in the new datacenter, specifying the datacenter to rebuild from. This step replicates the data to the new datacenter in the cluster.

    nodetool rebuild -- <datacenter_name>

    Make sure the datacenter name is spelled correctly and the datacenter exists in the cluster. Nodes appear to rebuild successfully even if the datacenter doesn’t exist, but they might not contain all expected data.

    Requests to the new datacenter with LOCAL_ONE or ONE consistency levels can fail if the existing datacenters are not completely in sync.

    You can run nodetool rebuild on one or more nodes at the same time. Run the command on one node at a time to reduce the impact on the existing cluster. Run the command on multiple nodes simultaneously if the cluster can handle the extra I/O and network pressure.
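
    For example, a minimal sketch that rebuilds the two new Search nodes one at a time, streaming data from the Transactional datacenter; the host addresses and datacenter name come from the examples above:

      nodetool -h 110.54.125.3 rebuild -- DC_Transactional
      nodetool -h 110.82.155.4 rebuild -- DC_Transactional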

  11. Check that the new cluster is up and running:

    dsetool status

    If DSE has problems starting, visit DataStax Support for troubleshooting articles on starting DSE.

  12. To add the third datacenter (DC3) to the cluster, repeat the steps starting from installing DSE through checking that the cluster is running.

Results

The datacenters in the cluster are now replicating with each other.

DC: Cassandra   Workload: Cassandra  Graph: no
==============================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load        Tokens    Owns    Host ID             Rack
UN 110.82.155.0    21.33 KB    256       33.3%   a9fa31c7-f3c0-...   RAC1
UN 110.82.155.1    21.33 KB    256       33.3%   f5bb416c-db51-...   RAC1
UN 110.54.125.1    21.33 KB    256       16.7%   b836748f-c94f-...   RAC2

DC: Analytics
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load        Owns      Host ID               Tokens         Rack
UN 110.54.125.2    28.44 KB    13.0%     e2451cdf-f070-...     -922337....    RAC1
UN 110.54.155.2    44.47 KB    16.7%     f9fa427c-a2c5-...     30745512...    RAC2
UN 110.82.155.3    54.33 KB    23.6%     b9fc31c7-3bc0-...     45674488...    RAC1

DC: Solr
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address         Load        Owns      Host ID               Tokens         Rack
UN 110.54.125.3    15.44 KB    50.2%     e2451cdf-f070-...     9243578....    RAC1
UN 110.82.155.4    18.78 KB    49.8%     e2451cdf-f070-...     10000          RAC2
