Add a datacenter to a cluster using a designated datacenter as a data source

This procedure shows you how to add a new datacenter to an existing cluster using a designated datacenter as a data source. In this example, you add datacenter DC4 to a cluster with existing datacenters DC1, DC2, and DC3.

Prerequisites

  • An existing cluster with properly configured datacenters

  • The same version of HCD as the existing cluster, available to install on the new nodes

  • Network connectivity between all datacenters

When naming your datacenter:

  • Use a maximum of 48 characters.

  • Use only alphanumeric characters.

  • Don’t use special characters or spaces.

Prepare existing datacenters

Make sure all keyspaces use the NetworkTopologyStrategy replication strategy:

  1. Change the replication strategy for application keyspaces. In a cluster with multiple existing datacenters, include each existing datacenter and its replication factor in the replication map:

    ALTER KEYSPACE keyspace_name
    WITH REPLICATION = {
      'class' : 'NetworkTopologyStrategy',
      'DC1' : 3
    };
  2. Update the replication strategy for the following system keyspaces only:

    • system_auth: Stores authentication and authorization data.

    • system_distributed: Stores repair history.

    • system_traces: Stores trace information when CQL tracing is enabled.

      Do not modify the replication strategy for any other system keyspaces.

  3. Verify the replication strategy with DESCRIBE SCHEMA:

    DESCRIBE SCHEMA;
    CREATE KEYSPACE hcd_perf WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
    ...

    CREATE KEYSPACE hcd_leases WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
    ...

    CREATE KEYSPACE HCDFS WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
    ...

    CREATE KEYSPACE hcd_security WITH replication =
    {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;
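
    To inspect a single keyspace instead of the full schema, you can also run, for example:

    DESCRIBE KEYSPACE hcd_security;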

Install and configure new nodes

  1. Install HCD on each node in the new datacenter.

    • Use the same version of HCD on all nodes in the cluster.

    • Don’t start the service or restart the node yet.

  2. Configure cassandra.yaml on each new node:

    Essential configuration properties:

    • -seeds: Set to the <internal_IP_address> of each seed node. Include at least one seed node from each datacenter; three seed nodes per datacenter are recommended.

    • auto_bootstrap: Set to true (if present). This setting might be removed in newer versions.

    • listen_address: Leave empty or set a specific address. If not set, HCD uses the local address.

    • endpoint_snitch: Set to the <snitch> for your deployment. See the snitch configuration files below.

    Snitch configuration files:

    • GossipingPropertyFileSnitch: cassandra-rackdc.properties

    • Ec2Snitch: cassandra-rackdc.properties

    • Ec2MultiRegionSnitch: cassandra-rackdc.properties

    • GoogleCloudSnitch: cassandra-rackdc.properties

    • PropertyFileSnitch: cassandra-topology.properties

    • If you are using a cassandra.yaml file from a previous version, check the Upgrade Guide for removed settings.
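
    The following cassandra.yaml excerpt is a minimal sketch for a node in the new datacenter. It assumes GossipingPropertyFileSnitch, a hypothetical cluster name, and one seed per datacenter taken from the example nodetool status output later in this procedure; adjust every value for your environment.

    cluster_name: 'MyCluster'            # hypothetical; must match the existing cluster name
    seed_provider:
      - class_name: org.apache.cassandra.locator.SimpleSeedProvider
        parameters:
          # at least one seed from each datacenter; three per datacenter is recommended
          - seeds: "10.200.175.11,10.200.175.113,10.200.175.111,10.200.175.114"
    # listen_address is left unset so HCD uses the node's local address
    endpoint_snitch: GossipingPropertyFileSnitch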

  3. Choose and configure node architecture. All nodes in the datacenter must use the same type:

    • Virtual node (vnode) architecture

    • Single-token architecture

    For more information, see Virtual node (vnode) configuration or Add or replace single-token nodes.
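
    As a sketch, the choice maps to token settings in cassandra.yaml; the values below are examples only, not recommendations:

    # Vnode architecture: let HCD assign multiple token ranges per node
    num_tokens: 16
    # Single-token architecture: assign exactly one token per node instead
    # num_tokens: 1
    # initial_token: <token_value>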

  4. Configure snitch properties in the appropriate file. The following example shows a cassandra-topology.properties file, which maps each node's IP address to a datacenter and rack:

    # Transactional Node IP=Datacenter:Rack
    110.82.155.0=DC_Transactional:RAC1
    110.82.155.1=DC_Transactional:RAC1
    110.54.125.1=DC_Transactional:RAC2
    110.54.125.2=DC_Analytics:RAC1
    110.54.155.2=DC_Analytics:RAC2
    110.82.155.3=DC_Analytics:RAC1
    110.54.125.3=DC_Search:RAC1
    110.82.155.4=DC_Search:RAC2
    
    # default for unknown nodes
    default=dc_unknown:rac_unknown

    GossipingPropertyFileSnitch always loads cassandra-topology.properties when the file is present.

    Only keep this file if you’re using PropertyFileSnitch. Otherwise, delete it from all nodes in all datacenters.
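
    If you use GossipingPropertyFileSnitch instead, each node declares only its own datacenter and rack in cassandra-rackdc.properties. A minimal sketch for a node in the new datacenter, assuming it is named DC4 and sits in rack RAC1:

    dc=DC4
    rack=RAC1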

  5. Don't start the new nodes yet. You start them sequentially after the existing datacenters are updated.

Update existing datacenters

On nodes in the existing datacenters, do the following:

  1. Update the -seeds property in cassandra.yaml to include the seed nodes in the new datacenter.

  2. Add the new datacenter definition to the snitch configuration files, as shown in the sketch after this list.

If you need to change snitches, see Switching snitches.
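
For example, if the cluster uses PropertyFileSnitch, append the new datacenter's nodes to cassandra-topology.properties on every node. A sketch using the example DC4 node that appears later in this procedure:

    # cassandra-topology.properties on existing nodes: add the new DC4 node
    10.200.175.114=DC4:RAC1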

Start and verify the new datacenter

  1. Start nodes sequentially, beginning with the seed nodes.

    Starting nodes too quickly can cause permanent cluster imbalance.

    After starting a node, wait at least as long as ring_delay_ms and verify that the node is up (UN status) before starting the next node.
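
    For example, after starting a node you can confirm from any existing node that it reports UN before moving on; the IP below is the example DC4 node used in this procedure:

    nodetool status | grep 10.200.175.114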

  2. Verify the new datacenter is operational:

    nodetool status
    Result
    Datacenter: DC1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    -- Address       Load       Owns Host ID           Token                Rack
    UN 10.200.175.11 474.23 KiB ?    7297d21e-a04e-... -9223372036854775808 RAC1
    Datacenter: DC2
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    -- Address        Load       Owns Host ID           Token                Rack
    UN 10.200.175.113 518.36 KiB ?    2ff7d46c-f084-... -9223372036854775798 RAC1
    Datacenter: DC3
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    -- Address        Load       Owns Host ID           Token                Rack
    UN 10.200.175.111 961.56 KiB ?    ac43e602-ef09-... -9223372036854775788 RAC1
    Datacenter: DC4
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    -- Address        Load       Owns Host ID           Token                Rack
    UN 10.200.175.114 361.56 KiB ?    ac43e602-ef09-... -9223372036854775688 RAC1

Update replication and rebuild data

  1. Update keyspace replication to include the new datacenter:

    ALTER KEYSPACE keyspace_name
    WITH REPLICATION = {
      'class' : 'NetworkTopologyStrategy',
      'ExistingDC1' : 3,
      'NewDC2' : 2
    };

    Replace ExistingDC1 and NewDC2 with your actual datacenter names.
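
    For the example cluster in this procedure, a keyspace replicated to all four datacenters might be altered as follows, assuming a replication factor of 3 in each datacenter; use the factors appropriate for your workload:

    ALTER KEYSPACE hcd_security
    WITH REPLICATION = {
      'class' : 'NetworkTopologyStrategy',
      'DC1' : 3,
      'DC2' : 3,
      'DC3' : 3,
      'DC4' : 3
    };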

  2. Rebuild data in the new datacenter:

    Basic rebuild command
    nodetool rebuild -dc SOURCE_DATACENTER_NAME

    Replace SOURCE_DATACENTER_NAME with the name of the datacenter you want to rebuild data from.

    Rebuild command options:

    • Standard rebuild: nodetool rebuild -dc DC1

    • Rack-specific rebuild: nodetool rebuild -dc DC1:RAC1

    • Background rebuild with logging: nohup nodetool rebuild -dc DC1 > rebuild.log 2>&1 &

  3. Choose a rebuild strategy based on your priorities:

    It is safe to run rebuilds in parallel, but this can have performance impacts. The nodes in the source datacenter are streaming data, which can impact application performance. Run tests within your environment and adjust parallelism and streaming throttling to achieve the optimal balance of speed and performance.

    • Minimize source load: Run the rebuild on one node at a time (sequential rebuilds). This reduces the load on the source datacenter but takes longer to complete the rebuild process.

    • Maximize rebuild speed: Run the rebuild on multiple nodes simultaneously (parallel rebuilds). This completes the rebuild faster but requires sufficient cluster capacity to handle the extra I/O and network traffic.

    • Balance performance: Adjust stream throttling with nodetool setinterdcstreamthroughput. This distributes the allocated bandwidth across operations and balances speed against source datacenter performance.

  4. For rack-specific rebuilds, run the appropriate command on each rack in the new datacenter. For example, to rebuild DC4 from DC1:

    • On RAC1 nodes in DC4, run: nodetool rebuild -dc DC1:RAC1

    • On RAC2 nodes in DC4, run: nodetool rebuild -dc DC1:RAC2

    • On RAC3 nodes in DC4, run: nodetool rebuild -dc DC1:RAC3

  5. Monitor rebuild progress:

    nodetool netstats

    The nodetool rebuild command issues a JMX call to the HCD node and waits for the rebuild to finish before returning to the command line. Once the JMX call is invoked, the rebuild process continues to run on the server even if the nodetool command stops.

    Additional notes:

    • The data load shown in nodetool status updates only after streaming completes.

    • The system logs streaming errors to system.log.

    • If a temporary failure occurs, running nodetool rebuild again skips already streamed ranges.
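
    To poll streaming progress periodically, you can, for example, wrap netstats in the standard watch utility (assuming it is installed on the node):

    watch -d -n 60 nodetool netstats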

  6. Adjust stream throttling as needed to limit the impact of rebuild streaming on the source datacenter:
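
    For example, to cap inter-datacenter stream throughput at a hypothetical 200 megabits per second:

    nodetool setinterdcstreamthroughput 200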

  7. Verify rebuild completion:

    Search for "finished rebuild" messages in the system.log of each node in the new datacenter.
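
    A sketch, assuming a default log location; adjust the path for your installation:

    grep -i "finished rebuild" /var/log/cassandra/system.log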

  8. If you modified the inter-datacenter streaming throughput, reset it to the original setting.

  9. Start the Mission Control Repair Service if necessary.
