Restoring a snapshot into a new cluster

Suppose you want to copy a snapshot of SSTable data files from a three-node DataStax Enterprise cluster with vnodes enabled (128 tokens) and restore it on another, newly created three-node cluster (also 128 tokens). The token ranges will not match, because token ranges cannot be exactly the same in the new cluster. You therefore need to configure the new cluster with the tokens that were used in the old cluster.

This procedure assumes you are familiar with restoring a snapshot and configuring and initializing a cluster.

Where is the cassandra.yaml file?

The location of the cassandra.yaml file depends on the type of installation:

Package installations and Installer-Services installations:
    /etc/dse/cassandra/cassandra.yaml

Tarball installations and Installer-No Services installations:
    <installation_location>/resources/cassandra/conf/cassandra.yaml

Procedure

To recover the snapshot on the new cluster:

  1. From the old cluster, retrieve the list of tokens associated with each node’s IP:

    $ nodetool ring | grep -w ip_address_of_node | awk '{print $NF ","}' | xargs
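
    For example, for a node with the hypothetical IP address 10.10.1.1, the pipeline prints that node's tokens as a single comma-separated line, similar to:

    $ nodetool ring | grep -w 10.10.1.1 | awk '{print $NF ","}' | xargs   # 10.10.1.1 is a hypothetical node IP
    -9211270970129494930, -9138351317258731895, -8980763462514965928, ...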
  2. In the cassandra.yaml file for each node in the new cluster, add the list of tokens you obtained in the previous step to the initial_token parameter, and use the same num_tokens setting as in the old cluster.

    If nodes are assigned to racks, make sure the token allocation and rack assignments in the new cluster are identical to those of the old.
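
    A minimal sketch of the relevant settings on one new node; the token values shown are placeholders for the full list copied from the matching old node:

    num_tokens: 128    # same value as in the old cluster
    initial_token: -9211270970129494930, -9138351317258731895, -8980763462514965928, ...    # tokens from the matching old node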

  3. Make any other necessary changes in the new cluster’s cassandra.yaml and property files so that the new nodes match the old cluster settings. Make sure the seed nodes are set for the new cluster.
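
    For example, a minimal sketch of settings to verify, using a hypothetical cluster name and hypothetical seed addresses:

    cluster_name: 'Restored Cluster'             # hypothetical name; must match on every node in the new cluster
    seed_provider:
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              - seeds: "10.10.2.1,10.10.2.2"     # hypothetical seed node IPs in the new cluster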

  4. Clear the system table data from each new node:

    $ sudo rm -rf /var/lib/cassandra/data/system/*

    This allows the new nodes to use the initial tokens defined in the cassandra.yaml when they restart.

  5. Start each node using the list of tokens specified in the new cluster's cassandra.yaml:

    initial_token: -9211270970129494930, -9138351317258731895, -8980763462514965928, ...
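
    Depending on the installation type, starting a node typically looks like one of the following:

    $ sudo service dse start                      # package installations
    $ installation_location/bin/dse cassandra     # tarball installations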
  6. Create the schema in the new cluster. All schemas from the old cluster must be reproduced in the new cluster.
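
    For example, you might replay a schema dump with cqlsh, where my_schema.cql is a hypothetical file containing the CQL schema statements exported from the old cluster (for example, with DESCRIBE SCHEMA):

    $ cqlsh -f my_schema.cql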

  7. Stop the node. Using nodetool refresh is unsafe because files within the data directory of a running node can be silently overwritten by identically named, just-flushed SSTables produced by memtable flushes or compaction. Copying files into the data directory and restarting the node does not work for the same reason.
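
    Depending on the installation type, stopping the node typically looks like one of the following:

    $ sudo service dse stop                           # package installations
    $ installation_location/bin/dse cassandra-stop    # tarball installations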

  8. Restore the snapshotted SSTable files from the old cluster into the corresponding data directories on the new cluster, noting that the UUID component of the target directory names has changed. Without this restoration, the new cluster has no data to read when it restarts.
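
    A sketch of copying one table's snapshotted files, assuming the snapshot was staged at a hypothetical /path/to/snapshot_backup location; <old_uuid> and <new_uuid> differ between the two clusters:

    $ cp -p /path/to/snapshot_backup/keyspace_name/table_name-<old_uuid>/* \
           /var/lib/cassandra/data/keyspace_name/table_name-<new_uuid>/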

  9. Restart the node.
