Phase 6: Upgrade SSTables to new file format

If the SSTable file format has changed in the new version of Cassandra, you must upgrade your SSTables to the new format. The version upgrade is not complete until the SSTables are upgraded.

The SSTable format is likely to change only when upgrading between major versions of Cassandra. For example, if you’re upgrading from Cassandra 3.x to Cassandra 4.x, you’ll need to upgrade your SSTables to the new format. If the SSTable format hasn’t changed in the new version of Cassandra, you can skip to the next phase, Phase 7: Clean up after upgrade or rollback.

Important considerations when upgrading SSTables
  • The SSTable upgrade process must be performed on each node, one node at a time, across the entire cluster.

  • You can run the upgradesstables command before all of the nodes have been upgraded to the new version of Cassandra as long as you run the command on only one node at a time or, when using racks, one rack at a time. Running upgradesstables on too many nodes at once degrades performance.

  • Upgrading the SSTable format rewrites all the SSTables, which requires significant disk space to accomplish. If the snapshot you created is left in place, the total amount of disk space used by Cassandra may double. If the amount of free disk space on a node is greater than 50%, you can leave the snapshot in place while upgrading SSTables. If there is less than 50% free disk space, remove the snapshot as described in Phase 7: Clean up after upgrade or rollback before upgrading the SSTables.
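
The 50% free-space rule above can be checked before you start. The following is a minimal sketch, not part of the original procedure. The function names free_space_pct and snapshot_advice are hypothetical. The percentage comes from the rounded Capacity column of df -P, so treat values near the threshold with caution. Exactly 50% free is treated as insufficient, which is an assumption on the conservative side.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of Cassandra.

# Print the free-space percentage of the filesystem that holds directory $1.
free_space_pct() {
    # With -P, df prints one line per filesystem; column 5 is the used
    # percentage (for example "37%").
    used=$(df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    echo $((100 - used))
}

# Map a free-space percentage ($1) to the guidance above.
# Assumption: exactly 50% is treated as "not enough".
snapshot_advice() {
    if [ "$1" -gt 50 ]; then
        echo "keep-snapshot"
    else
        echo "remove-snapshot-first"
    fi
}
```

For example, snapshot_advice "$(free_space_pct /var/lib/cassandra/data)" prints keep-snapshot only when more than half of the filesystem is free. Replace the path if your data directory is not in the default location.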

Step 1: Upgrade SSTables

The following steps describe how to use the nodetool upgradesstables command to upgrade the SSTables on an online cluster. Offline SSTable upgrades that use the sstableupgrade tool are outside the scope of this guide.

  1. Run the following command to upgrade the SSTables on a node:

    nodetool upgradesstables

    If the SSTables are already on the current version, the command returns immediately and no action is taken.

    You can use the --jobs option to set the number of SSTables that upgrade simultaneously. For example:

    nodetool upgradesstables --jobs 4

    The default setting is 2, which minimizes impact on the cluster. Setting the number of jobs to 0 uses all available compaction threads. Note, however, that the number of jobs cannot exceed the concurrent_compactors value configured in cassandra.yaml.

  2. Monitor the SSTable upgrade process.

    1. The upgradesstables command operates much like a single-table compaction that rewrites the same SSTable using the new format. Because SSTables are stored in sorted order, CPU usage and disk I/O should be relatively low. However, you should monitor Cassandra and application latency metrics to ensure that concurrent executions of the upgradesstables command don’t overwhelm the cluster.

    2. The upgradesstables command relies on the compaction thread pool for orchestration. You can monitor progress with the following command:

      Command:

      watch -d "nodetool compactionstats -H"

      Result:

      Every 2.0s: nodetool compactionstats -H

      pending tasks: 0

    It’s normal for the pending task count to stay above 0. However, the count should drop sharply after the upgradesstables tasks have completed.

  3. Repeat the above steps on each node in the cluster, one at a time, until all SSTables have been upgraded.
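
The steps above can be sketched as a small rolling script. This is a minimal sketch under several assumptions. The function names (cap_jobs, pending_tasks, upgrade_nodes) are hypothetical. Each node is assumed to be reachable over ssh with nodetool on its PATH. The concurrent_compactors value is passed in by hand rather than read from cassandra.yaml. The script also assumes that nodetool upgradesstables returns only once the rewrite on that node has finished, so nodes are processed strictly one at a time.

```shell
#!/bin/sh
# Sketch only: function names and host names are illustrative.

# Cap a requested --jobs value ($1) at concurrent_compactors ($2).
# 0 is passed through unchanged because it means "use all compaction threads".
cap_jobs() {
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Read "nodetool compactionstats" output on stdin and print the pending task count.
pending_tasks() {
    awk -F': *' '/^pending tasks:/ { print $2; exit }'
}

# Run upgradesstables on each host in turn: upgrade_nodes JOB_COUNT HOST...
# Stops at the first failure so a problem is not repeated across the cluster.
upgrade_nodes() {
    job_count=$1
    shift
    for host in "$@"; do
        echo "Upgrading SSTables on $host"
        if ! ssh "$host" nodetool upgradesstables --jobs "$job_count"; then
            echo "upgradesstables failed on $host" >&2
            return 1
        fi
        echo "Finished $host"
    done
}
```

A possible invocation is upgrade_nodes "$(cap_jobs 4 2)" node1 node2 node3, where node1 to node3 are placeholder host names and 2 stands in for your concurrent_compactors setting. While a node is being upgraded, nodetool compactionstats -H | pending_tasks prints the current pending task count on that node.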

Step 2: Confirm SSTables have been upgraded

SSTables that have been upgraded to the Cassandra 4.x format have filenames that start with the nb- format specifier. If the upgradesstables operation completed successfully, then all SSTable filenames on the node (excluding the snapshot) will be prefixed with nb-, indicating that they have been upgraded to the new format.

The following command looks for all files in the Cassandra data directory that don’t start with nb- (excluding files in snapshot directories):

sudo find /var/lib/cassandra/data -type f | grep -v "snapshots" | rev | cut -d'/' -f1 | rev | grep -v "^nb-"

The above command assumes the default location of the Cassandra data directory (/var/lib/cassandra/data). If you’ve configured Cassandra to use a different data directory, you’ll need to replace this with the full path to the Cassandra data directory.

If no SSTable filenames are returned in the command output, then all SSTables have been upgraded to the new format.
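
The same check can be expressed with find alone, which avoids the rev and cut steps. This is a sketch rather than the guide's own command. The name list_old_format_files is hypothetical. Unlike the pipeline above, it prints full paths instead of bare filenames, and it skips only files under a directory named snapshots, whereas the pipeline skips any path containing that word.

```shell
#!/bin/sh
# Sketch only: the function name is illustrative.

# Print every file under data directory $1 that is outside a snapshots
# directory and whose name does not start with "nb-".
list_old_format_files() {
    find "$1" -type f ! -path '*/snapshots/*' ! -name 'nb-*'
}
```

Run it as a user that can read the data directory, for example list_old_format_files /var/lib/cassandra/data. Empty output means that no files outside the snapshots remain in an older format.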

End of phase

At the end of this phase:

  • All SSTables have been upgraded to the latest format.

    • In the case of upgrades to Cassandra 4.x, all SSTable filenames (excluding the snapshot) are prefixed with the nb- format specifier.

© 2024 DataStax | Privacy policy | Terms of use
