Phase 6: Upgrade SSTables to new file format
If the SSTable file format has changed in the new version of Cassandra, then you’ll need to upgrade your SSTables to the new format to complete the version upgrade. Upgrading is not complete until the SSTables are upgraded.
The SSTable format is only likely to change when upgrading between major versions of Cassandra. For example, if you’re upgrading from Cassandra 3.x to Cassandra 4.x, you’ll need to upgrade your SSTables to the new format. If the SSTable format hasn’t changed in the new version of Cassandra, you can skip to the next phase: Phase 7: Clean up after upgrade or rollback.
The SSTable upgrade process must be performed on each node, one node at a time, across the entire cluster.
You can run the upgradesstables command before all of the nodes have been upgraded to the new version of Cassandra as long as you run the command on only one node at a time or, when using racks, one rack at a time. Running upgradesstables on too many nodes at once degrades performance.
Upgrading the SSTable format rewrites all the SSTables, which requires significant disk space to accomplish. If the snapshot you created is left in place, the total amount of disk space used by Cassandra may double. If the amount of free disk space on a node is greater than 50%, you can leave the snapshot in place while upgrading SSTables. If there is less than 50% free disk space, remove the snapshot as described in Phase 7: Clean up after upgrade or rollback before upgrading the SSTables.
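The 50% rule above can be checked before you start. The following is a minimal sketch, not part of the original procedure: the helper names free_pct and snapshot_can_stay are hypothetical, and it assumes the data directory sits on a single filesystem at the default path /var/lib/cassandra/data.

```shell
# Print the free-space percentage of a filesystem, given `df -P` output on stdin.
# In the POSIX output format, column 5 is the percentage used, e.g. "37%".
free_pct() {
  awk 'NR == 2 { gsub(/%/, "", $5); print 100 - $5 }'
}

# Succeed only if free space is strictly greater than 50%,
# in which case the snapshot can be left in place.
snapshot_can_stay() {
  [ "$1" -gt 50 ]
}

# Usage (the path is the default data directory; adjust it if yours differs):
#   pct=$(df -P /var/lib/cassandra/data | free_pct)
#   if snapshot_can_stay "$pct"; then
#     echo "free space ${pct}%: snapshot can stay"
#   else
#     echo "free space ${pct}%: remove the snapshot first"
#   fi
```

If the data directories are spread over several disks, each filesystem would need to be checked separately.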
The following steps describe how to use the upgradesstables command:
Run the following command to upgrade the SSTables on a node:
nodetool upgradesstables
If the SSTables are already on the current version, the command returns immediately and no action is taken.
You can use the --jobs option to set the number of SSTables that upgrade simultaneously. For example:
nodetool upgradesstables --jobs 4
The default setting is 2, which minimizes impact on the cluster. Setting the number of jobs to 0 will use all available compaction threads. It’s important to note, however, that the number of jobs cannot exceed the concurrent_compactors value configured in cassandra.yaml.
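Because the job count is capped by concurrent_compactors, it can help to read that setting before choosing a --jobs value. A rough sketch follows. The helper names and the /etc/cassandra/cassandra.yaml path in the usage comment are assumptions, and the parser only handles a plain, uncommented "concurrent_compactors: N" line.

```shell
# Print the concurrent_compactors value from a cassandra.yaml file.
# Prints nothing if the setting is absent or commented out.
read_concurrent_compactors() {
  awk '/^concurrent_compactors:/ { print $2; exit }' "$1"
}

# Print the requested job count, reduced to the limit if it is higher.
# A request of 0 (use all compaction threads) passes through unchanged.
clamp_jobs() {
  requested=$1
  limit=$2
  if [ "$requested" -gt "$limit" ]; then
    echo "$limit"
  else
    echo "$requested"
  fi
}

# Usage (the yaml path is an assumption; use your installation's location):
#   limit=$(read_concurrent_compactors /etc/cassandra/cassandra.yaml)
#   [ -n "$limit" ] && nodetool upgradesstables --jobs "$(clamp_jobs 4 "$limit")"
```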
Monitor the SSTable upgrade process.
The upgradesstables command operates much like a single-table compaction that rewrites the same SSTable using the new format. Because SSTables are stored in sorted order, CPU usage and disk I/O should be relatively low. However, you should monitor Cassandra and application latency metrics to ensure that concurrent executions of the upgradesstables command don’t overwhelm the cluster.
The upgradesstables command relies on the compaction thread pool for orchestration. You can monitor progress with the following command:
watch -d "nodetool compactionstats -H"
Every 2.0s: nodetool compactionstats -H
pending tasks: 0
It’s normal behavior for the pending task count to stay above 0. However, the number of pending tasks should drastically reduce after the upgradesstables tasks have completed.
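If you would rather log progress than watch it interactively, the pending task count can be pulled out of the compactionstats output. This sketch assumes the output contains a line of the form "pending tasks: N", as in the sample above; pending_tasks is a hypothetical helper name.

```shell
# Print the number after "pending tasks:" from compactionstats output on stdin.
pending_tasks() {
  awk '/^pending tasks:/ { print $3; exit }'
}

# Usage: log the count every 30 seconds until interrupted with Ctrl-C.
#   while sleep 30; do
#     echo "$(date '+%H:%M:%S') pending=$(nodetool compactionstats -H | pending_tasks)"
#   done
```

Since a count above 0 is normal, the useful signal is the trend over time rather than the count reaching zero.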
Repeat the above steps on each node in the cluster, one at a time, until all SSTables have been upgraded.
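The node-by-node repetition can be scripted. The sketch below is one possible shape, not a prescribed tool: rolling_upgrade is a hypothetical helper, the host names are placeholders, and it assumes that nodetool is run on each host through a remote runner such as ssh and that the command returns only after that node's rewrite has finished. It stops at the first failure so that the remaining nodes are left untouched.

```shell
# Run `nodetool upgradesstables` on each host in turn, one at a time.
# $1 is the command used to reach a host (for example ssh);
# the remaining arguments are host names.
rolling_upgrade() {
  runner=$1
  shift
  for host in "$@"; do
    echo "upgrading SSTables on $host"
    "$runner" "$host" nodetool upgradesstables || {
      echo "upgrade failed on $host, stopping" >&2
      return 1
    }
  done
}

# Usage (host names are placeholders):
#   rolling_upgrade ssh cass-node-1 cass-node-2 cass-node-3
```

Passing the runner as an argument keeps the loop independent of how you reach each node, so the same function works with a wrapper around your own orchestration tooling.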
SSTables that have been upgraded to the Cassandra 4.x format have filenames that start with the nb- format specifier.
If the upgradesstables operation completed successfully, then all SSTable filenames on the node (excluding the snapshot) will be prefixed with nb-, indicating that they have been upgraded to the new format.
The following command looks for all files in the Cassandra data directory that don’t start with nb- (excluding files in snapshot directories):
sudo find /var/lib/cassandra/data -type f | grep -v "snapshots" | rev | cut -d'/' -f1 | rev | grep -v "^nb\-"
The above command assumes the default location of the Cassandra data directory (/var/lib/cassandra/data). If you’ve configured Cassandra to use a different data directory, you’ll need to replace this with the full path to the Cassandra data directory.
If no SSTable filenames are returned in the command output, then all SSTables have been upgraded to the new format.
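A similar check can be written with find alone, which avoids the rev and cut steps. This is an alternative sketch rather than the command from the guide: list_non_nb_files is a hypothetical helper, it excludes only paths that pass through a directory named snapshots, and it prints full paths instead of bare filenames.

```shell
# List files under a data directory whose names do not start with "nb-",
# skipping anything inside a snapshots directory.
list_non_nb_files() {
  find "$1" -type f ! -path '*/snapshots/*' ! -name 'nb-*'
}

# Usage (default data directory; replace the path if yours differs):
#   sudo sh -c "find /var/lib/cassandra/data -type f ! -path '*/snapshots/*' ! -name 'nb-*'"
# or, when the current user can read the data directory:
#   list_non_nb_files /var/lib/cassandra/data
```

As with the original command, empty output means every remaining file already carries the nb- prefix.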
At the end of this phase:
All SSTables have been upgraded to the latest format.
In the case of upgrades to Cassandra 4.x, all SSTable filenames (excluding the snapshot) are prefixed with the nb- format specifier.