Roll back an upgrade
This page describes the procedure for rolling back an in-process upgrade to Apache Cassandra.
If, by the time you reach Phase 5: Decide to continue or abandon the upgrade, you encounter problems that prevent you from completing the upgrade (such as application errors, nodes failing to start, or unexplainable behavior), you have the option to roll back the cluster to the previous version of Cassandra.
Step 1: Shut down Cassandra
Stop the Cassandra service on the upgraded node.
-
Drain the node.
-
Command
-
Result
nodetool drain
The nodetool drain command doesn’t return any output. You can monitor drain progress by checking the Cassandra system.log file for messages similar to the following:
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-01 03:59:37,442 StorageService.java:1660 - DRAINING: starting drain process
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-01 03:59:37,443 HintsService.java:210 - Paused hints dispatch
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-01 03:59:37,449 Server.java:179 - Stop listening for CQL clients
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-01 03:59:37,449 Gossiper.java:1720 - Announcing shutdown
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-01 03:59:37,465 StorageService.java:2585 - Node /10.166.73.33 state jump to shutdown
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-01 03:59:39,469 MessagingService.java:985 - Waiting for messaging service to quiesce
INFO [ACCEPT-/10.166.72.33] 2023-06-01 03:59:39,470 MessagingService.java:1346 - MessagingService has terminated the accept() thread
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-01 03:59:39,794 HintsService.java:210 - Paused hints dispatch
INFO [RMI TCP Connection(4)-127.0.0.1] 2023-06-01 03:59:39,806 StorageService.java:1660 - DRAINED
You can also confirm the status of the drain by running the nodetool netstats command and checking for Mode: DRAINED in the output.
-
Command
-
Result
nodetool netstats
The drain was successful if you see Mode: DRAINED in the output.
Mode: DRAINED
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name          Active   Pending   Completed   Dropped
Large messages     n/a      2         0           0
Small messages     n/a      2         5           0
Gossip messages    n/a      2         122         0
When you run the nodetool drain command, Cassandra stops listening for connections from clients and other nodes (no more data is written to the node) and all memtables are flushed to SSTables on disk. This ensures that all the data on the node is safely stored on disk before beginning the rollback.
After running nodetool drain, the node will not be able to service client reads or writes until Cassandra is restarted.
-
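The netstats check described above lends itself to a scripted test. The following is a minimal sketch rather than part of the documented procedure: the function name is_drained is hypothetical, and it assumes the Mode: line appears in the output exactly as in the sample result.

```shell
# Hypothetical helper: reads `nodetool netstats` output on stdin and
# succeeds only if a line reports "Mode: DRAINED".
is_drained() {
  grep -q '^Mode: DRAINED'
}

# Example usage (commented out so the sketch runs without a live node):
# nodetool netstats | is_drained && echo "Drain complete"
```

Reading from stdin keeps the helper independent of how nodetool is invoked, for example with a non-default JMX port.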
-
Stop Cassandra.
-
Package installations
-
Tarball installations
To stop the Cassandra service for packaged installations:
sudo service cassandra stop
To stop the Cassandra process for tarball installations:
sudo kill $(ps auwx | grep cassandra | grep -v "grep" | tr -s ' ' | cut -d' ' -f2)
-
-
Ensure that no Cassandra process is running on the server.
-
Command
-
Result
ps auwx | grep CassandraDaemon
root 27921 0.0 0.0 3304 656 pts/0 S+ 07:38 0:00 grep --color=auto CassandraDaemon
Note that Cassandra might take some time to shut down, especially if it’s currently handling requests. If the service continues to run, then kill the process using the following command:
sudo kill -9 $(ps auwx | grep CassandraDaemon | grep -v "grep" | tr -s ' ' | cut -d' ' -f2)
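Because Cassandra might take some time to shut down, it can help to wait for the process to exit before resorting to kill -9. The sketch below is one way to do that under stated assumptions: the function name wait_for_exit and the timeout values are hypothetical, and the script must run as a user permitted to signal the Cassandra process (otherwise kill -0 fails even though the process still exists).

```shell
# Hypothetical helper: wait up to <timeout> seconds for process <pid> to exit.
# Returns 0 if the process is gone, 1 if it is still running at the timeout.
wait_for_exit() {
  pid=$1
  timeout=${2:-60}
  elapsed=0
  # kill -0 sends no signal; it only tests whether the process can be signalled.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}

# Example usage (commented out; assumes a single Cassandra process and that
# pgrep is installed):
# pid=$(pgrep -f CassandraDaemon)
# wait_for_exit "$pid" 120 || sudo kill -9 "$pid"
```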
-
Step 2: Downgrade to the previously installed version of Cassandra
Downgrade the Cassandra version using the same methodology and tooling you used for the upgrade.
If you’re using a configuration management system, you should now "converge" the node with the old version of the manifest, cookbook, etc.
If no configuration management is used, replace the binary or install the old Cassandra version. Commands for different Linux distributions are described below.
Ensure that the packaging system and configurations have been rolled back to the previous version of Cassandra. The procedure varies across Linux distributions and containerization platforms.
-
Debian/Ubuntu (APT)
-
CentOS/RHEL (YUM)
-
Tarball
-
Docker
-
Ensure that the previous version of Cassandra is the version used in the cassandra.sources.list file.
For example, if the previous version is 3.11.15, then the corresponding distribution name is 311x (with an "x" as the suffix). To update the repository for version 3.11.15 (311x):
-
Command
-
Result
echo "deb https://debian.cassandra.apache.org 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
deb https://debian.cassandra.apache.org 311x main
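The naming pattern in the example above (3.11.15 maps to 311x) can be expressed as a small function. This is a sketch under the assumption that the pattern of major digits, minor digits, then "x" holds for your version; the function name series_for_version is hypothetical, and you should confirm that the resulting name actually exists in the repository before relying on it.

```shell
# Hypothetical helper: derive the repository distribution name from a version
# string, following the pattern in the example above (3.11.15 -> 311x).
series_for_version() {
  version=$1
  major=${version%%.*}   # text before the first dot, e.g. "3"
  rest=${version#*.}     # text after the first dot, e.g. "11.15"
  minor=${rest%%.*}      # text before the next dot, e.g. "11"
  printf '%s%sx\n' "$major" "$minor"
}

series_for_version 3.11.15   # prints 311x
```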
-
-
Add the Apache Cassandra repository keys to the list of trusted keys on the server:
-
cURL
-
Wget
-
Result
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
wget -qO- https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  278k  100  278k    0     0   168k      0  0:00:01  0:00:01 --:--:--  168k
OK
-
-
Update the package index from sources:
sudo apt-get update
-
Remove the upgraded version of Cassandra:
sudo apt remove cassandra
-
Install the previous version of Cassandra:
sudo apt-get install cassandra=3.11.15
-
Update the Apache Cassandra repository information in the /etc/yum.repos.d/cassandra.repo file (as the root user) to ensure that the previous version of Cassandra is the version used by the packaging system.
For example, if the previous version is 3.11.15, then the corresponding distribution name is 311x (with an "x" as the suffix). To update the repository for version 3.11.15 (311x), make sure the content of cassandra.repo matches the following:
[cassandra]
name=Apache Cassandra
baseurl=https://redhat.cassandra.apache.org/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://downloads.apache.org/cassandra/KEYS
-
Update the package index.
-
Command
-
Result
sudo yum update
Apache Cassandra                                817  B/s | 833  B     00:01
Apache Cassandra                                199 kB/s | 275 kB     00:01
Importing GPG key 0xF2833C93:
 Userid     : "Eric Evans <eevans@sym-link.com>"
 Fingerprint: CEC8 6BB4 A0BA 9D0F 9039 7CAE F835 8FA2 F283 3C93
 From       : https://downloads.apache.org/cassandra/KEYS
Is this ok [y/N]:
Type Y and press Return to import each of the GPG keys.
-
-
Remove the upgraded version of Cassandra:
sudo yum remove cassandra
-
Install the previous version of Cassandra:
sudo yum install cassandra-3.11.15-1
Type Y and press Return to begin the installation.
After the packages have downloaded, you’ll be asked to import GPG keys. Type Y and press Return to import each of the GPG keys, after which the installation will continue until completion.
-
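In the directory where Cassandra is installed, run the following command to delete the contents of the directory except for the data directory. The !(data) pattern is a Bash extended glob, so enable extended globbing first (shopt -s extglob) if your shell doesn't already have it enabled.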
rm -vfr !(data)
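The !(data) pattern in the command above is a Bash extended glob and only works when the extglob shell option is on. If you would rather not depend on that option, the same cleanup can be sketched with find. The function name clear_install_dir is hypothetical, the -mindepth and -maxdepth options assume a GNU or BSD find, and the demonstration runs against a throwaway directory rather than a real installation.

```shell
# Hypothetical helper: remove everything directly inside <dir> except an
# entry named "data". Does not rely on Bash extended globbing.
clear_install_dir() {
  dir=$1
  find "$dir" -mindepth 1 -maxdepth 1 ! -name data -exec rm -rf {} +
}

# Demonstration in a scratch directory that mimics a tarball installation.
demo=$(mktemp -d)
mkdir -p "$demo/bin" "$demo/conf" "$demo/data/keyspace1"
touch "$demo/bin/cassandra" "$demo/data/keyspace1/placeholder"
clear_install_dir "$demo"
ls "$demo"    # only "data" remains
rm -rf "$demo"
```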
-
Download the binary tarball for the previous version of Cassandra. For example, to download Cassandra 3.11.15:
-
cURL
-
Wget
curl -OL https://archive.apache.org/dist/cassandra/3.11.15/apache-cassandra-3.11.15-bin.tar.gz
wget https://archive.apache.org/dist/cassandra/3.11.15/apache-cassandra-3.11.15-bin.tar.gz
To download a different version of Cassandra, visit the Apache Archives.
-
-
Unpack the tarball:
tar xzf apache-cassandra-3.11.15-bin.tar.gz
The files will be extracted to the apache-cassandra-3.11.15 directory. This is the tarball installation location.
-
Move the apache-cassandra-3.11.15 directory to the same location as your current installation of Cassandra. For example:
mv apache-cassandra-3.11.15 /usr/local/cassandra-3
-
Update your PATH and environment variables to point to the new installation. For example:
export PATH="/usr/bin:/usr/local/cassandra-3/bin:/usr/local/cassandra-3/tools/bin:$PATH"
-
Delete the tarball.
rm apache-cassandra-3.11.15-bin.tar.gz
On a Docker cluster, the rollback scenario differs. Starting up the previous containers is the simplest approach. Starting up new containers on the previous Docker image is also possible, but you must address the concerns and challenges listed in Docker considerations above.
Step 3: Delete Cassandra operation data
Delete the operational files that Cassandra created while it was running, specifically the contents of the hints, commitlog, and saved_caches directories. The deletion commands in this step assume all three directories are located in the default directory path used by the package installation: /var/lib/cassandra.
The paths for the hints, commitlog, and saved_caches directories can each be defined in the cassandra.yaml file.
| Setting in cassandra.yaml | Corresponding operational directory |
|---|---|
| | hints |
| | commitlog |
| | saved_caches |
Run the following commands to delete the operational files created by Cassandra. If any of the settings in the above table are defined in the cassandra.yaml file, then use that path value for the corresponding operational directory.
sudo rm /var/lib/cassandra/hints/*
sudo rm /var/lib/cassandra/commitlog/*
sudo rm /var/lib/cassandra/saved_caches/*
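If your directories are not in the default location, a small wrapper that takes the paths as arguments may be easier to adapt than editing three separate commands. The following is a sketch only: the function name clear_dir_contents is hypothetical, and find -delete assumes a GNU or BSD find. Unlike rm with a * glob, it also removes hidden files and subdirectories, so review the paths carefully before running it.

```shell
# Hypothetical helper: delete the contents of each directory passed as an
# argument, leaving the directories themselves in place.
clear_dir_contents() {
  for dir in "$@"; do
    if [ -d "$dir" ]; then
      find "$dir" -mindepth 1 -delete
    else
      echo "skipping missing directory: $dir" >&2
    fi
  done
}

# Example usage with the default package paths (commented out; substitute the
# paths from cassandra.yaml if they are overridden, and run with sufficient
# privileges):
# clear_dir_contents /var/lib/cassandra/hints \
#                    /var/lib/cassandra/commitlog \
#                    /var/lib/cassandra/saved_caches
```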
Step 4: Restore the snapshot files
Find and delete any new SSTables created by the node. If upgrading to Cassandra 4.x, these SSTables will be in the Cassandra 4.0+ format.
CASSANDRA_DATA=/full/path/to/cassandra/data/
sudo find "${CASSANDRA_DATA}" \
-type f \
-iname "nb-*" \
-exec rm -f {} +
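Because the deletion above cannot be undone, you might first list the files that the same find expression matches. This is a minimal sketch; the function name list_new_sstables is hypothetical, and it uses the same "nb-*" name pattern as the command above.

```shell
# Hypothetical helper: print the files that the deletion command would match,
# without removing anything.
list_new_sstables() {
  find "$1" -type f -iname 'nb-*' -print
}

# Example usage (commented out; set the path for your node first):
# list_new_sstables /full/path/to/cassandra/data/ | less
```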
Restore the snapshot files for each table stored on the node.
CASSANDRA_DATA=/full/path/to/cassandra/data/
sudo find "${CASSANDRA_DATA}" \
-type d \
-iname "pre-40-upgrade*" \
-exec bash -c 'cp -p "$1"/* "$1"/../../' _ {} \;
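The restore command relies on the on-disk layout, in which each snapshot lives at <keyspace>/<table>/snapshots/<snapshot-name>, so the path two levels above a snapshot directory is the table directory. The sketch below rehearses that copy on a throwaway tree so you can see the effect before touching real data. The function name restore_snapshots and the demo file names are hypothetical.

```shell
# Hypothetical helper: copy the files from every matching snapshot directory
# back into the table directory two levels above it.
restore_snapshots() {
  data_dir=$1
  pattern=$2
  find "$data_dir" -type d -iname "$pattern" | while IFS= read -r snap; do
    # <table>/snapshots/<snapshot-name>/../../ resolves to <table>/
    cp -p "$snap"/* "$snap/../../"
  done
}

# Demonstration on a scratch tree that mimics the data directory layout.
demo=$(mktemp -d)
mkdir -p "$demo/ks1/tbl1/snapshots/pre-40-upgrade-demo"
touch "$demo/ks1/tbl1/snapshots/pre-40-upgrade-demo/md-1-big-Data.db"
restore_snapshots "$demo" 'pre-40-upgrade*'
ls "$demo/ks1/tbl1"    # lists md-1-big-Data.db and snapshots
rm -rf "$demo"
```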
Step 5: Restore the configuration files
Decompress the configuration file backups you made previously and restore the configurations for the previous version of Cassandra.
cd /etc/cassandra
sudo tar xzf ~/cassandra-config-backup.tgz
Step 6: Start Cassandra
Start the Cassandra service on the downgraded node using the following command (or an equivalent on your system):
-
Package installations
-
Tarball installations
To start the Cassandra service for packaged installations:
sudo service cassandra start
To start the Cassandra process for tarball installations:
<install-location>/bin/cassandra
After a few seconds, the internal Cassandra processes will come online.
Step 7: Confirm Cassandra status
After starting the downgraded Cassandra service, check the status of the node while also continually monitoring your applications for signs of degraded performance and functionality.
-
Monitor /var/log/cassandra/system.log to confirm there are no ERROR or WARN statements.
-
Command
-
Result
sudo tail -n 50 -f /var/log/cassandra/system.log
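Messages similar to the following will appear in system.log when the Cassandra process starts. The sample shows a node starting Cassandra 4.1.2; after a rollback, expect the previous version to be reported instead: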
INFO [main] 2023-06-01 04:44:45,897 SystemKeyspace.java:1729 - Detected version upgrade from 3.11.15 to 4.1.2, snapshotting system keyspaces
...
INFO [main] 2023-06-01 05:06:00,630 StorageService.java:864 - Cassandra version: 4.1.2
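Watching the log with tail works well interactively, but a quick count of problem lines can be useful once the node has been up for a while. The following sketch assumes each log entry begins with its level, as in the sample messages; the function name count_problem_lines is hypothetical.

```shell
# Hypothetical helper: count log lines whose level is ERROR or WARN.
# Assumes each entry starts with its level followed by a space.
count_problem_lines() {
  grep -cE '^(ERROR|WARN) ' "$1"
}

# Example usage (commented out; requires read access to the log file):
# count_problem_lines /var/log/cassandra/system.log
```

A nonzero count is a prompt to read the matching entries, not a verdict on its own, since some WARN messages are routine at startup.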
-
-
Confirm that the node is reporting status UN (Up and Normal).
-
Command
-
Result
nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
UN  10.166.73.33   360.5 KiB   256     28.2%             7370b2ef-c9c3-4c83-bb44-0879cef0f6c8  rack1
UN  10.166.76.162  301.28 KiB  256     34.3%             e8f291db-b3ec-47fa-8bee-27e033c9655f  rack1
UN  10.166.77.78   132 KiB     256     37.4%             baae5ba3-a842-4b23-a70b-069364dac689  rack1
It’s important to check that the other nodes see the node as UP, and also that the node sees all other nodes as UP. Therefore, you should run nodetool status both on the node and on at least one other node. The output should be the same on both nodes.
-
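When you run nodetool status on several nodes, a small filter can flag any node that is not UN without reading each table by eye. This is a sketch under the assumption that node rows begin with the two-letter status/state code shown in the sample output; the function name all_nodes_un is hypothetical.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin, prints the
# address of every node whose status/state code is not UN, and succeeds
# only if no such node was found.
all_nodes_un() {
  awk '
    # Node rows start with a two-letter code: U or D, then N, L, J, or M.
    $1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2; bad = 1 }
    END { exit bad }
  '
}

# Example usage (commented out so the sketch runs without a live node):
# nodetool status | all_nodes_un || echo "some nodes are not UN" >&2
```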
-
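Confirm that the node is using the intended version of Cassandra. After a rollback, the intended version is the previously installed version, not the version you upgraded to.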
-
Command
-
Result
nodetool version
ReleaseVersion: 4.1.2
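The version check can also be scripted, which helps when you roll back many nodes. The sketch below assumes the ReleaseVersion: line format shown in the sample result; the function name version_is is hypothetical, and 3.11.15 is used only because it is the example version on this page.

```shell
# Hypothetical helper: reads `nodetool version` output on stdin and succeeds
# only if the reported ReleaseVersion equals the expected version.
version_is() {
  expected=$1
  actual=$(awk '$1 == "ReleaseVersion:" { print $2 }')
  [ "$actual" = "$expected" ]
}

# Example usage (commented out so the sketch runs without a live node):
# nodetool version | version_is 3.11.15 || echo "unexpected version" >&2
```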
-
-
Confirm that the node is processing read and write traffic as well as requests from clients. You can confirm this by watching for Completed tasks in the thread pool stats:
-
Command
-
Result
watch -d nodetool tpstats
Every 2.0s: nodetool tpstats

Pool Name                      Active   Pending   Completed   Blocked   All time blocked
RequestResponseStage           0        0         3           0         0
ReadStage                      0        0         3           0         0
CompactionExecutor             0        0         6422        0         0
MemtableReclaimMemory          0        0         44          0         0
PendingRangeCalculator         0        0         8           0         0
GossipStage                    0        0         36429       0         0
SecondaryIndexManagement       0        0         1           0         0
HintsDispatcher                0        0         0           0         0
MigrationStage                 0        0         34          0         0
MemtablePostFlush              0        0         60          0         0
PerDiskMemtableFlushWriter_0   0        0         33          0         0
ValidationExecutor             0        0         0           0         0
Sampler                        0        0         0           0         0
ViewBuildExecutor              0        0         0           0         0
MemtableFlushWriter            0        0         44          0         0
CacheCleanupExecutor           0        0         0           0         0
Native-Transport-Requests      0        0         0           0         0

Latencies waiting in queue (micros) per dropped message types
Message type                    Dropped   50%   95%   99%   Max
READ_RSP                        0         0.0   0.0   0.0   0.0
RANGE_REQ                       0         0.0   0.0   0.0   0.0
PING_REQ                        0         0.0   0.0   0.0   0.0
PAXOS2_COMMIT_REMOTE_RSP        0         0.0   0.0   0.0   0.0
PAXOS2_COMMIT_AND_PREPARE_RSP   0         0.0   0.0   0.0   0.0
Specifically, you should confirm that counters for the following tasks are increasing:
-
ReadStage: local read tasks
-
MutationStage: local write tasks
-
RequestResponseStage: tasks that process the responses from replicas when acting as a coordinator
If all of these counters are increasing, it indicates that the node is successfully processing traffic and communicating properly with the rest of the cluster. There should be no pending, blocked, or dropped messages.
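To check that a counter is increasing without watching the screen, you can sample it twice and compare the values. The following sketch assumes the column order shown in the sample output, where Completed is the fourth whitespace-separated field of a pool row; the function name completed_count and the 10-second interval are hypothetical.

```shell
# Hypothetical helper: reads `nodetool tpstats` output on stdin and prints the
# Completed count (fourth field) for the named pool.
completed_count() {
  awk -v pool="$1" '$1 == pool { print $4; exit }'
}

# Example usage (commented out so the sketch runs without a live node):
# before=$(nodetool tpstats | completed_count ReadStage)
# sleep 10
# after=$(nodetool tpstats | completed_count ReadStage)
# [ "$after" -gt "$before" ] && echo "ReadStage is making progress"
```

On a lightly loaded node a counter may legitimately stay flat over a short interval, so treat an unchanged value as a reason to look closer rather than as a failure.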
-
Step 8: Run a repair
This node will experience data loss. Any new data written to it while the new version of Cassandra was running will be lost. In addition, this data will be absent from the pre-upgrade snapshot, because the snapshot was taken before the new version of Cassandra was installed. Data written to the cluster during this time will likely be in an inconsistent state.
To resolve this issue, run a repair on the node using Reaper. A new repair will need to be configured and launched for each non-system keyspace on the node.
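Since a repair is needed for each non-system keyspace, it helps to have the list of those keyspaces at hand. The sketch below filters a whitespace-separated list of keyspace names, such as the output of DESCRIBE KEYSPACES in cqlsh. It assumes that every keyspace whose name begins with "system" is internal, which would also skip a user keyspace named that way; the function name non_system_keyspaces is hypothetical.

```shell
# Hypothetical helper: reads whitespace-separated keyspace names on stdin and
# prints, one per line, those that do not start with "system".
non_system_keyspaces() {
  tr -s ' \t' '\n\n' | grep -v -e '^system' -e '^$'
}

# Example usage (commented out; assumes cqlsh can connect to the node):
# cqlsh -e 'DESCRIBE KEYSPACES' | non_system_keyspaces
```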
Step 9: Roll back remaining upgraded nodes
Repeat the previous steps on each node that had been upgraded to the new version of Cassandra until all nodes in the cluster have been downgraded to the previously installed version of Cassandra.
Step 10: Clean up after rollback
Complete the steps in Phase 7: Clean up after upgrade or rollback.