Upgrade Apache Cassandra® to DataStax Enterprise
The upgrade process from open-source Apache Cassandra® to DataStax Enterprise (DSE) requires that you upgrade and restart one node at a time while other nodes continue to operate online. With a few exceptions, the cluster continues to work as though it were on the earlier platform until all of the nodes in the cluster are upgraded.
DataStax strongly recommends using the Zero-Downtime Migration (ZDM) tools for the lowest risk and least possible downtime when migrating from Cassandra to DSE. This approach permits a wider range of upgrade paths without the need for interim upgrades, and it provides a seamless rollback strategy. If you intend to perform an in-place upgrade, carefully review the upgrade planning guide and all upgrade instructions before you begin the upgrade to reduce the chance of errors and data loss.
For assistance with migrations from Cassandra to DSE, contact DataStax Support.
Upgrade paths
Upgrades are dependent upon your current Cassandra version and your target DSE version. The greater the gap between the current version and the target version, the more complex the upgrade. Upgrades from earlier versions can require one or more interim upgrades.
| Current Cassandra version | DSE upgrade path |
|---|---|
| Cassandra versions after 3.11, including 4.x and 5.x | In-place upgrades are riskier or more complex due to differences between open-source Cassandra and the version of Cassandra in DSE. Instead, DataStax recommends using the ZDM tools to migrate your data to a new, separate DSE cluster. This approach provides a seamless rollback strategy in case of data loss or corruption. For other options and more information, see Migrate to DataStax Enterprise. |
| Cassandra 3.0 or 3.11 | If you want to perform an in-place upgrade on your existing clusters, you must upgrade to DSE 5.1 first, and then you can upgrade to DSE 6.8 or 6.9. Use the instructions in this guide to upgrade to 5.1, and then upgrade from 5.1 to 6.8 or 5.1 to 6.9. Alternatively, you can use the ZDM tools to migrate your data to a new, separate DSE cluster without the need for an interim upgrade. |
| Cassandra 2.1 to the end of the 2.x series | For in-place upgrades, you cannot upgrade directly to DSE 5.1, 6.8, or 6.9 from Cassandra versions earlier than 3.0. Upgrade to Cassandra 3.0 (minimum), and then follow the upgrade path for Cassandra 3.0 or 3.11. Alternatively, if you are running Cassandra version 2.1.6 or later, you can use the ZDM tools to migrate your data to a new, separate DSE cluster without the need for an interim upgrade. |
| Cassandra 2.0 and earlier | You cannot upgrade directly to DSE 5.1, 6.8, or 6.9 from Cassandra version 2.0 or earlier. If you are on Cassandra 2.0 or earlier, you must upgrade to Cassandra 2.1, then upgrade to Cassandra 3.0 (minimum), and then you can follow the upgrade path for Cassandra 3.0 or 3.11. If you want to avoid multiple interim upgrades, upgrade to Cassandra 2.1.6, and then use the ZDM tools to migrate your data to a new, separate DSE cluster without the need for additional interim upgrades. |
Back up your existing installation
DataStax recommends backing up your data prior to any version upgrade.
A backup provides the ability to revert and restore all the data used in the previous version if necessary.
You can use the same process to back up Cassandra as you would for DSE, changing directory names and DSE-specific commands as needed. For instructions, see Backing up a tarball installation or Backing up a package installation.
Upgrade restrictions and limitations
Restrictions and limitations apply while a cluster is in a partially upgraded state. This means that some, but not all, nodes in the cluster have been upgraded. The cluster continues to work as though it were on the earlier platform until all of the nodes in the cluster are upgraded. For this reason, you must avoid certain operations until the upgrade is complete on all nodes.
Nodes on different versions might show a schema disagreement during an upgrade. This is normal.
General restrictions
- Don’t enable new features.
- Don’t run `nodetool repair`.
- Disable all automated repair processes.
- During the upgrade, don’t bootstrap new nodes or decommission existing nodes.
- Don’t enable Change Data Capture (CDC) on a mixed-version cluster. Upgrade all nodes to DSE 5.1 or later before enabling CDC.
- Don’t issue `TRUNCATE` or DDL-related queries during the upgrade process.
- Don’t alter schemas for any workloads. Propagation of schema changes between mixed-version nodes can have unexpected results. Take action to prevent schema changes from occurring during the upgrade process.
Upgrade time limit
Once you upgrade one node in a cluster, you must complete the cluster-wide upgrade before the expiration of gc_grace_seconds (default 10 days) to ensure any repairs complete successfully.
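As a rough planning aid, you can compute the deadline from the time the first node was upgraded. The following is a minimal sketch, assuming GNU `date` and the default `gc_grace_seconds` of 864000 seconds (10 days). The function name `upgrade_deadline_epoch` is hypothetical, and any table with a lower `gc_grace_seconds` shortens the window.

```shell
#!/usr/bin/env bash
# Sketch: compute the latest time by which the cluster-wide upgrade should finish.
# upgrade_deadline_epoch is a hypothetical helper, not a DSE or Cassandra tool.

# Args: $1 = epoch seconds when the first node was upgraded
#       $2 = gc_grace_seconds (optional; defaults to 864000, which is 10 days)
upgrade_deadline_epoch() {
  local start_epoch="$1"
  local gc_grace="${2:-864000}"
  echo $(( start_epoch + gc_grace ))
}

# Example: treat "now" as the time the first node was upgraded.
# The -d "@epoch" form is GNU date syntax (an assumption about the host).
first_node_upgraded="$(date +%s)"
deadline="$(upgrade_deadline_epoch "$first_node_upgraded")"
date -d "@${deadline}" '+Finish the cluster-wide upgrade before: %Y-%m-%d %H:%M:%S %Z'
```

Pass the lowest `gc_grace_seconds` value used by your tables as the second argument if any table overrides the default.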
Use storage port 7000 for online upgrades
Online upgrades require the default storage port 7000.
A cluster that uses non-default storage_port values must use the ZDM tools to upgrade to DSE.
Verify your storage port configuration before you begin the upgrade process.
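One way to verify the setting is to read `storage_port` from each node's `cassandra.yaml`. The following is a minimal sketch: the helper name `storage_port_of` is hypothetical, it only handles a simple top-level `storage_port: N` line, and the default file path is an assumption that you should adjust for your installation.

```shell
#!/usr/bin/env bash
# Sketch: read storage_port from a cassandra.yaml file.
# storage_port_of is a hypothetical helper; it is not a full YAML parser.

# Arg: $1 = path to cassandra.yaml
# Prints the value of the first uncommented top-level storage_port key.
storage_port_of() {
  local yaml_file="$1"
  awk '/^storage_port:/ {
         sub(/^storage_port:[ \t]*/, "")   # drop the key
         sub(/[ \t]*(#.*)?$/, "")          # drop trailing spaces and comments
         print
         exit
       }' "$yaml_file"
}

# Example usage. The path is an assumption; set CASSANDRA_YAML to override it.
yaml="${CASSANDRA_YAML:-/etc/cassandra/cassandra.yaml}"
if [ -r "$yaml" ]; then
  port="$(storage_port_of "$yaml")"
  # When the key is absent, Cassandra uses its default of 7000.
  if [ "${port:-7000}" = "7000" ]; then
    echo "storage_port is 7000: suitable for an online upgrade"
  else
    echo "storage_port is ${port}: use the ZDM tools instead" >&2
  fi
fi
```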
Restrictions for nodes using security
- Don’t change security credentials or permissions until the upgrade is complete on all nodes.
- If you aren’t already using Kerberos, don’t set up Kerberos authentication before upgrading. First upgrade the cluster, and then set up Kerberos.
If you plan to upgrade to DSE 6.9.7 or later, you must modify the upgrade process if your cluster uses legacy internode encryption (deprecated in Cassandra 4.0), including transitional mode to permit an internode encryption-based cluster to interact with unencrypted nodes.
To enable a legacy-encrypted cluster to continue to function during an upgrade to DSE 6.9.7 or later, do the following after upgrading your nodes to DSE 5.1:
Application code and driver compatibility
Check driver compatibility to ensure that your driver version supports both your Cassandra version and DSE 5.1 (minimum).
If your target DSE version is later than 5.1, select a driver version that supports your Cassandra version, DSE 5.1, and your target DSE version. If no such driver version exists, you must upgrade your driver again after you upgrade your clusters to DSE 5.1.
If you need to upgrade your driver, be sure to check the driver documentation for any code changes that might be required between your original and new driver versions. Depending on the driver version, you might need to recompile your client application code.
During upgrades, you might experience driver-specific issues when clusters have mixed versions of drivers. If your cluster has mixed versions, the protocol version is negotiated with the first host to which the driver connects, although certain drivers automatically select a protocol version that works across nodes. To avoid driver version incompatibility during upgrades, use one of the following workarounds:
- Set the protocol version explicitly in your application at startup. Switch the driver to the new protocol version only after fully upgrading all nodes in the cluster.
- Ensure that the list of initial contact points contains only hosts with the oldest database version or protocol version. For example, make sure that the initial contact points include only hosts that use protocol version 2.
For details on protocol version negotiation, see the documentation for your driver.
Prepare to upgrade
Follow these steps to prepare each Cassandra node for the upgrade:
- Familiarize yourself with the changes and features in your target version of DSE:
  - Review the general upgrade advice and Cassandra features in `NEWS.txt`. If you are upgrading from an earlier version, read `NEWS.txt` from the latest version back to your current version.
  - Ensure that your version of Cassandra is compatible with the version of Cassandra that is in DSE. See the Cassandra changes in `CHANGES.txt` and the upgrade paths.
- Before upgrading, be sure that each node has adequate free disk space.

  Determine current data disk space usage:

      sudo du -sh /var/lib/cassandra/data/

  Result:

      3.9G /var/lib/cassandra/data/

  Determine available disk space:

      sudo df -hT /

  Result:

      Filesystem     Type  Size  Used Avail Use% Mounted on
      /dev/sda1      ext4   59G   16G   41G  28% /

  The required space depends on the compaction strategy. See Disk space.
- Upgrade the SSTables on each node to ensure that all SSTables are on the current version.

  You must upgrade SSTables on your nodes before and after upgrading. Failure to upgrade SSTables will result in severe performance degradation, increased disk usage, and possible data loss.

      nodetool upgradesstables

  You can use the `--jobs` option to set the number of SSTables that upgrade simultaneously. The default setting is `2`, which minimizes impact on the cluster. Set to `0` to use all available compaction threads. DataStax recommends running the `upgradesstables` command on one node at a time or, when using racks, one rack at a time.

  If the SSTables are already on the current version, the command returns immediately and no action is taken.
- Verify the Java runtime version and upgrade to a supported version if needed:

      java -version

  Result:

      openjdk version "1.8.0_222"
      OpenJDK Runtime Environment (build 1.8.0_222-8u222-b10-1ubuntu1~18.04.1-b10)
      OpenJDK 64-Bit Server VM (build 25.222-b10, mixed mode)

  For DSE 5.1 and 6.8, OpenJDK 8 (1.8.0_151 minimum) and Oracle Java SE 8 (JRE or JDK) (1.8.0_151 minimum) are supported. OpenJDK is recommended because DataStax does more extensive testing on OpenJDK than Oracle Java.

  If you plan to continue the upgrade to DSE 6.9, note that DSE 6.9 requires Java 11. You upgrade to Java 11 as part of your upgrade to 6.9, after you upgrade to DSE 5.1.
- Run `nodetool repair` to ensure that data on each replica is consistent with data on other nodes:

      nodetool repair -pr

- Install the `libaio` package for optimal performance.
  - RHEL:

        sudo yum install libaio

  - Debian:

        sudo apt-get install libaio1

- Back up any customized configuration files since they can be overwritten with default values during installation of the new version.

  If you backed up your installation using the instructions in Backing up a tarball installation or Backing up a package installation, your original configuration files are included in the archive.
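The Java version check in the preparation steps can be scripted so that it runs the same way on every node. The following is a minimal sketch, assuming a Java 8 version string in the `1.8.0_NNN` form shown in the sample output. The helper name `java8_update_ok` is hypothetical and is not part of DSE.

```shell
#!/usr/bin/env bash
# Sketch: check a Java 8 version string against the 1.8.0_151 minimum.
# java8_update_ok is a hypothetical helper, not part of DSE.

# Arg: $1 = version string such as 1.8.0_222
# Returns 0 if it is Java 8 with update >= 151, otherwise returns 1.
java8_update_ok() {
  local version="$1"
  case "$version" in
    1.8.0_*) ;;
    *) return 1 ;;
  esac
  local update="${version#1.8.0_}"   # text after "1.8.0_"
  update="${update%%[!0-9]*}"        # keep leading digits only (drops "-b10" and similar)
  [ -n "$update" ] && [ "$update" -ge 151 ]
}

# Example: extract the quoted version from `java -version`, which prints to stderr.
if command -v java >/dev/null 2>&1; then
  current="$(java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }')"
  if java8_update_ok "$current"; then
    echo "Java ${current} meets the 1.8.0_151 minimum"
  else
    echo "Java ${current} does not meet the Java 8 (1.8.0_151) minimum" >&2
  fi
fi
```

The check intentionally rejects Java 11 and later because this guide targets DSE 5.1, which the text above lists with Java 8 only.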
Upgrade steps
The upgrade process requires upgrading and restarting one node at a time in the following order:
- If using racks, upgrade node-by-node within one rack.
- Upgrade rack-by-rack within one datacenter, and upgrade seed nodes in a datacenter before non-seed nodes.
- Upgrade datacenter-by-datacenter within one cluster.
- Repeat to upgrade the next cluster until you have upgraded all nodes (by rack and datacenter) in all clusters.
Follow these steps for each node’s upgrade to DSE 5.1. The configuration changes in these steps are performed in the upgraded version, and they use DSE 5.1 documentation if version-specific documentation is necessary.
- Flush the commit log of the current installation:

      nodetool drain

- Uninstall Cassandra.

  If you installed Cassandra from packages in APT or RPM repositories, you must remove the packages before setting up and installing DSE.
  - APT package installations: For packages installed from APT repositories, run the following command:

        sudo apt-get autoremove "dsc*" "cassandra*" "apache-cassandra*"

    This action shuts down Cassandra if it is still running before uninstalling it.
  - RPM package installations: For packages installed from Yum repositories, run the following command:

        sudo yum remove "dsc*" "cassandra*" "apache-cassandra*"

    It is normal for the old Cassandra configuration file to be renamed to `cassandra.yaml.rpmsave`. For example:

        warning: /etc/cassandra/default.conf/cassandra.yaml saved as /etc/cassandra/default.conf/cassandra.yaml.rpmsave

  - Tarball installations: If you installed Cassandra with a binary tarball, run the following commands, and then remove the Cassandra installation directory:

        ps auwx | grep cassandra
        sudo kill cassandra_pid

- Install DSE 5.1 using the same installation method (package or tarball) that you used for Cassandra.
- After upgrading but before restarting a node, compare changes in the new configuration files with your backup configuration files, remove deprecated settings, and update any new settings if required.

  You must use the new configuration files that are generated from the upgrade installation. Copy individual parameters from your old configuration files into the new files. Don’t replace the newly generated configuration files with the old files.

  You can use the DSE `yaml_diff` tool to compare backup YAML files with the upgraded YAML files:

      cd /usr/share/dse/tools/yamls
      ./yaml_diff path/to/yaml-file-old path/to/yaml-file-new

  Result:

      ...
      CHANGES
      =========
      authenticator:
      - AllowAllAuthenticator
      + com.datastax.bdp.cassandra.auth.DseAuthenticator

      authorizer:
      - AllowAllAuthorizer
      + com.datastax.bdp.cassandra.auth.DseAuthorizer

      roles_validity_in_ms:
      - 2000
      + 120000
      ...

- If upgrading from Cassandra 3.11.2 or later, comment out the `enable_materialized_views` and `enable_sasi_indexes` parameters in `cassandra.yaml` if they exist.

  Where is the `cassandra.yaml` file? The location of the `cassandra.yaml` file depends on the type of installation:
  - Package installations: `/etc/dse/cassandra/cassandra.yaml`
  - Tarball installations: `INSTALL_DIRECTORY/resources/cassandra/conf/cassandra.yaml`
- Verify that the upgraded datacenter names match the datacenter names in the keyspace schema definition:
  - Get the node’s datacenter name:

        nodetool status | grep "Datacenter"

    Result:

        Datacenter: datacenter-name

  - Verify that the node’s datacenter name matches the datacenter name for a keyspace:

        cqlsh --execute "DESCRIBE KEYSPACE keyspace-name;" | grep "replication"

    Result:

        CREATE KEYSPACE keyspace-name WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter-name': '3'};

- Review the logs for warnings, errors, and exceptions:

      grep -w 'WARNING\|ERROR\|exception' /var/log/cassandra/*.log

  Warnings, errors, and exceptions are frequently found in the logs when starting an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.

  Non-standard log locations are configured in `dse-env.sh`.
- Run `nodetool repair`:

      bin/nodetool repair -pr

  Throughout the upgrade process, make sure that you eventually run `nodetool repair` on each node in each upgraded datacenter.
- Repeat the upgrade process on each node in the cluster following the recommended upgrade order.
- After the entire cluster upgrade is complete, upgrade the SSTables on one node at a time or, when using racks, one rack at a time.

  You must upgrade SSTables on your nodes before and after upgrading. Failure to upgrade SSTables will result in severe performance degradation, increased disk usage, and possible data loss.

  The upgrade isn’t complete until the SSTables are upgraded.

      nodetool upgradesstables

  You can use the `--jobs` option to set the number of SSTables that upgrade simultaneously. The default setting is `2`, which minimizes impact on the cluster. Set to `0` to use all available compaction threads. DataStax recommends running the `upgradesstables` command on one node at a time or, when using racks, one rack at a time.

  You can run the `upgradesstables` command before all the nodes are upgraded as long as you run the command on only one node at a time or, when using racks, one rack at a time. Running `upgradesstables` on too many nodes at once degrades performance.
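To keep the final SSTable upgrade to one node at a time, you can drive it from a single host with a simple loop. The following is a minimal sketch, not a DataStax tool: the function name `rolling_upgradesstables` and the host names are hypothetical, and it assumes SSH access to each node with `nodetool` on the remote `PATH`.

```shell
#!/usr/bin/env bash
# Sketch: run `nodetool upgradesstables` on one node at a time.
# rolling_upgradesstables and the host names below are hypothetical.

# Args: host names, in the order they should be processed.
# Set DRY_RUN=1 to print the commands instead of running them.
rolling_upgradesstables() {
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ssh ${host} nodetool upgradesstables"
    else
      # Each ssh call runs in the foreground, so the next node starts only
      # after this command returns. Stop at the first failure.
      ssh "$host" nodetool upgradesstables || return 1
    fi
  done
}

# Example (dry run) with placeholder host names:
DRY_RUN=1 rolling_upgradesstables node1.example.com node2.example.com
```

List the hosts in your recommended upgrade order, and review the dry-run output before running the loop for real.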
Post-upgrade steps
Your clusters are now upgraded to DSE 5.1. To continue your upgrade to DSE 6.8 or 6.9, follow the upgrade guide for your target version: