Upgrade DataStax Enterprise (DSE) 5.1 to 6.9
The upgrade process for DataStax Enterprise (DSE) provides minimal downtime (ideally zero). During this process, upgrade and restart one node at a time while other nodes continue to operate online. With a few exceptions, the cluster continues to work as though it were on the earlier version of DSE until all of the nodes in the cluster are upgraded.
Data loss possible with tuples
To avoid data loss for databases that use the tuple data type, follow the tuple-related upgrade instructions when upgrading from any DSE version earlier than 6.8.35 to any later version.
Carefully review the planning guide and all upgrade instructions before you begin the upgrade to reduce the chance of errors and data loss. In addition, review the DSE 6.9 release notes to understand major changes in the new version.
Back up your existing installation
DataStax recommends backing up your data prior to any version upgrade.
A backup provides the ability to revert and restore all the data used in the previous version if necessary.
For automatic backups, use the OpsCenter Backup Service (recommended).
For manual backup instructions, see Backing up a tarball installation or Backing up a package installation.
Upgrade SSTables
You must upgrade SSTables on your nodes before and after upgrading the DSE software binaries. Failure to upgrade SSTables will result in severe performance degradation and possible data loss.
Version-specific notes
- DSE Search changes
  Starting with DSE version 6.7.7, the system enables the Solr timeAllowed parameter by default. This setting prevents long-running shard queries, such as complex facets and Boolean queries, from using system resources after timing out from the DSE Search coordinator. For details, see Limiting queries by time.
- DSE Search changes
  Starting with DSE version 6.8.0, the system no longer allows unbounded facet searches using the facet.limit=-1 parameter. The maximum facet limit value is 20,000, as set by solr.max.facet.limit.size. While it is possible to override the facet limit size using -Dsolr.max.facet.limit.size in the appropriate jvm[ 8 | 11 ]-server.options file, doing so is not recommended.
- metadata_directory replaces system.local and system.peers
  DSE 6.8 introduced a metadata_directory property that holds information about the local node and all peers. This metadata_directory stores the same information as system.local and system.peers in earlier versions.
Upgrade restrictions and limitations
Restrictions and limitations apply while a cluster is in a partially upgraded state. This means that some, but not all, nodes in the cluster have been upgraded. The cluster continues to work as though it were on the earlier version of DSE until all of the nodes in the cluster are upgraded. For this reason, you must avoid certain operations until the upgrade is complete on all nodes.
Nodes on different versions might show a schema disagreement during an upgrade. This is normal.
General restrictions
-
Don’t enable new features.
-
Upgrade OpsCenter if necessary. The minimum compatible version for DSE 6.9 is OpsCenter 6.8.39.
-
Don’t run nodetool repair.
-
Stop the OpsCenter Repair Service if enabled.
-
During the upgrade, don’t bootstrap new nodes or decommission existing nodes.
-
Don’t alter schemas for any workloads.
-
Complete the cluster-wide upgrade before the expiration of gc_grace_seconds (default 10 days) to ensure that any repairs are successful.
-
If you disabled the DSE Performance Service before the upgrade, don’t reenable it during the upgrade.
Restrictions for nodes using security
-
Don’t change security credentials or permissions until the upgrade is complete on all nodes.
-
If you aren’t already using Kerberos, don’t set up Kerberos authentication immediately before upgrading. First upgrade the cluster, and then set up Kerberos.
You must modify the upgrade process if your cluster uses any form of internode encryption, including when you enable transitional mode to permit an internode encryption-based cluster to interact with unencrypted nodes. Additional steps are required to allow the cluster to continue to function during an upgrade to DSE 6.9.7 or later.
Restrictions for DSE Analytics nodes
Spark versions change between major DSE versions. The DSE release notes indicate which version of Spark is used.
When upgrading to a major version of DSE, all nodes in a DSE datacenter that run Spark must be on the same version of Spark, and the Spark jobs must be compiled for that version. Each datacenter acting as a Spark cluster must be on the same upgraded DSE version before reinitiating Spark jobs.
If you have Spark jobs that run against Graph keyspaces, you must update all of the nodes in the cluster first to prevent Spark jobs from failing.
Restrictions for DSE Advanced Replication nodes
DSE supports upgrades for DSE Advanced Replication v2 only.
Application code and driver compatibility
Be sure to check driver compatibility between the previous and new database versions.
Advanced preparation for upgrading DSE Search nodes
Before continuing, complete all the advanced preparation steps on DSE Search nodes while DSE 5.1 is still running:
-
Replace all HTTP API writes with CQL commands for updates and inserts.
-
Edit the search index config and make these changes as needed. See Search index config for valid options to change query behavior for search indexes.
-
Remove the unsupported dataDir option. You can still set the location of search indexes.
-
Remove mergePolicy, maxMergeDocs, and mergeFactor. For example, if you have the following configuration:
<mergeFactor>25</mergeFactor>
<maxMergeDocs>...
<mergePolicy>...
Use mergePolicyFactory instead, and add mergeScheduler:
<mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
  <int name="maxThreadCount">16</int>
  <int name="maxMergeCount">32</int>
</mergeScheduler>
...
<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicyFactory>
-
Remove any instance of ExtractingRequestHandler.
-
Remove DSENRTCachingDirectoryFactory. Change the following configuration:
<directoryFactory name="DirectoryFactory" class="com.datastax.bdp.search.solr.DSENRTCachingDirectoryFactory"/>
To the new configuration:
<directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
-
-
Ensure the presence of the catalina.properties and context.xml files in the Tomcat conf directory. DSE won’t start after the upgrade if these files are missing.
The default location of the tomcat/conf directory is /etc/dse/tomcat/conf for package installations and <installation_location>/resources/tomcat/conf for tarball installations.
-
If earlier DSE versions use a custom configuration for the Solr UI web.xml, change the following configuration:
<filter-class>com.datastax.bdp.search.solr.auth.DseAuthenticationFilter</filter-class>
To the new configuration:
<filter-class>com.datastax.bdp.cassandra.auth.http.DseAuthenticationFilter</filter-class>
-
Be aware that the StallMetrics MBean is removed in DSE 6.9.
Advanced preparation for upgrading DSE Graph nodes
Ensure that edge label names and property key names use supported characters only. Edge label names and property key names allow only lowercase letters a-z, uppercase letters A-Z, digits 0-9, underscores, hyphens, and periods. In earlier versions, edge label names and property key names allowed nearly unrestricted Unicode.
-
Use schema.describe() to get the entire schema, even if it contains illegal names.
-
In-place upgrades allow existing schemas with invalid edge label names and property key names.
-
Schema elements with illegal names cannot be updated or added.
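As a quick pre-upgrade check, you can test candidate edge label and property key names against the allowed character set with grep. This is a sketch; the names below are hypothetical:

```shell
# Print any names that use characters outside the allowed set
# (a-z, A-Z, 0-9, underscore, hyphen, period); sample names are hypothetical
printf '%s\n' "person" "knows_since" "has.part" "café" "社員" \
  | grep -Ev '^[A-Za-z0-9_.-]+$'
```

The two names containing non-ASCII characters are printed as illegal; the first three pass silently.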
Advanced preparation for upgrading DSE Analytics nodes
Before upgrading DSE Analytics nodes, do the following:
-
If you programmatically set the shuffle parameter, you must change the code for applications that use conf.set("spark.shuffle.service.port", port). Instead, use dse spark-submit, which automatically sets the correct service port based on the authentication state.
-
If DSEFS is enabled, copy the CFS hivemetastore directory (if it exists) to DSEFS:
DSE_HOME/bin/dse hadoop fs -cp cfs://node_ip_address/user/spark/warehouse/ dsefs://node_ip_address/user/spark/warehouse/
-
If present in your DSE 5.1 installation, remove the Cassandra File System (CFS) by removing the cfs and cfs_archive keyspaces before upgrading. For more information, see Copying data from CFS to DSEFS.
cqlsh> DROP KEYSPACE cfs;
cqlsh> DROP KEYSPACE cfs_archive;
-
Make sure any use of the SPARK_LOCAL_DIRS and SPARK_EXECUTOR_DIRS environment variables matches their use as described in the DSE 5.1 documentation on configuring Spark nodes.
-
For applications to use a compatible Spark Jobserver API from the DataStax repository, you must migrate any jobs that extend SparkHiveJob or SparkSqlJob to SparkSessionJob. See the DemoSparkSessionJob example in the demos directory, which is located at /usr/share/dse/demos for package installations and <installation_location>/demos for tarball installations.
Prepare to upgrade
Follow these steps to prepare each DSE 5.1 node for the upgrade.
The DataStax Installer is not supported for DSE 6.0 and later. If you installed DSE 5.x with the DataStax Installer, you must first change from a standalone installer installation to a tarball or package installation for the same DSE version. For instructions, see Convert DataStax Installer installations.
-
Upgrade to the latest patch release on your current version. The latest patch release includes fixes that can simplify the upgrade process.
Get the current DSE version, and then compare it with the latest patch release:
bin/dse -v
current_dse_version
-
Familiarize yourself with the changes and features in the new release as listed in the DSE 6.9 release notes.
-
Ensure that each node has enough free disk space for the upgrade. The required space depends on the compaction strategy.
-
Determine current DSE data disk space usage:
sudo du -sh /var/lib/cassandra/data/
Results
3.9G /var/lib/cassandra/data/
-
Determine available disk space:
sudo df -hT /
Results
Filesystem  Type  Size  Used  Avail  Use%  Mounted on
/dev/sda1   ext4   59G   16G    41G   28%  /
-
If necessary, make adjustments to allow for more disk space.
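As a rough rule of thumb, rewriting SSTables during the upgrade can temporarily need free space comparable to the size of the data being compacted. A minimal headroom check, using the example numbers from the du and df output above:

```shell
# Values in GB taken from the example output above; adjust for your node
data_used=4   # du -sh /var/lib/cassandra/data/, rounded up
avail=41      # df Avail column for the data volume
# Worst case, rewriting SSTables can need roughly as much free space as the data itself
if [ "$avail" -ge "$data_used" ]; then
  verdict="enough headroom"
else
  verdict="free up disk space first"
fi
echo "$verdict"
# prints: enough headroom
```

The exact requirement depends on the compaction strategy, so treat this as a lower bound rather than a guarantee.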
-
-
Replace ITriggers and custom interfaces.
You must replace all custom implementations, including the following interfaces, with supported implementations when upgrading to DSE 6.9.x:
-
Custom implementation replacements modify the org.apache.cassandra.triggers.ITrigger interface from augment to augmentNonBlocking for the non-blocking internal architecture. You must provide updated trigger implementations on upgraded nodes. Check for existing triggers:
cqlsh> SELECT * FROM system_schema.triggers;
If you’re unsure, drop all existing triggers before upgrading:
cqlsh> DROP TRIGGER trigger_name ON keyspace_name.table_name;
-
Custom implementation replacements modify the org.apache.cassandra.index.Index interface to comply with the core storage engine changes. You are required to update implementations. Check for existing indexes:
cqlsh> SELECT * FROM system_schema.indexes;
If you’re unsure, drop all existing custom secondary indexes before upgrading, except DSE Search indexes, which don’t need to be replaced:
cqlsh> DROP INDEX index_name;
-
Custom implementation replacements change org.apache.cassandra.cql3.QueryHandler, org.apache.cassandra.db.commitlog.CommitLogReadHandler, and other extension points. For more information about query handlers and custom payloads, see the documentation for your Cassandra driver. If you need assistance, contact DataStax Support.
-
-
Because Thrift-compatible tables (COMPACT STORAGE) are no longer supported, you must migrate all non-system tables that have COMPACT STORAGE to CQL table format before upgrading. DSE won’t start if tables using COMPACT STORAGE are present.
Get the schema:
cqlsh -e 'DESCRIBE FULL SCHEMA;' > schema_file
Use a script to run ALTER TABLE commands that drop COMPACT STORAGE from each table:
cat schema_file | while read -d $';\n' line ; do
  if echo "$line" | grep 'COMPACT STORAGE' > /dev/null 2>&1 ; then
    TBL="`echo $line | sed -e 's|^CREATE TABLE \([^ ]*\) .*$|\1|'`"
    if echo "$TBL" | egrep -v '^system' > /dev/null 2>&1 ; then
      echo "ALTER TABLE $TBL DROP COMPACT STORAGE;" >> schema-drop-list
    fi
  fi
done
This script reads the complete DSE schema from schema_file, uses grep to find the statements containing COMPACT STORAGE, and then writes the required ALTER TABLE commands for only those tables to schema-drop-list.
The schema-drop-list file is subsequently read by cqlsh, which executes the ALTER TABLE commands listed within it:
cqlsh -f schema-drop-list
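To see what the filter produces, you can rehearse a simplified one-statement-per-line variant of the script against a sample schema. The keyspace and table names below are hypothetical, and real DESCRIBE output spans multiple lines per statement, which is why the script above reads up to each semicolon:

```shell
# Two-table sample schema, one CREATE TABLE statement per line (hypothetical names)
cat > schema_file <<'EOF'
CREATE TABLE ks1.legacy (k int PRIMARY KEY, v text) WITH COMPACT STORAGE;
CREATE TABLE ks1.modern (k int PRIMARY KEY, v text);
EOF
# Same core logic: keep only COMPACT STORAGE tables, then rewrite each
# matching CREATE TABLE line into the required ALTER TABLE command
grep 'COMPACT STORAGE' schema_file \
  | sed -e 's|^CREATE TABLE \([^ ]*\) .*$|ALTER TABLE \1 DROP COMPACT STORAGE;|' \
  > schema-drop-list
cat schema-drop-list
# prints: ALTER TABLE ks1.legacy DROP COMPACT STORAGE;
```

Only ks1.legacy appears in schema-drop-list; ks1.modern needs no migration.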
-
Upgrade the SSTables on each node to ensure that all SSTables are on the current version:
nodetool upgradesstables
Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage.
Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set it to 0 to use all available compaction threads. DataStax recommends running the upgradesstables command on one node at a time. When using racks, run the command on one rack at a time.
If the SSTables are already on the current version, the upgradesstables command returns immediately and no action is taken.
-
Ensure that keyspace replication factors are correct for your environment:
cqlsh --execute "DESCRIBE KEYSPACE keyspace-name;" | grep "replication"
CREATE KEYSPACE keyspace-name WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': '3'} AND durable_writes = true;
-
Check the keyspace replication factor for Analytics keyspaces.
-
Check the keyspace replication factor for the system_auth and dse_security keyspaces.
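If you script these checks across many keyspaces, the per-datacenter replication factor can be extracted from the DESCRIBE output with sed. A minimal sketch on sample output; the keyspace name app and datacenter name DC1 are hypothetical:

```shell
# Sample line as returned by: cqlsh --execute "DESCRIBE KEYSPACE app;" | grep "replication"
desc="CREATE KEYSPACE app WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': '3'} AND durable_writes = true;"
# Extract the replication factor configured for datacenter DC1
rf=$(echo "$desc" | sed -e "s|.*'DC1': '\([0-9]*\)'.*|\1|")
echo "DC1 replication factor: $rf"
# prints: DC1 replication factor: 3
```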
-
-
Verify the Java runtime version and upgrade to a supported version if needed:
java -version
Results
java version "11.0.x" YYYY-MM-DD LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.x+xx-LTS-219)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.x+xx-LTS-219, mixed mode)
OpenJDK 11 (11.0.19 minimum) and Oracle Java SE 11 (JRE or JDK) (11.0.18 minimum) are supported. OpenJDK 11 is recommended.
-
Verify the Python version is between Python 3.8 and 3.11:
python --version
Results
Python 2.7.18
In this example, the installed version is unsupported. If needed, download and install a supported version.
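A scripted version of this range check can use sort -V, which orders version strings numerically. This is a sketch; the version value below is hypothetical, substitute the output of python --version:

```shell
ver=3.9.18            # hypothetical; substitute the reported Python version
low=3.8 high=3.12     # supported range: 3.8 up to, but not including, 3.12
# ver is in range when it sorts at or after $low and strictly before $high
if [ "$(printf '%s\n' "$low" "$ver" | sort -V | head -n1)" = "$low" ] \
   && [ "$(printf '%s\n' "$ver" "$high" | sort -V | head -n1)" = "$ver" ] \
   && [ "$ver" != "$high" ]; then
  status="supported"
else
  status="install a supported Python"
fi
echo "$status"
# prints: supported
```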
-
Run nodetool repair to ensure that data on each replica is consistent with data on other nodes:
nodetool repair -pr
-
Install the libaio package for optimal performance.
-
RHEL:
sudo yum install libaio
-
Debian:
sudo apt-get install libaio1
-
-
Back up any customized configuration files since they can be overwritten with default values when installing the new version.
If you performed a manual backup (either Backing up a tarball installation or Backing up a package installation), your original configuration files are included in the archive.
Upgrade steps
You can also use OpsCenter 6.8 Lifecycle Manager (LCM) to clone a configuration profile and run an upgrade job on a datacenter or node.
The upgrade process requires upgrading and restarting one node at a time in the following order:
-
In multiple datacenter clusters, upgrade every node in one datacenter before upgrading another datacenter.
-
Upgrade the seed nodes within a datacenter first.
-
Upgrade DSE Analytics datacenters.
-
Upgrade transactional (DSE Graph) datacenters.
-
Upgrade DSE Search nodes or datacenters.
Follow these steps to upgrade each node to DSE 6.9.x:
-
Flush the commit log of the current installation:
nodetool drain
-
Stop the node:
-
Package installations:
sudo service dse stop
-
Tarball installations:
installation_dir/bin/dse cassandra-stop
-
-
Install the new product version on a supported platform using the same installation type as your current installation. Mismatched installation types can cause problems with the upgrade.
-
After upgrading but before restarting a node, compare changes in the new configuration files with your backup configuration files. Remove deprecated settings and update any new settings if required.
You must use the new configuration files that are generated from the upgrade installation. Copy individual parameters from your old configuration files into the new files. Don’t replace the newly-generated configuration files with the old files.
You can use the yaml_diff tool to compare backup YAML files with the upgraded YAML files:
cd /usr/share/dse/tools/yamls
./yaml_diff path/to/yaml-file-old path/to/yaml-file-new
Results
...
CHANGES
=========
authenticator:
- AllowAllAuthenticator
+ com.datastax.bdp.cassandra.auth.DseAuthenticator
authorizer:
- AllowAllAuthorizer
+ com.datastax.bdp.cassandra.auth.DseAuthorizer
roles_validity_in_ms:
- 2000
+ 120000
...
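If the yaml_diff tool isn't available on a node, plain diff gives a rougher but serviceable view of the same changes. A sketch with the two files reduced to one hypothetical setting each:

```shell
# Minimal stand-ins for the backed-up and newly generated cassandra.yaml
printf 'authenticator: AllowAllAuthenticator\n' > cassandra.yaml.old
printf 'authenticator: com.datastax.bdp.cassandra.auth.DseAuthenticator\n' > cassandra.yaml.new
# diff exits nonzero when the files differ, so guard it in scripts
diff cassandra.yaml.old cassandra.yaml.new || true
```

The output shows the old value prefixed with < and the new value with >, which is enough to spot settings you still need to carry over.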
-
In cassandra.yaml, remove the following general deprecated settings:
concurrent_counter_writes
concurrent_materialized_view_writes
concurrent_reads
concurrent_writes
max_client_wait_time_ms
max_threads
request_scheduler
request_scheduler_options
rpc_port
rpc_server_type
start_rpc
thrift_framed_transport_size_in_mb
-
In cassandra.yaml, replace the deprecated RPC settings rpc_address and rpc_broadcast_address with the following settings:
native_transport_address
native_transport_broadcast_address
-
In cassandra.yaml, replace the deprecated memtable settings memtable_heap_space_in_mb and memtable_offheap_space_in_mb with the following setting:
memtable_space_in_mb
-
In cassandra.yaml, set memtable_allocation_type to offheap_objects:
memtable_allocation_type: offheap_objects
-
In cassandra.yaml, replace the deprecated user-defined function (UDF) settings user_defined_function_warn_timeout and user_defined_function_fail_timeout with the following settings:
user_defined_function_warn_micros: 500
user_defined_function_fail_micros: 10000
user_defined_function_warn_heap_mb: 200
user_defined_function_fail_heap_mb: 500
user_function_timeout_policy: die
The new settings are in microseconds, and the new timeouts aren’t equivalent to the deprecated settings.
-
In cassandra.yaml, replace the deprecated store_type setting in server_encryption_options and client_encryption_options with the following configurations:
server_encryption_options:
  keystore_type: JKS
  truststore_type: JKS
...
client_encryption_options:
  keystore_type: JKS
  truststore_type: JKS
Valid type options for keystore_type are JKS, JCEKS, PKCS11, or PKCS12, and truststore_type accepts JKS, JCEKS, or PKCS12.
-
In server_encryption_options and client_encryption_options, set the protocol to TLS. For security reasons and because of the deprecation of SSLv3 in Oracle Java 8u31, DSE versions 6.8 and later allow only the TLS protocol.
server_encryption_options:
  ...
  protocol: TLS
...
client_encryption_options:
  ...
  protocol: TLS
-
In cassandra.yaml, remove the deprecated credential cache settings credentials_validity_in_ms and credentials_update_interval_in_ms. In DSE 6.9, caches are optimized without those settings.
-
In dse.yaml, remove the following deprecated Spark resource and encryption options:
#remove
spark_ui_options:
  server_encryption_options:
    store_type: JKS
Replace the deprecated options with the following configuration:
spark_ui_options:
  server_encryption_options:
    keystore_type: JKS
    truststore_type: JKS
Valid type options for keystore_type are JKS, JCEKS, PKCS11, or PKCS12, and truststore_type accepts JKS, JCEKS, or PKCS12.
-
In dse.yaml, remove the following deprecated DSE Search node settings. DSE 6.9 won’t start if these options are present.
#remove
cql_solr_query_executor_threads
enable_back_pressure_adaptive_nrt_commit
max_solr_concurrency_per_core
solr_indexing_error_log_options
-
-
If you are performing an interim upgrade to versions earlier than 5.1.16, 6.0.8, or 6.7.4, and any tables use DSE Tiered Storage, you must remove all txn_compaction log files from second-level tiers and lower. Failure to remove txn_compaction log files can result in data loss.
For example, given the following dse.yaml configuration, you would remove txn_compaction log files from the /mnt2 and /mnt3 directories:
tiered_storage_options:
  strategy1:
    tiers:
      - paths:
          - /mnt1
      - paths:
          - /mnt2
      - paths:
          - /mnt3
The following example removes the files using the find command:
find /mnt2 -name "*_txn_compaction_*.log" -type f -delete && find /mnt3 -name "*_txn_compaction_*.log" -type f -delete
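To verify that the glob matches only the intended files before running it against real tiers, you can rehearse the same find invocation on a scratch directory tree. The paths and file names below are hypothetical:

```shell
# Build a scratch tree that mimics the second- and third-tier paths
mkdir -p scratch/mnt2 scratch/mnt3
touch scratch/mnt2/aa_txn_compaction_01.log \
      scratch/mnt3/bb_txn_compaction_02.log \
      scratch/mnt3/keep-me.db
# Same pattern as above: delete only the txn_compaction log files
find scratch/mnt2 scratch/mnt3 -name "*_txn_compaction_*.log" -type f -delete
ls scratch/mnt2 scratch/mnt3
```

After the run, both txn_compaction logs are gone while keep-me.db remains, confirming the pattern doesn't touch data files.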
-
Remove any previously installed JTS (Java Topology Suite) JAR files from the CLASSPATH in your DSE installation. JTS is distributed with DSE versions 6.7 and later.
-
After making all the configuration changes, start the node.
-
Package installations:
sudo service dse start
-
Tarball installations:
installation_dir/bin/dse cassandra
-
-
Verify that the upgraded datacenter names match the datacenter names in the keyspace schema definition:
-
Get the node’s datacenter name:
nodetool status | grep "Datacenter"
Results
Datacenter: datacenter-name
-
Verify that the node’s datacenter name matches the datacenter name for a keyspace:
cqlsh --execute "DESCRIBE KEYSPACE keyspace-name;" | grep "replication"
CREATE KEYSPACE keyspace-name WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter-name': '3'};
-
-
Review the logs for warnings, errors, and exceptions:
grep -w 'WARNING\|ERROR\|exception' /var/log/cassandra/*.log
Warnings, errors, and exceptions are frequently found in the logs when starting an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.
Non-standard log locations are configured in dse-env.sh.
-
Repeat the upgrade process on each node in the cluster following the recommended upgrade order.
-
After the entire cluster upgrade is complete, use upgradesstables to upgrade the SSTables on one node at a time. When using racks, upgrade one rack at a time. This sequence is required to avoid degraded performance from upgrading too many nodes at once. Failure to upgrade SSTables when required results in significant performance impacts, increased disk usage, and possible data loss. Your upgrade is incomplete until the SSTables are upgraded.
nodetool upgradesstables
You can use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set it to 0 to use all available compaction threads.
General post-upgrade steps
After all nodes are upgraded:
-
If you use the OpsCenter Repair Service, turn it on.
-
If you encounter serialization-header errors, stop the node and repair them using the sstablescrub -e option:
sstablescrub -e fix-only keyspace table
-
Learn about DSE Metrics Collector.
Starting with DSE 6.7, DSE Metrics Collector is enabled by default. This is a diagnostics information aggregator that helps facilitate DSE problem resolution.
Post-upgrade steps for DSE Search nodes
For DSE Search nodes, do the following:
-
The system no longer uses the SolrValidationErrorAppender appender or the SolrValidationErrorLogger logger, so you can safely remove them from logback.xml.
-
In contrast to earlier versions, DataStax recommends accepting the new default value of 1024 for back_pressure_threshold_per_core in dse.yaml. For more information, see Tune DSE Search for maximum indexing throughput.
Fully reindex all encrypted search indexes on each node in your cluster.
This is a long-running operation. Plan sufficient time after the upgrade is complete to reindex all nodes with deleteAll=true.
dsetool reload_core keyspace_name.table_name distributed=false reindex=true deleteAll=true
Post-upgrade steps for DSE Analytics nodes
For DSE Analytics nodes, do the following:
-
Check the replication factor for the dse_analytics keyspace, which is a new keyspace that stores all DSE Analytics internal system data. DataStax recommends setting the replication strategy to NetworkTopologyStrategy (NTS) with a minimum replication factor of 3 in each of your DSE Analytics datacenters. If a datacenter has more nodes, consider setting a larger replication factor. To check the replication strategy and factor, run:
cqlsh --execute "DESCRIBE KEYSPACE dse_analytics;" | grep "replication"
CREATE KEYSPACE dse_analytics WITH replication = {'class': 'replication-strategy', 'datacenter-name': 'replication-factor'};
-
If you are using Spark SQL tables, migrate them to the new Hive metastore format:
dse client-tool spark metastore migrate --from 5.1.0 --to 6.9.0
-
Because the Spark Jobserver uses DSE custom version 0.8.0.56, you must ensure that applications use the compatible Spark Jobserver API from the DataStax repository.
Warning messages during and after upgrade
You can ignore some log messages that occur during and after an upgrade:
-
Some gremlin_server properties in earlier versions of DSE are no longer required. If these properties exist in the dse.yaml file after upgrading, the logs display warnings similar to the following:
WARN [main] 2017-08-31 12:25:30,523 GREMLIN DseWebSocketChannelizer.java:149 - Configuration for the org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0 serializer in dse.yaml overrides the DSE default - typically it is best to allow DSE to configure these.
You can ignore these warnings or modify dse.yaml so that only the required gremlin_server properties are present.
Lock DSE package versions
If you have upgraded a DSE package installation, then you can prevent future unintended upgrades by locking the package version:
- RHEL yum installations
-
Install yum-versionlock (one-time operation):
sudo yum install yum-versionlock
-
Lock the current DSE version:
sudo yum versionlock dse-*
-
Later, you can clear the version lock and allow upgrades:
sudo yum versionlock clear
For details, see the versionlock command.
-
- Debian apt-get installations
-
Hold the dse package at the current version:
sudo apt-mark hold dse-*
-
Later, you can remove the version hold and allow upgrades:
sudo apt-mark unhold dse-*
For details, see the apt-mark command.