Upgrading to DataStax Enterprise 5.0

Read and understand these instructions before upgrading. Carefully reviewing the planning and upgrade instructions can prevent errors and data loss.

Follow these instructions to upgrade from DataStax Enterprise 4.7 or 4.8 to 5.0. If you have an earlier version, first upgrade to the latest DataStax Enterprise 4.8 release before continuing.

Always upgrade to the latest patch release on your current version before you upgrade to a higher version. Fixes included in the latest patch release might help or smooth the upgrade process.

  • The latest version of DSE 4.8 is 4.8.16.

  • The latest version of DSE 4.7 is 4.7.9.

TTL expiration timestamps are susceptible to the year 2038 problem. If the TTL value is long enough that the resulting expiration date is later than the maximum threshold of 2038-01-19T03:14:06+00:00, the data is immediately expired and purged on the next compaction. DataStax strongly recommends upgrading to DSE 5.0.15 or later and taking the required action to protect against silent data loss.
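
As a rough illustration of the arithmetic behind this threshold, the following sketch checks whether a write time plus a TTL lands past 2038-01-19T03:14:06+00:00, which is 2147483646 in Unix epoch seconds. The function name and the sample values are assumptions made for this example and are not part of any DataStax tooling.

```shell
#!/bin/sh
# Threshold from the text, 2038-01-19T03:14:06+00:00, as Unix epoch seconds.
MAX_EXPIRATION_EPOCH=2147483646

# ttl_exceeds_threshold WRITE_EPOCH TTL_SECONDS   (hypothetical helper)
# Returns 0 (true) when WRITE_EPOCH + TTL_SECONDS is past the threshold.
ttl_exceeds_threshold() {
  [ $(( $1 + $2 )) -gt "$MAX_EXPIRATION_EPOCH" ]
}
```

For example, `ttl_exceeds_threshold "$(date +%s)" 630720000` tests a 7300-day TTL written at the current time.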

Apache Cassandra® version change

Upgrading from DataStax Enterprise 4.7 or 4.8 to 5.0 includes a major Cassandra version change.

Be sure to follow the recommendations for upgrading the SSTables.

Upgrading SSTables is required for upgrades that contain major Apache Cassandra releases:

  • DataStax Enterprise 5.0 uses Cassandra 3.0.

  • DataStax Enterprise 4.7 to 4.8 use Cassandra 2.1.

  • DataStax Enterprise 4.0 to 4.6 use Cassandra 2.0.
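
The version list above can be turned into a small lookup that tells you whether an upgrade crosses a major Cassandra release line and therefore needs an SSTable upgrade. This is only a sketch: both function names are hypothetical, and the mapping covers just the DSE versions named on this page.

```shell
#!/bin/sh
# cassandra_line_for_dse DSE_VERSION   (hypothetical helper)
# Prints the Cassandra release line for a DSE version, per the list above.
cassandra_line_for_dse() {
  case "$1" in
    5.0|5.0.*) echo "3.0" ;;
    4.7|4.7.*|4.8|4.8.*) echo "2.1" ;;
    4.0|4.0.*|4.5|4.5.*|4.6|4.6.*) echo "2.0" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# sstable_upgrade_required FROM_DSE TO_DSE   (hypothetical helper)
# Returns 0 (true) when the two versions map to different Cassandra lines,
# 1 when they share a line, and 2 when either version is not in the list.
sstable_upgrade_required() {
  from_line=$(cassandra_line_for_dse "$1") || return 2
  to_line=$(cassandra_line_for_dse "$2") || return 2
  [ "$from_line" != "$to_line" ]
}
```

For example, `sstable_upgrade_required 4.8.16 5.0.15` succeeds because the two versions map to Cassandra 2.1 and 3.0.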

General recommendations

DataStax recommends backing up your data prior to any version upgrade, including logs and custom configurations. A backup provides the ability to revert and restore all the data used in the previous version if necessary.

OpsCenter provides a Backup Service that manages enterprise-wide backup and restore operations for DataStax Enterprise clusters. OpsCenter 6.5 and later is recommended.

General restrictions and limitations during the upgrade process

Restrictions and limitations apply while a cluster is in a partially upgraded state.

With these exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.

  • General restrictions during an upgrade

    • Do not enable new features.

    • During the upgrade, do not bootstrap or decommission nodes.

    • Do not issue these types of CQL queries during a rolling restart: DDL and TRUNCATE.

    • Do not enable Change Data Capture (CDC) on a mixed-version cluster. Upgrade all nodes to DSE 5.1 or later before enabling CDC.

    • Complete the cluster-wide upgrade before the expiration of gc_grace_seconds (default 10 days) to ensure any repairs complete successfully.

Nodes on different versions might show a schema disagreement during an upgrade.

  • Restrictions for DSE Analytic (Hadoop and Spark) nodes

    • Do not run analytics jobs until all nodes are upgraded.

    • When upgrading to a major version of DSE, all nodes in a DSE datacenter that run Spark must be on the same version of Spark and the Spark jobs must be compiled for that version. Each datacenter acting as a Spark cluster must be on the same upgraded DSE version before reinitiating Spark jobs.

      In the case where Spark jobs run against Graph keyspaces, you must update all of the nodes in the cluster first to avoid Spark jobs failing.

  • DSE Search (Solr) upgrade restrictions and limitations

    • Do not update schemas.

    • Do not reindex DSE Search nodes during upgrade.

    • Do not issue these types of queries during a rolling restart: DDL or TRUNCATE.

    • While mixed versions of nodes exist during an upgrade, DataStax Enterprise runs two different servers for backward compatibility: one based on shard_transport_options and the other based on internode_messaging_options. (These options are located in dse.yaml.) After all nodes are upgraded to 5.0, internode_messaging_options are used. The internode_messaging_options are used by several components of DataStax Enterprise. For 5.0 and later, all internode messaging requests use this service.

  • Restrictions for nodes using any kind of security

    • Do not change security credentials or permissions until the upgrade is complete on all nodes.

    • If you are not already using Kerberos, do not set up Kerberos authentication before upgrading. First upgrade the cluster, and then set up Kerberos.

  • Upgrading drivers and possible impact when driver versions are incompatible

    Be sure to check driver compatibility. Depending on the driver version, you might need to recompile your client application code. See DataStax driver changes.

    During upgrades, you might experience driver-specific impact when clusters have mixed versions of drivers. If your cluster has mixed versions, the protocol version is negotiated with the first host that the driver connects to. To avoid driver version incompatibility during upgrades, use one of these workarounds:

    • Protocol version: Because some drivers can use different protocol versions, force the protocol version at startup. For example, keep the Java driver at its current protocol version while the driver upgrade is happening. Switch the Java driver to the new protocol version only after the upgrade is complete on all nodes in the cluster.

    • Initial contact points: Ensure that the list of initial contact points contains only hosts with the oldest driver version. For example, the initial contact points contain only Java driver v2. For details on protocol version negotiation, see the documentation on protocol versions with mixed clusters for the Java driver version you are using.

DataStax Enterprise and Apache Cassandra configuration files

DataStax Enterprise (DSE) configuration files

| Configuration file | Installer-Services and package installations | Installer-No Services and tarball installations |
|---|---|---|
| dse | /etc/default/dse (systemd) or /etc/init.d/ (SystemV) | N/A. Node type is set via command line flags. |
| dse-env.sh | /etc/dse/dse-env.sh | <installation_location>/bin/dse-env.sh |
| byoh-env.sh | /etc/dse/byoh-env.sh | <installation_location>/bin/byoh-env.sh |
| dse.yaml | /etc/dse/dse.yaml | <installation_location>/resources/dse/conf/dse.yaml |
| logback.xml | /etc/dse/cassandra/logback.xml | <installation_location>/resources/logback.xml |
| spark-env.sh | /etc/dse/spark/spark-env.sh | <installation_location>/resources/spark/conf/spark-env.sh |
| spark-defaults.conf | /etc/dse/spark/spark-defaults.conf | <installation_location>/resources/spark/conf/spark-defaults.conf |

Cassandra configuration files

| Configuration file | Installer-Services and package installations | Installer-No Services and tarball installations |
|---|---|---|
| cassandra.yaml | /etc/dse/cassandra/cassandra.yaml | <installation_location>/conf/cassandra.yaml |
| cassandra.in.sh | /usr/share/cassandra/cassandra.in.sh | <installation_location>/bin/cassandra.in.sh |
| cassandra-env.sh | /etc/dse/cassandra/cassandra-env.sh | <installation_location>/conf/cassandra-env.sh |
| cassandra-rackdc.properties | /etc/dse/cassandra/cassandra-rackdc.properties | <installation_location>/conf/cassandra-rackdc.properties |
| cassandra-topology.properties | /etc/dse/cassandra/cassandra-topology.properties | <installation_location>/conf/cassandra-topology.properties |
| jmxremote.password | /etc/cassandra/jmxremote.password | <installation_location>/conf/jmxremote.password |

Tomcat server configuration file

| Configuration file | Installer-Services and package installations | Installer-No Services and tarball installations |
|---|---|---|
| server.xml | /etc/dse/resources/tomcat/conf/server.xml | <installation_location>/resources/tomcat/conf/server.xml |

For use with Spark, the default location of the hive-site.xml file is:

Package installations

/etc/dse/spark/hive-site.xml

Tarball installations

<installation_location>/resources/spark/conf/hive-site.xml

Preparing to upgrade

Follow these steps to prepare each node for upgrading from DataStax Enterprise 4.7 or 4.8 to DataStax Enterprise 5.0:

  1. Before upgrading, be sure that each node has adequate free disk space. The required space depends on the compaction strategy. See Disk space.

  2. Familiarize yourself with the changes and features in this release:

    • Be sure your platform is supported.

    • OpenJDK 8 or Oracle Java SE Runtime Environment 8 (JDK) (1.8.0_40 minimum). Earlier or later versions are not supported.

    • DataStax Enterprise 5.0 release notes.

    • General upgrading advice for any version and New features for Apache Cassandra 3.0 in NEWS.txt. Be sure to read the NEWS.txt all the way back to your current version.

    • Apache Cassandra changes in CHANGES.txt.

    • DataStax Enterprise 5.0 production-certified changes to Apache Cassandra.

    • DataStax driver changes.

      DataStax drivers come in two types:
      • DataStax drivers for DataStax Enterprise (DSE) — for use by DSE 4.8 and later

      • DataStax drivers for Apache Cassandra — for use by Apache Cassandra and DSE 4.7 and earlier

        While the DataStax drivers for Apache Cassandra can connect to DSE 5.0 and later clusters, DataStax strongly recommends upgrading to the DSE drivers. The DSE drivers provide functionality for all DataStax Enterprise (DSE) features.

  3. Verify your current product version. If necessary, upgrade to an interim version:

    | Current version | Upgrade version |
    |---|---|
    | DataStax Enterprise 4.7 or 4.8 | DataStax Enterprise 5.0 |
    | DataStax Enterprise 4.0, 4.5, or 4.6 | DataStax Enterprise 4.8 |
    | DataStax Community or open source Apache Cassandra 2.1.x | DataStax Enterprise 4.8 |
    | DataStax Community 3.0.x | No interim version required. |
    | DataStax Distribution of Apache Cassandra 3.x | Upgrade not available. |

  4. Replace SELECT * queries using prepared statements with SELECT statements using the relevant columns.

    The usage of SELECT * prepared statements is not recommended until all components use at least DSE 6.0 and DSE Driver version 1.5.0.

  5. Upgrade the SSTables on each node to ensure that all SSTables are on the current version. This is required for DataStax Enterprise upgrades that include a major Cassandra version change.

    Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage.

    nodetool upgradesstables

    If the SSTables are already on the current version, the command returns immediately and no action is taken. See SSTable compatibility and upgrade version.

    Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads. DataStax recommends running the upgradesstables command on one node at a time or, when using racks, one rack at a time.

    You can run the upgradesstables command before all the nodes are upgraded as long as you run the command on only one node at a time or, when using racks, one rack at a time. Running upgradesstables on too many nodes at once degrades performance.

    For information about nodetool upgradesstables, including how to speed it up, see the DataStax Support KB article Nodetool upgradesstables FAQ.

  6. Verify the Java runtime version and upgrade to the recommended version.

    java -version
    • Recommended. OpenJDK 8 (1.8.0_151 minimum)

      Recommendation changed due to the end of public updates for Oracle JRE/JDK 8. See Oracle Java SE Support Roadmap.

    • Supported. Oracle Java SE 8 (JRE or JDK) (1.8.0_151 minimum). The JDK is recommended for development and production systems, and provides useful troubleshooting tools that are not in the JRE, such as jstack, jmap, jps, and jstat.

      Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8.

  7. Run nodetool repair to ensure that data on each replica is consistent with data on other nodes.

  8. DSE Search nodes:

    • The Lucene field cache option (solr_field_cache_enabled) is deprecated. This option is located in the dse.yaml file. Instead, for fields that are sorted, faceted, or grouped by, set docValues="true" on the field in the schema.xml file. Then RELOAD the core and reindex. The default value is false. To override false, set useFieldCache=true in the Solr request.

      During mixed-version upgrades, you can re-enable the field cache (solr_field_cache_enabled: true) to allow running queries but not reindexing.

    • All unique key elements must be indexed in the Solr schema.

      To verify unique key elements, review schema.xml to ensure that all unique key fields have indexed=true. If required, make changes to schema.xml and reindex.

    • HTTP-based Solr shard transport option is deprecated. Use Inter-node messaging options instead. For 5.0, all internode messaging requests use this internal messaging service. HTTP is removed in 5.1.

    • Tune the schema before you upgrade. For DSE 5.0.10 and later, all field definitions in the schema are validated and must be DSE Search supported, even if the fields are not indexed, have docValues applied, or used for copy-field source. The default behavior of automatic resource generation includes all columns. To improve performance, take action to prevent the fields from being loaded from the database. Include only the required fields in the schema by removing or commenting out unused fields in the schema. After you change the schema, reload the Solr core.

  9. DSE Search partition key names

    The partition key names of COMPACT STORAGE tables backed by DSE Search indexes must match the uniqueKey in schema.xml. For example, if the following table is created with compact storage:

    CREATE TABLE keyspace_name.table_name (key text PRIMARY KEY, foo text, solr_query text)
    WITH COMPACT STORAGE

    and the Solr schema.xml is:

    ...
    <uniqueKey>id</uniqueKey>
    ...

    then rename the key in the table to match the schema:

    ALTER TABLE keyspace_name.table_name RENAME key TO id;

  10. DSE Analytics nodes

    When performing a rolling upgrade in a datacenter from DSE 4.8 to DSE 5.0, manually update the name of the metastore table used by Spark in hive-site.xml.

    Only perform this step if you want a rolling upgrade with no interruption before the entire datacenter is upgraded. DSE 5.0 elects the Spark Master after the entire datacenter is upgraded if you do not manually update hive-site.xml.

    For package installations:

    sudo perl -i -pe 's|cfs:///user/spark/warehouse|cfs:///user/hive/warehouse|g' /etc/dse/spark/hive-site.xml
    sudo perl -i -pe 's|sparkmetastore|MetaStore|g' /etc/dse/spark/hive-site.xml

    For tarball installations:

    sudo perl -i -pe 's|cfs:///user/spark/warehouse|cfs:///user/hive/warehouse|g' /usr/local/lib/dse/resources/spark/conf/hive-site.xml
    sudo perl -i -pe 's|sparkmetastore|MetaStore|g' /usr/local/lib/dse/resources/spark/conf/hive-site.xml

    Before DSE 5.0, Spark used the Hive metastore table HiveMetaStore.MetaStore. Starting in DSE 5.0, the Hive and Spark metastore tables have been separated, and Spark uses the HiveMetaStore.sparkmetastore table. If DSE 5.0 starts and the metastore table is missing, the node waits for the entire cluster to be upgraded before starting Spark because it has to create the metastore table first. Manually updating the configuration allows Spark nodes to create the metastore table and elect a Master in a mixed datacenter.

    Be prepared for some inconveniences during the rolling upgrade. If the Spark contact point is set to a DSE 5.0 node, it can use only DSE 5.0 replicas to access data. However, if the contact point is set to a DSE 4.8 node, it can access data on all replicas in the cluster.

  11. Back up the configuration files you use to a folder that is not in the directory where you normally run commands. The configuration files are overwritten with default values during installation of the new version.
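
The configuration backup in the last step can be sketched as a small copy helper. This is a minimal example, not a DataStax tool: the function name backup_configs and the backup directory in the usage note are assumptions. It recreates each file's full path under the destination so that files with the same name in different directories do not overwrite each other.

```shell
#!/bin/sh
# backup_configs DEST_DIR FILE...   (hypothetical helper)
# Copies each existing file into DEST_DIR, recreating its original path
# underneath. Missing files are reported on stderr and skipped.
backup_configs() {
  dest=$1
  shift
  for src in "$@"; do
    if [ -f "$src" ]; then
      mkdir -p "$dest/$(dirname "$src")"
      cp -p "$src" "$dest/$src"
    else
      echo "skipping missing file: $src" >&2
    fi
  done
}
```

On a package installation, a call such as `backup_configs "$HOME/dse-config-backup" /etc/dse/dse.yaml /etc/dse/cassandra/cassandra.yaml /etc/dse/cassandra/cassandra-env.sh` would copy three of the files listed in the configuration tables. Extend the list with every file you have customized.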

Upgrade steps

The DataStax installer upgrades DataStax Enterprise and automatically performs many upgrade tasks.

Follow these steps on each node to upgrade from DataStax Enterprise 4.7 or 4.8 to DataStax Enterprise 5.0. Some warning messages are displayed during and after upgrade.

The upgrade process for DataStax Enterprise provides minimal downtime (ideally zero). During this process, upgrade and restart one node at a time while other nodes continue to operate online. With a few exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.

  1. Upgrade order matters. Upgrade nodes in this order:

    1. In multiple datacenter clusters, upgrade every node in one datacenter before upgrading another datacenter.

    2. Upgrade the seed nodes within a datacenter first.

    3. DSE Analytics datacenters

      1. For DSE Analytics nodes using DSE Hadoop, upgrade the Job Tracker node first. Then upgrade Hadoop nodes, followed by Spark nodes.

    4. Transactional/DSE Graph datacenters

    5. DSE Search datacenters

    Upgrade and restart the nodes one at a time. Other nodes in the cluster continue to operate at the earlier version until all nodes are upgraded.

  2. DSE Analytics nodes: Kill all Spark worker processes.

  3. To flush the commit log of the old installation:

    nodetool -h hostname drain

    This step saves time when nodes start up after the upgrade and prevents DSE Search nodes from having to reindex data.

    This step is mandatory when upgrading between major Cassandra versions that change SSTable formats, rendering commit logs from the previous version incompatible with the new version.

  4. Stop the node.

  5. Use the appropriate method to install DSE 5.0 on a supported platform:

    • Binary tarball

    • Debian-based systems using APT

    • RHEL-based systems using Yum

      For upgrades on RHEL-based systems that have demos installed, you must specify the package installation in a single line, and specify the version for dse-full and dse-demos. For example:

      sudo yum install dse-full-5.0.15-1 dse-demos-5.0.15-1

      Install the new product version using the same installation method that is on the system. The upgrade proceeds with installation regardless of the installation method and might result in issues.

  6. If the cluster will run Hadoop in a Kerberos secure environment, change the task-controller file ownership to root and access permissions to 4750. For example:

    sudo chown root /usr/share/dse/resources/hadoop/native/Linux-amd64-64/bin/task-controller
    sudo chmod 4750 /usr/share/dse/resources/hadoop/native/Linux-amd64-64/bin/task-controller

    Package installations only: The default location of the task-controller file should be /usr/share/dse/resources/hadoop/native/Linux-amd64-64/bin/task-controller.

  7. To configure the new product version:

    1. Compare your backup configuration files to the new configuration files:

      • Look for any deprecated, removed, or changed settings.

      • Be sure you are familiar with the Apache Cassandra and DataStax Enterprise changes and features in the new release.

      • Ensure that keyspace replication factors are correct for your environment:

        • Set the keyspace replication factor for analytics keyspaces.

        • Set the keyspace replication factor for system_auth and dse_security keyspaces.

    2. Merge the applicable modifications into the new version.

    3. Start the node.

      • Installer-Services and Package installations: See Starting DataStax Enterprise as a service.

      • Installer-No Services and Tarball installations: See Starting DataStax Enterprise as a stand-alone process.

  8. Verify that the upgraded datacenter names match the datacenter names in the keyspace schema definition:

    nodetool status
  9. Review the logs for warnings, errors, and exceptions. Because DataStax Enterprise 5.0 uses Cassandra 3.0, the output.log might include warnings about the following:

    • sstable_compression

    • chunk_length_kb

    • memory_allocator

    • memtable_allocation_type

    • offheap_objects

    • netty_server_port - used only during the upgrade to 5.0.

      After all nodes are running 5.0, requests that are coordinated by this node no longer contact other nodes on this port. Instead, requests use inter-node messaging options. The internode_messaging_options are used by several components of DataStax Enterprise. For 5.0 and later, all internode messaging requests use this service.

    Warnings, errors, and exceptions are frequently found in the logs when starting an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.

    During upgrade of DSE Analytics nodes, exceptions about the Task Tracker are logged in the nodes that are not yet upgraded to 5.0. The jobs succeed after the entire cluster is upgraded.

  10. Repeat the upgrade on each node in the cluster following the recommended order.

  11. After all nodes are upgraded, you must drop the following legacy tables: system_auth.users, system_auth.credentials and system_auth.permissions. This step is required for all workloads when legacy tables exist.

    As described in Cassandra NEWS.txt, the authentication and authorization subsystems have been redesigned to support role-based access control (RBAC), which results in a change to the schema of the system_auth keyspace.

  12. DSE Search only, for DSE 5.0.12 and later: After the upgrade, you must do a full reindex of all encrypted search indexes on each node in your cluster. Slow startup on nodes with large encrypted indexes is resolved in these versions, but the reindex is required to realize the performance gains. Plan sufficient time after the upgrade is complete to reindex with deleteAll=true in a rolling fashion. For example:

    dsetool reload_core keyspace_name.table_name distributed=false reindex=true deleteAll=true

  13. After the new version is installed on all nodes, upgrade the SSTables:

    nodetool upgradesstables

    Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.

    Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads.

    For information about nodetool upgradesstables, including how to speed it up, see the DataStax Support KB article Nodetool upgradesstables FAQ.

  14. For multiple datacenter deployments, change the replication strategy of the system_distributed keyspace to NetworkTopologyStrategy.

  15. If you use the OpsCenter Repair Service, turn on the Repair Service.
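
The recommendation to run nodetool upgradesstables on one node at a time can be sketched as a simple loop. This is only an illustration: the function name, the NODETOOL override variable, and the host names are assumptions, and it presumes that nodetool can reach each node remotely with -h, as in the drain example above.

```shell
#!/bin/sh
# NODETOOL can be overridden, for example NODETOOL="echo nodetool" for a dry run.
NODETOOL=${NODETOOL:-nodetool}

# upgrade_sstables_rolling JOB_COUNT HOST...   (hypothetical helper)
# Runs upgradesstables on one host at a time, in the order given,
# and stops at the first host where the command fails.
upgrade_sstables_rolling() {
  job_count=$1
  shift
  for host in "$@"; do
    echo "upgrading SSTables on $host"
    $NODETOOL -h "$host" upgradesstables --jobs "$job_count" || return 1
  done
}
```

For example, `upgrade_sstables_rolling 2 node1 node2 node3` processes three hypothetical hosts in sequence with the default --jobs value of 2. Because each call blocks until the node finishes, the next node starts only after the previous one is done.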

Warning messages during and after upgrade

You can ignore some log messages that occur during and after an upgrade.

If you made schema changes shortly before upgrading to DataStax Enterprise 5.0, log messages similar to the following might appear after upgrading:

WARN  [main] 2016-06-23 12:01:59,693  CommitLogReplayer.java:154 - Skipped 31 mutations from unknown (probably removed) CF with id b0f22357-4458-3cdb-9631-c43e59ce3676
WARN  [main] 2016-06-23 12:01:59,693  CommitLogReplayer.java:154 - Skipped 1 mutations from unknown (probably removed) CF with id 3aa75225-4f82-350b-8d5c-430fa221fa0a
WARN  [main] 2016-06-23 12:01:59,696  CommitLogReplayer.java:154 - Skipped 1 mutations from unknown (probably removed) CF with id 45f5b360-24bc-3f83-a363-1034ea4fa697
WARN  [main] 2016-06-23 12:01:59,696  CommitLogReplayer.java:154 - Skipped 1 mutations from unknown (probably removed) CF with id 0359bc71-7123-3ee1-9a4a-b9dfb11fc125
WARN  [main] 2016-06-23 12:01:59,697  CommitLogReplayer.java:154 - Skipped 1 mutations from unknown (probably removed) CF with id 296e9c04-9bec-3085-827d-c17d3df2122a

You can safely ignore these log messages.
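
When reviewing logs after an upgrade, it can help to filter out these known messages so that unexpected warnings stand out. The sketch below assumes log lines that begin with the level, as in the sample above. The function name and the log path in the usage note are assumptions.

```shell
#!/bin/sh
# unexpected_warnings   (hypothetical helper)
# Reads log lines on stdin and prints WARN and ERROR lines, leaving out the
# CommitLogReplayer "Skipped N mutations" warnings described above.
# Exits non-zero when nothing remains, following grep's convention.
unexpected_warnings() {
  grep -E '^(WARN|ERROR)' |
    grep -Ev 'CommitLogReplayer\.java:[0-9]+ - Skipped [0-9]+ mutations from unknown \(probably removed\) CF'
}
```

A possible invocation is `unexpected_warnings < /var/log/cassandra/system.log`. Adjust the path to wherever your installation writes its logs.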
