Upgrading from DataStax Enterprise 5.1 to 6.0

Instructions to upgrade from DSE 5.1 to 6.0.

logback.xml

The location of the logback.xml file depends on the type of installation:
  • Package installations: /etc/dse/cassandra/logback.xml
  • Tarball installations: installation_location/resources/cassandra/conf/logback.xml

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
  • Package installations: /etc/dse/cassandra/cassandra.yaml
  • Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml

Upgrade order

Upgrade nodes in this order:
  • In multiple datacenter clusters, upgrade every node in one datacenter before upgrading another datacenter.
  • Upgrade the seed nodes within a datacenter first.
  • Upgrade types in this order:
    1. DSE Analytics nodes or datacenters
    2. Transactional/DSE Graph nodes or datacenters
    3. DSE Search nodes or datacenters
  • For DSE Analytics nodes using DSE Hadoop, upgrade the Job Tracker node first. Then upgrade Hadoop nodes, followed by Spark nodes.
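
The ordering rules above can be sketched as a small sort over a node inventory. This is an illustration only: the inventory format (host, datacenter, workload, seed flag), the function name, and the sample node names are hypothetical, and the sketch assumes that each datacenter runs a single workload type. It does not handle the DSE Hadoop Job Tracker rule.

```shell
#!/bin/sh
# Sort a node inventory into a suggested upgrade order.
# Input (stdin), one node per line: host datacenter workload seed|nonseed
# This inventory format and the function name are hypothetical.
# Assumes each datacenter runs a single workload type.
upgrade_order() {
    awk '{
        rank = 9                                   # unknown workloads go last
        if ($3 == "analytics") rank = 1
        else if ($3 == "transactional" || $3 == "graph") rank = 2
        else if ($3 == "search") rank = 3
        seed = ($4 == "seed") ? 0 : 1              # seeds first within a datacenter
        print rank, $2, seed, $1
    }' | sort -k1,1n -k2,2 -k3,3n -k4,4 | awk '{ print $4 }'
}

# Example with placeholder nodes:
upgrade_order <<'EOF'
search1 dc_search search seed
analytics2 dc_analytics analytics nonseed
txn1 dc_txn transactional seed
analytics1 dc_analytics analytics seed
EOF
```

The sort key is workload rank, then datacenter name, then seed flag, so every node in one datacenter is listed before the next datacenter begins.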

dse.yaml

The location of the dse.yaml file depends on the type of installation:
  • Package installations: /etc/dse/dse.yaml
  • Tarball installations: installation_location/resources/dse/conf/dse.yaml

DataStax Enterprise and Apache Cassandra™ configuration files

Configuration file | Installer-Services and package installations | Installer-No Services and tarball installations

DataStax Enterprise configuration files
byoh-env.sh | /etc/dse/byoh-env.sh | install_location/bin/byoh-env.sh
dse.yaml | /etc/dse/dse.yaml | install_location/resources/dse/conf/dse.yaml
logback.xml | /etc/dse/cassandra/logback.xml | install_location/resources/logback.xml
spark-env.sh | /etc/dse/spark/spark-env.sh | install_location/resources/spark/conf/spark-env.sh
spark-defaults.conf | /etc/dse/spark/spark-defaults.conf | install_location/resources/spark/conf/spark-defaults.conf

Cassandra configuration files
cassandra.yaml | /etc/cassandra/cassandra.yaml | install_location/conf/cassandra.yaml
cassandra.in.sh | /usr/share/cassandra/cassandra.in.sh | install_location/bin/cassandra.in.sh
cassandra-env.sh | /etc/cassandra/cassandra-env.sh | install_location/conf/cassandra-env.sh
cassandra-rackdc.properties | /etc/cassandra/cassandra-rackdc.properties | install_location/conf/cassandra-rackdc.properties
cassandra-topology.properties | /etc/cassandra/cassandra-topology.properties | install_location/conf/cassandra-topology.properties
jmxremote.password | /etc/cassandra/jmxremote.password | install_location/conf/jmxremote.password

Tomcat server configuration file
server.xml | /etc/dse/resources/tomcat/conf/server.xml | install_location/resources/tomcat/conf/server.xml

Follow these instructions to upgrade from DataStax Enterprise (DSE) 5.1 to DSE 6.0. If you are on DSE 5.0, see Upgrading from DataStax Enterprise 5.0 to 6.0.

Always upgrade to the latest patch release on your current version before you upgrade to a higher version. Fixes included in the latest patch release might smooth the upgrade process.

The latest 5.1.x version of DSE is 5.1.11.

Attention: Read and understand these instructions before upgrading. Carefully reviewing the planning and upgrading instructions can ensure a smooth upgrade and avoid pitfalls and frustrations.
Important: Support for Thrift-compatible tables (COMPACT STORAGE) is dropped in DSE 6.0. Before upgrading to DSE 6.0, you must migrate all tables that have COMPACT STORAGE to CQL table format. Use the ALTER TABLE DROP COMPACT STORAGE command to migrate Thrift-compatible tables to CQL table format. This command is available in DSE 5.1.6 or later.
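
As a minimal sketch of the migration command described above, the following shell helper prints one ALTER TABLE ... DROP COMPACT STORAGE statement per table. The function name and the keyspace and table names are hypothetical placeholders. Review the generated statements before running them on a node with DSE 5.1.6 or later.

```shell
#!/bin/sh
# Print one DROP COMPACT STORAGE statement per "keyspace.table" argument.
# drop_compact_statements is a hypothetical helper, not a DSE tool.
drop_compact_statements() {
    for table in "$@"; do
        printf 'ALTER TABLE %s DROP COMPACT STORAGE;\n' "$table"
    done
}

# Example with placeholder names:
drop_compact_statements my_keyspace.users my_keyspace.events
```

One way to apply the result is to save the output to a file and run it with cqlsh -f. Case-sensitive keyspace or table names must be passed with their double quotes included.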

Apache Cassandra™ version change

Upgrading from DataStax Enterprise 5.1 to 6.0 includes a major Cassandra version change.
  • DataStax Enterprise 6.0 is compatible with Cassandra 3.11 and requires upgrading SSTables.
  • DataStax Enterprise 5.1 uses Cassandra 3.11.
  • DataStax Enterprise 5.0 uses Cassandra 3.0.
  • DataStax Enterprise 4.7 to 4.8 uses Cassandra 2.1.
  • DataStax Enterprise 4.0 to 4.6 uses Cassandra 2.0.
Be sure to follow the recommendations for upgrading the SSTables.

General recommendations

DataStax recommends backing up your data prior to any version upgrade, including logs and custom configurations. A backup provides the ability to revert and restore all the data used in the previous version if necessary.

Tip: OpsCenter provides a Backup service that manages enterprise-wide backup and restore operations for DataStax Enterprise clusters.

Upgrade restrictions and limitations

Restrictions and limitations apply while a cluster is in a partially upgraded state.

With these exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.

General upgrade restrictions
  • Do not enable new features.
  • Do not run nodetool repair. If you have the OpsCenter Repair Service configured, turn off the Repair Service.
  • Ensure OpsCenter compatibility. OpsCenter 6.5 is required for managing DSE 6.0 clusters. See DSE OpsCenter compatibility with DataStax Enterprise.
  • During the upgrade, do not bootstrap or decommission nodes.
  • Do not issue these types of CQL queries during a rolling restart: DDL and TRUNCATE.
  • During the upgrade, the nodes on different versions might show a schema disagreement.
  • Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
  • NodeSync waits to start until all nodes are upgraded.
  • The default number of threads used by performance objects increased from 1 to 4. Compatible performance objects continue to work during the upgrade process. Incompatible performance objects that require schema changes work in legacy mode or start working after the upgrade is complete. Do not change the configuration of performance objects during the upgrade. If performance objects were disabled before the upgrade, do not enable them during the upgrade.
Restrictions for DSE Advanced Replication nodes
Upgrades are supported only for DSE Advanced Replication V2.
Restrictions for DSE Analytics (Spark) nodes
  • Do not run analytics jobs until all nodes are upgraded.
  • Spark Worker and Spark Master do not start until all nodes in the cluster are upgraded to the new version.
DSEFS nodes restrictions
During upgrade, DSEFS will not start on upgraded nodes. After all nodes are upgraded to 6.0.0, the DSEFS keyspace is adjusted and then DSEFS starts.
DSE Graph nodes restrictions
Graph nodes have the same restrictions as the workload they run on. General graph restrictions apply for all nodes, such as not altering graph schema during upgrades. Workload-specific restrictions apply for analytics and search nodes, such as no OLAP queries during upgrades.
DSE Search upgrade restrictions and limitations
  • Do not update schemas.
  • Do not reindex DSE Search nodes during upgrade.
  • DSE 6.0 introduces a new Lucene codec. Segments written with this new codec cannot be read by earlier versions of DSE. To downgrade to earlier versions, the entire data directory for the search index in question must be cleared.
Important: Before you upgrade DSE Search or SearchAnalytics workloads, you must complete the specific tasks in the Preparing to upgrade section.
Restrictions for nodes using any kind of security
  • Do not change security credentials or permissions until the upgrade is complete on all nodes.
  • If you are not already using Kerberos, do not set up Kerberos authentication before upgrading. First upgrade the cluster, and then set up Kerberos.
Upgrading drivers and possible impact when driver versions are incompatible
Be sure to check driver compatibility. Depending on the driver version, you might need to recompile your client application code. See Upgrading DataStax drivers.
During upgrades, you might experience driver-specific impact when clusters have mixed versions of drivers. If your cluster has mixed versions, the protocol version is negotiated with the first host that the driver connects to. To avoid driver version incompatibility during upgrades, use one of these workarounds:
  • Protocol version: Because some drivers can use different protocol versions, force the protocol version at startup. For example, keep the Java driver at its current protocol version while the driver upgrade is happening. Switch the Java driver to the new protocol version only after the upgrade is complete on all nodes in the cluster.
  • Initial contact points: Ensure that the list of initial contact points contains only hosts with the oldest driver version. For example, the initial contact points contain only Java driver v2.
For details on protocol version negotiation, see protocol versions with mixed clusters in the documentation for the driver version you're using, for example, the Java driver.

Preparing to upgrade

Follow these steps to prepare each node for upgrading from DataStax Enterprise 5.1 to DataStax Enterprise 6.0:
  1. Carefully review Planning your DataStax Enterprise upgrade.
    Attention: The upgrade process for DataStax Enterprise provides minimal downtime (ideally zero). During this process, upgrade and restart one node at a time while other nodes continue to operate online. With a few exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.

  2. Replace ITriggers and custom interfaces.

    Several internal and beta extension points were modified as required by the core storage engine refactoring. All custom implementations, including implementations of the following interfaces, must be replaced with supported implementations when upgrading to DSE 6.0. Because these interfaces must be rewritten for DSE 6.0, DataStax can help you find a solution.

    • The org.apache.cassandra.triggers.ITrigger interface was modified from augment to augmentNonBlocking for non-blocking internal architecture. Updated trigger implementations must be provided on upgraded nodes. If unsure, drop all existing triggers before upgrading.
    • The org.apache.cassandra.index.Index interface was modified to comply with the core storage engine changes. Updated implementations are required. If unsure, drop all existing custom secondary indexes before upgrading, except DSE Search indexes, which do not need to be replaced.
    • The org.apache.cassandra.cql3.QueryHandler, org.apache.cassandra.db.commitlog.CommitLogReadHandler, and other extension points have been changed.
  3. Before upgrading, be sure that each node has ample free disk space.

    The required space depends on the compaction strategy. See Disk space in Planning and testing DataStax Enterprise deployments.

  4. Familiarize yourself with the changes and features in this release.
  5. Verify your current product version.
    dse -v
    If necessary, upgrade to an interim version:
    Current version | Upgrade version
    DataStax Enterprise 5.1 | DataStax Enterprise 6.0
    DataStax Enterprise 5.0 | DataStax Enterprise 5.1 or 6.0
    DataStax Distribution of Apache Cassandra™ 3.x | DataStax Enterprise 5.1
  6. Upgrade to the latest patch release on your current version. The latest 5.1.x version of DSE is 5.1.11.

    Always upgrade to the latest patch release on your current version before you upgrade to a higher version. Fixes included in the latest patch release might smooth the upgrade process.

    You can use OpsCenter 6.5 Lifecycle Manager (LCM) to clone a configuration profile and run an upgrade job on a datacenter or node. Upgrade jobs are supported for upgrades within a release series for DSE 5.0.x and later.

  7. Upgrade to the latest patch release on your current version. The latest 6.0.x version of DSE is 6.0.4.

  8. Upgrade the SSTables on each node to ensure that all SSTables are on the current version.
    nodetool upgradesstables
    This step is required for DataStax Enterprise upgrades that include major Cassandra version changes.
    Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage.

    If the SSTables are already on the current version, the command returns immediately and no action is taken.

  9. Verify the Java runtime version and upgrade to the recommended version.
    java -version

    The latest version of OpenJDK 8 or Oracle Java SE 8 (JRE or JDK) (1.8u151 minimum) is recommended. The JDK is recommended for development and production systems, and provides useful troubleshooting tools that are not in the JRE, such as jstack, jmap, jps, and jstat.

    Important: Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8 starting with DSE 6.0.3. This change is due to the end of public updates for Oracle JRE/JDK 8.
  10. Run nodetool repair to ensure that data on each replica is consistent with data on other nodes.
  11. Install the libaio package for optimal performance.
    RHEL platforms:
    sudo yum install libaio
    Debian:
    sudo apt-get install libaio1
  12. DSE Analytics nodes:
    1. Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading, follow the steps to migrate all tables that have COMPACT STORAGE to CQL table format while DSE 5.x.x is running. For DSE Analytics, drop compact storage from all the tables in the "HiveMetaStore" and PortfolioDemo keyspaces.
      After COMPACT STORAGE is dropped, columns to support migration to CQL-compatible table format are added as described in migrating from compact storage.
      Attention: DSE 6.0 will not start if COMPACT STORAGE tables are present. Creating a COMPACT STORAGE table in a mixed-version cluster is not supported. Driver connections to the latest DSE 5.0.x and DSE 5.1.x run in a "NO_COMPACT" mode that causes compact tables to appear as if the compact flags were already dropped, but only for the current session.
    2. If you programmatically set the shuffle parameter, you must change the code for applications that use conf.set("spark.shuffle.service.port", port). Instead, use dse spark-submit which automatically sets the correct service port based on the authentication state. See Configuring Spark for more information.
    3. If DSEFS is enabled, copy the CFS hivemetastore directory to DSEFS:
      DSE_HOME/bin/dse hadoop fs -cp cfs://127.0.0.1/user/spark/warehouse/ dsefs://127.0.0.1/user/spark/warehouse/

      Another step is required after upgrade is complete.

    4. Cassandra File System (CFS) is removed. Remove the cfs and cfs_archive keyspaces before upgrading. See the From CFS to DSEFS blog post on the DataStax Developer website and the Copying data from CFS to DSEFS documentation for more information.
    5. Make sure that any use of the SPARK_LOCAL_DIRS and SPARK_EXECUTOR_DIRS environment variables matches their use as described in Setting environment variables.
    6. For applications to use the compatible Spark Jobserver API in the DataStax repository, migrate jobs that extend from SparkHiveJob and SparkSqlJob to SparkSessionJob. See the example in DemoSparkSessionJob in the demos directory.
      Note: Spark Jobserver is the DSE custom version 0.8.0.44.
      The default location of the demos directory depends on the type of installation:
      • Package installations: /usr/share/dse/demos
      • Tarball installations: installation_location/demos
  13. DSE Search nodes: Review DSE 6.0.0 release notes for all changes.
    • Ensure that all HTTP writes are changed to CQL commands for updates and inserts.
    • Edit the search index config and make these changes, as needed:
      • Remove the unsupported dataDir option. To control where the DSE Search indexing data files are saved on the server, see Setting the location of search indexes.
      • Remove mergePolicy, maxMergeDocs, and mergeFactor. Use mergePolicyFactory instead.
      • Remove any instance of ExtractingRequestHandler.
      • Remove DSENRTCachingDirectoryFactory. Change:
        <directoryFactory name="DirectoryFactory" class="com.datastax.bdp.search.solr.DSENRTCachingDirectoryFactory"/>
        to:
        <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
    • Ensure that the catalina.properties and context.xml files are present in the Tomcat conf directory. DSE will not start after the upgrade if these files are missing.
      The default location of the Tomcat conf directory depends on the type of installation:
      • Package installations: /etc/dse/tomcat/conf
      • Tarball installations: installation_location/resources/tomcat/conf
    • If earlier DSE versions use a custom configuration for the Solr UI web.xml, change:
      <filter-class>com.datastax.bdp.search.solr.auth.DseAuthenticationFilter</filter-class>
      to
      <filter-class>com.datastax.bdp.cassandra.auth.http.DseAuthenticationFilter</filter-class>
    • StallMetrics MBean is removed. Change operators that use the MBean.
  14. DSE Graph nodes:
    • Ensure that edge label names and property key names use only the supported characters: [a-zA-Z0-9], underscore, hyphen, and period. Earlier versions allowed nearly unrestricted Unicode in edge label names and property key names.
      • schema.describe() displays the entire schema, even if it contains illegal names.
      • In-place upgrades allow existing schemas with invalid edge label names and property key names.
      • Schema elements with illegal names cannot be uploaded or added.
  15. Back up the configuration files you use to a folder that is not in the directory where you normally run commands.

    The configuration files are overwritten with default values during installation of the new version.
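
The version check in steps 5 and 6 can be sketched as a shell helper. This is an illustration under stated assumptions: the function names are hypothetical, the comparison relies on sort -V from GNU coreutils, and the usage comment assumes that dse -v prints only the version number. The value 5.1.11 is the latest 5.1.x patch release named in this guide.

```shell
#!/bin/sh
# Return success when version $1 is greater than or equal to version $2.
# Relies on "sort -V" (version sort), available in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Report whether a node version is ready for the upgrade to DSE 6.0.
# check_dse_version is a hypothetical helper, not a DSE tool.
check_dse_version() {
    current="$1"
    case "$current" in
        5.1.*)
            if version_ge "$current" "5.1.11"; then
                echo "ready: $current"
            else
                echo "upgrade to 5.1.11 first: $current"
            fi
            ;;
        *)
            echo "not a 5.1.x node: $current"
            ;;
    esac
}

# On a node (assumes "dse -v" prints only the version number):
#   check_dse_version "$(dse -v)"
check_dse_version 5.1.11
check_dse_version 5.1.6
```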

Upgrade steps

Follow these steps on each node in the recommended order to upgrade from DataStax Enterprise 5.1 to DataStax Enterprise 6.0. Some warning messages are displayed during and after upgrade.

  1. Run nodetool drain to flush the commit log of the old installation:
    nodetool -h hostname drain
    This step saves time when nodes start up after the upgrade and prevents DSE Search nodes from having to reindex data.
    Important: This step is mandatory when upgrading between major Cassandra versions that change SSTable formats, rendering commit logs from the previous version incompatible with the new version.
  2. Stop the node. See Stopping a DataStax Enterprise node for more information.

    To stop DataStax Enterprise running as a service:

    sudo service dse stop

    To stop DataStax Enterprise running as a stand-alone process:

    bin/dse cassandra-stop

    Running nodetool drain before using the cassandra-stop command is not necessary for a stand-alone process because cassandra-stop drains the node before stopping it.
  3. Use the appropriate method to install the new product version on a supported platform:
    Note: Install the new product version using the same installation type that is on the system. The upgrade proceeds even if the installation type differs, which might result in issues.
  4. To configure the new product version:
    1. Compare your backup configuration files to the new configuration files:
      • Look for any deprecated, removed, or changed settings.
      • DSE Search nodes
        • While the node is down, edit dse.yaml and remove these options:
          • cql_solr_query_executor_threads
          • enable_back_pressure_adaptive_nrt_commit
          • max_solr_concurrency_per_core
          • solr_indexing_error_log_options
          DSE 6.0 will not start with these options present.
      • Be sure you are familiar with the Apache Cassandra and DataStax Enterprise changes and features in the new release.
      • Ensure that keyspace replication factors are correct for your environment:
    2. Edit the cassandra.yaml file and comment out or remove these credentials cache settings, if they exist:
      credentials_validity_in_ms
      credentials_update_interval_in_ms
      Caches are optimized without these settings.
    3. Merge the applicable modifications into the new version.
  5. Start the node.
  6. Verify that the upgraded datacenter names match the datacenter names in the keyspace schema definition:
    nodetool status
  7. Review the logs for warnings, errors, and exceptions.

    Warnings, errors, and exceptions are frequently found in the logs when starting an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.

  8. Repeat the upgrade on each node in the cluster following the recommended order.
  9. When the upgrade includes a major Cassandra version, upgrade the SSTables. DataStax recommends upgrading the SSTables on one node at a time or, when using racks, one rack at a time.
    Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
    nodetool upgradesstables

    If the SSTables are already on the current version, the command returns immediately and no action is taken. See SSTable compatibility and upgrade version.

    Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads.

    Note: You can run the upgradesstables command before all the nodes are upgraded, as long as you run it on only one node at a time or, when using racks, one rack at a time. Running upgradesstables on too many nodes at once degrades performance.
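
The configuration cleanup in step 4 can be checked before starting the node. The following sketch lists any remaining lines that set the dse.yaml and cassandra.yaml options named in that step. The function name is hypothetical, and the pattern assumes each option starts its own line in the YAML file; commented-out lines are ignored.

```shell
#!/bin/sh
# List settings that this guide says to remove before starting DSE 6.0.
# find_removed_options is a hypothetical helper, not a DSE tool.
# Prints matching lines with line numbers; returns non-zero when none remain.
find_removed_options() {
    grep -nE '^[[:space:]]*(cql_solr_query_executor_threads|enable_back_pressure_adaptive_nrt_commit|max_solr_concurrency_per_core|solr_indexing_error_log_options|credentials_validity_in_ms|credentials_update_interval_in_ms)[[:space:]]*:' "$@"
}

# Example for a package installation:
#   find_removed_options /etc/dse/dse.yaml /etc/dse/cassandra/cassandra.yaml
```

An empty result with a non-zero exit status means none of the listed options are still set in the files that were checked.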

Recovery after upgrading to DSE 6.0 without dropping compact storage

DSE 6.0 removed support for Thrift-compatible tables (Compact Storage). All tables using Compact Storage must be dropped or migrated to CQL table format before upgrading to DSE 6.0. If a cluster has been upgraded to DSE 6.0 and any Compact Storage tables still exist, follow this procedure to recover and proceed with the upgrade:
  1. Downgrade any nodes that were already upgraded to DSE 6.0 to the latest version in the DSE 5.0 or 5.1 series:
    • DSE 5.0.x, downgrade to 5.0.14 or later
    • DSE 5.1.x, downgrade to 5.1.11 or later
  2. On each node where a start on DSE 6.0 was attempted, start DSE with the -Dcassandra.commitlog.ignorereplayerrors=true option.
  3. On one node (any node) in the cluster, DROP COMPACT STORAGE from tables which use it.
  4. Restart DSE to continue the upgrade to DSE 6.0.
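
One possible way to pass the option in step 2 is through JVM_OPTS in cassandra-env.sh on the downgraded node. The following is a sketch, not a DataStax-provided procedure: the function name is hypothetical, and it assumes that the startup scripts read JVM_OPTS from cassandra-env.sh. The function appends the option only when it is not already present.

```shell
#!/bin/sh
# Append the commit log recovery option to JVM_OPTS unless already present.
# add_ignore_replay_errors is a hypothetical helper for cassandra-env.sh.
add_ignore_replay_errors() {
    opt='-Dcassandra.commitlog.ignorereplayerrors=true'
    case " $JVM_OPTS " in
        *" $opt "*) ;;                                  # already set, leave unchanged
        *) JVM_OPTS="${JVM_OPTS:+$JVM_OPTS }$opt" ;;    # append with a separating space
    esac
}

# For a stand-alone process, the option can instead be given at startup:
#   bin/dse cassandra -Dcassandra.commitlog.ignorereplayerrors=true
```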

After the upgrade

After all nodes are upgraded and running on DSE 6.0, complete these steps:

  1. If you use the OpsCenter Repair Service, turn on the Repair Service.
  2. DSE Search only:
    • The appender SolrValidationErrorAppender and the logger SolrValidationErrorLogger are no longer used and may safely be removed from logback.xml.
  3. DSE Analytics only: Check the replication factor for the dse_analytics keyspace, a new keyspace in DSE 6.0 that stores all DSE Analytics internal system data. We suggest setting the replication strategy to NetworkTopologyStrategy (NTS) with a replication factor of at least 3 in each of the DSE Analytics datacenters. If a datacenter has more nodes, consider a larger replication factor.
  4. Spark Jobserver uses DSE custom version 0.8.0.44. Applications must use the compatible Spark Jobserver API from the DataStax repository.
  5. DSE Search: Slow startup on nodes with large encrypted indexes is resolved. However, action is required to realize the performance gains. You must do a full reindex of all encrypted search indexes on each node in your cluster. Plan sufficient time after the upgrade is complete to reindex with deleteAll=true in a rolling fashion. For example:
    dsetool reload_core keyspace_name.table_name distributed=false reindex=true deleteAll=true 
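
For the dse_analytics replication setting in step 3, the following sketch prints an ALTER KEYSPACE statement that applies NetworkTopologyStrategy with the same replication factor in each listed datacenter. The function name and the datacenter names are hypothetical; use the datacenter names that nodetool status reports for your cluster.

```shell
#!/bin/sh
# Print an ALTER KEYSPACE statement that sets dse_analytics to
# NetworkTopologyStrategy with one replication factor for every datacenter.
# analytics_replication_cql is a hypothetical helper, not a DSE tool.
# Usage: analytics_replication_cql REPLICATION_FACTOR DATACENTER...
analytics_replication_cql() {
    rf="$1"
    shift
    options="'class': 'NetworkTopologyStrategy'"
    for dc in "$@"; do
        options="$options, '$dc': $rf"
    done
    printf 'ALTER KEYSPACE dse_analytics WITH replication = {%s};\n' "$options"
}

# Example with placeholder datacenter names:
analytics_replication_cql 3 Analytics_DC1 Analytics_DC2
```

After the replication settings of a keyspace change, a repair of that keyspace is normally needed so that the new replicas receive the data.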

Warning messages during and after upgrade

Error messages provide information to help identify problems. You can ignore some log messages that occur during and after an upgrade.
  • Some gremlin_server properties in earlier versions of DSE are no longer required in DSE 6.0. If these properties exist in the dse.yaml file after upgrading to DSE 6.0, logs display warnings similar to:
    WARN  [main] 2017-08-31 12:25:30,523 GREMLIN DseWebSocketChannelizer.java:149 - Configuration for the org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0 serializer in dse.yaml overrides the DSE default - typically it is best to allow DSE to configure these.
    You can ignore these warnings or modify dse.yaml so that only the required gremlin server properties are present.
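
When reviewing logs after an upgrade, a filter can separate the ignorable warning above from entries that need attention. The following is a minimal sketch: the function name is hypothetical, and it assumes log lines begin with the level (WARN or ERROR), as in the sample above.

```shell
#!/bin/sh
# Read log lines on stdin and print WARN and ERROR entries, skipping the
# Gremlin serializer warning that this guide says can be ignored.
# unexpected_log_entries is a hypothetical helper, not a DSE tool.
unexpected_log_entries() {
    grep -E '^(WARN|ERROR)' | grep -v 'serializer in dse.yaml overrides the DSE default'
}

# Example (the log path depends on the installation):
#   unexpected_log_entries < system.log
```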