Upgrading from DataStax Enterprise 5.1 to 6.7

Instructions for upgrading from DSE 5.1 to 6.7.

Upgrade order

Upgrade nodes in this order:
  • In multiple datacenter clusters, upgrade every node in one datacenter before upgrading another datacenter.
  • Upgrade the seed nodes within a datacenter first.
  • Upgrade datacenters by workload type in this order:
    1. DSE Analytics datacenters
    2. Transactional/DSE Graph datacenters
    3. DSE Search datacenters
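As a rough illustration of the ordering rules above, the following shell sketch sorts a node inventory so that Analytics datacenters come first, then Transactional/DSE Graph, then Search, with all nodes of one datacenter kept together and seed nodes first within each datacenter. The inventory format (host, datacenter, workload, and seed flag on each line) and the function name upgrade_order are assumptions made for this example; they are not a DSE file format or tool.

```shell
# Hypothetical inventory format, one node per line:
#   <host> <datacenter> <workload: analytics|transactional|graph|search> <seed: yes|no>
# Prints host names in the order suggested by the list above.
upgrade_order() {
  awk '
    BEGIN {
      rank["analytics"] = 1
      rank["transactional"] = 2
      rank["graph"] = 2
      rank["search"] = 3
    }
    {
      r = ($3 in rank) ? rank[$3] : 9   # unknown workloads sort last
      seed = ($4 == "yes") ? 0 : 1      # seed nodes sort before non-seed nodes
      print r, $2, seed, $1
    }
  ' | LC_ALL=C sort -k1,1n -k2,2 -k3,3n -k4,4 | awk '{ print $4 }'
}
```

Feeding the function a file with one line per node (for example, upgrade_order < nodes.txt, where nodes.txt is a hypothetical inventory file) prints one host name per line in upgrade order.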

dse.yaml

The location of the dse.yaml file depends on the type of installation:
Package installations /etc/dse/dse.yaml
Tarball installations installation_location/resources/dse/conf/dse.yaml

logback.xml

The location of the logback.xml file depends on the type of installation:
Package installations /etc/dse/cassandra/logback.xml
Tarball installations installation_location/resources/cassandra/conf/logback.xml

DataStax Enterprise and Apache Cassandra™ configuration files

Configuration file             Installer-Services and package installations  Installer-No Services and tarball installations
DataStax Enterprise configuration files
byoh-env.sh                    /etc/dse/byoh-env.sh                          install_location/bin/byoh-env.sh
dse.yaml                       /etc/dse/dse.yaml                             install_location/resources/dse/conf/dse.yaml
logback.xml                    /etc/dse/cassandra/logback.xml                install_location/resources/logback.xml
spark-env.sh                   /etc/dse/spark/spark-env.sh                   install_location/resources/spark/conf/spark-env.sh
spark-defaults.conf            /etc/dse/spark/spark-defaults.conf            install_location/resources/spark/conf/spark-defaults.conf
Cassandra configuration files
cassandra.yaml                 /etc/cassandra/cassandra.yaml                 install_location/conf/cassandra.yaml
cassandra.in.sh                /usr/share/cassandra/cassandra.in.sh          install_location/bin/cassandra.in.sh
cassandra-env.sh               /etc/cassandra/cassandra-env.sh               install_location/conf/cassandra-env.sh
cassandra-rackdc.properties    /etc/cassandra/cassandra-rackdc.properties    install_location/conf/cassandra-rackdc.properties
cassandra-topology.properties  /etc/cassandra/cassandra-topology.properties  install_location/conf/cassandra-topology.properties
jmxremote.password             /etc/cassandra/jmxremote.password             install_location/conf/jmxremote.password
Tomcat server configuration file
server.xml                     /etc/dse/resources/tomcat/conf/server.xml     install_location/resources/tomcat/conf/server.xml

OpsCenter version  DSE version
6.7                6.7, 6.0, 5.1
6.5                6.0, 5.1, 5.0 (EOL)
6.1                5.1, 5.0 (EOL), 4.8 (EOSL)
6.0                5.0 (EOL), 4.8 (EOSL), 4.7 (EOSL)

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
Package installations /etc/dse/cassandra/cassandra.yaml
Tarball installations installation_location/resources/cassandra/conf/cassandra.yaml

DataStax driver changes

DataStax drivers come in two types:

  • DataStax drivers for DataStax Enterprise — for use by DSE 4.8 and later
  • DataStax drivers for Apache Cassandra™ — for use by Apache Cassandra™ and DSE 4.7 and earlier
Note: While the DataStax drivers for Apache Cassandra can connect to DSE 5.0 and later clusters, DataStax strongly recommends upgrading to the DSE drivers. The DSE drivers provide functionality for all DataStax Enterprise features.

The upgrade process for DataStax Enterprise provides minimal downtime (ideally zero). During this process, upgrade and restart one node at a time while other nodes continue to operate online. With a few exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.

Follow these instructions to upgrade from DataStax Enterprise (DSE) 5.1 to DSE 6.7.

Review the DSE 6.0 and DSE 6.7 release notes for all changes.

Note: The DataStax Installer is not supported for DSE 6.0 and later. To upgrade from DSE 5.1 that was installed with the DataStax Installer, you must first change from a standalone installer installation to a tarball or package installation for the same DSE version. See Upgrading to DSE 6.0 or DSE 6.7 from DataStax Installer installations.

Always upgrade to the latest patch release on your current version before you upgrade to a higher version. Fixes included in the latest patch release might help smooth the upgrade process.

The latest 5.1.x version of DSE is 5.1.16.
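A quick way to confirm the patch level before moving on could look like the following sketch. It assumes that the version string passed in is a bare version number such as 5.1.16 (for example, the output of dse -v). The function name is_latest_51_patch and the default minimum patch level of 16 (taken from the sentence above) are choices made for this example.

```shell
# Returns 0 when the given version is 5.1.x with a patch level at or above the minimum.
# Usage: is_latest_51_patch <version> [min_patch]
is_latest_51_patch() {
  version=$1
  min_patch=${2:-16}          # 16 reflects "the latest 5.1.x version of DSE is 5.1.16"
  case $version in
    5.1.*) ;;                 # only the 5.1 series qualifies
    *) return 1 ;;
  esac
  patch=${version#5.1.}       # strip the "5.1." prefix
  patch=${patch%%[!0-9]*}     # drop any suffix after the numeric patch level
  [ -n "$patch" ] && [ "$patch" -ge "$min_patch" ]
}
```

For example, is_latest_51_patch "$(dse -v)" || echo "upgrade to the latest 5.1.x patch first" prints the reminder on a node that is still on an older patch release.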

Attention: Read and understand these instructions before upgrading. Carefully reviewing the planning and upgrade instructions can prevent errors and data loss.
Important: Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading to DSE 6.7, all tables that have COMPACT STORAGE must be migrated to CQL table format. Use the ALTER TABLE DROP COMPACT STORAGE command to migrate Thrift-compatible tables to CQL table format. This command is available in DSE 5.1.6 or later.
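When many tables are affected, it can help to generate the statements from a list rather than typing them one by one. The sketch below reads keyspace.table names that you supply and prints one ALTER TABLE ... DROP COMPACT STORAGE statement for each. The function name and the input format are assumptions for this example, and names in the system keyspace are skipped because a note later in this document says not to migrate system.* tables.

```shell
# Reads "keyspace.table" names on stdin (one per line) and prints CQL statements.
# Blank lines and lines starting with "#" are ignored.
drop_compact_storage_cql() {
  while IFS= read -r name; do
    case $name in
      ''|'#'*) continue ;;
      system.*) continue ;;     # system tables are handled by DSE internally
      *.*) printf 'ALTER TABLE %s DROP COMPACT STORAGE;\n' "$name" ;;
      *) printf 'skipping "%s": expected keyspace.table\n' "$name" >&2 ;;
    esac
  done
}
```

The output can be saved to a file and reviewed before it is run with cqlsh -f on a node running DSE 5.1.6 or later.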

General recommendations

DataStax recommends backing up your data prior to any version upgrade, including logs and custom configurations. A backup provides the ability to revert and restore all the data used in the previous version if necessary.

Tip: OpsCenter provides a Backup Service that manages enterprise-wide backup and restore operations for DataStax Enterprise clusters. OpsCenter 6.5 and later is recommended.
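For the custom configurations mentioned above, a simple copy into a dated directory is often enough. The following sketch shows one way to do that. The function name backup_configs and the backup directory naming are assumptions for this example, and it does not back up data or logs.

```shell
# Copies each existing file into a new timestamped directory and prints that directory.
# Usage: backup_configs <destination_root> <file>...
# Note: files that share a base name would overwrite each other in the backup directory.
backup_configs() {
  dest_root=$1
  shift
  backup_dir="$dest_root/dse-config-backup-$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$backup_dir" || return 1
  for f in "$@"; do
    if [ -f "$f" ]; then
      cp -p "$f" "$backup_dir/" || return 1
    else
      printf 'not found, skipped: %s\n' "$f" >&2
    fi
  done
  printf '%s\n' "$backup_dir"
}
```

On a package installation this might be called as backup_configs "$HOME/backups" /etc/dse/dse.yaml /etc/dse/cassandra/cassandra.yaml /etc/dse/cassandra/logback.xml, where the destination directory is a hypothetical choice.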

Upgrade restrictions and limitations

Restrictions and limitations apply while a cluster is in a partially upgraded state.

With these exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.

General restrictions and limitations during the upgrade process
  • Do not enable new features.
  • Do not run nodetool repair. If you have the OpsCenter Repair Service configured, turn off the Repair Service.
  • Ensure OpsCenter compatibility. OpsCenter 6.7 is required for managing DSE 6.7 clusters. See the compatibility table.
  • During the upgrade, do not bootstrap or decommission nodes.
  • Do not issue these types of CQL queries during a rolling restart: DDL and TRUNCATE.
  • NodeSync waits to start until all nodes are upgraded.
  • In DSE 5.1.0-5.1.6, the default number of threads used by performance objects is 1. For DSE 5.1.7 and later, the default is 4. Compatible performance objects continue to work during the upgrade process. Incompatible performance objects that require schema changes work in legacy mode or start working after the upgrade is complete. Do not change the configuration of performance objects during the upgrade. If performance objects were disabled before the upgrade, do not enable them during the upgrade. See DSE Performance Service 6.7 | 5.1 | OpsCenter.
  • Failure to upgrade SSTables results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
Note: Nodes on different versions might show a schema disagreement during an upgrade.
Restrictions for DSE Advanced Replication nodes
Upgrades are supported only for DSE Advanced Replication V2.
Restrictions for DSE Analytic (Spark) nodes
  • Do not run analytics jobs until all nodes are upgraded.
  • All nodes in the cluster must be upgraded to the new version before Spark Worker and Spark Master will start.
DSE Graph nodes restrictions
Graph nodes have the same restrictions as the workload they run on. Do not alter graph schema during upgrades. Workload-specific restrictions apply for analytics and search nodes, such as no OLAP queries during upgrades.
DSE Search upgrade restrictions and limitations
  • Do not update schemas.
  • Do not reindex DSE Search nodes during upgrade.
  • DSE 6.7 uses a different Lucene codec than DSE 5.0. Segments written with this new codec cannot be read by earlier versions of DSE. To downgrade to earlier versions, the entire data directory for the search index in question must be cleared.
Restrictions for nodes using any kind of security
  • Do not change security credentials or permissions until the upgrade is complete on all nodes.
  • If you are not already using Kerberos, do not set up Kerberos authentication before upgrading. First upgrade the cluster, and then set up Kerberos.
Upgrading drivers and possible impact when driver versions are incompatible
Be sure to check driver compatibility. Depending on the driver version, you might need to recompile your client application code. See DataStax driver changes.
During upgrades, you might experience driver-specific impact when clusters have mixed versions of drivers. If your cluster has mixed versions, the protocol version is negotiated with the first host that the driver connects to. To avoid driver version incompatibility during upgrades, use one of these workarounds:
  • Protocol version: Because some drivers can use different protocol versions, force the protocol version at startup. For example, keep the Java driver at its current protocol version while the driver upgrade is happening. Switch the Java driver to the new protocol version only after the upgrade is complete on all nodes in the cluster.
  • Initial contact points: Ensure that the list of initial contact points contains only hosts with the oldest driver version. For example, the initial contact points contain only Java driver v2.
For details on protocol version negotiation, see protocol versions with mixed clusters in the Java driver version you're using, for example, Java driver.

Preparing to upgrade

Follow these steps to prepare each node for upgrading from DSE 5.1 to DSE 6.7.
Note: These steps are performed in your current version and use DSE 5.1 documentation.
  1. Carefully review Planning your DataStax Enterprise upgrade.
  2. Replace ITriggers and custom interfaces.

    Several internal and beta extension points were modified to accommodate core storage engine refactoring. All custom implementations of these extension points, including the following interfaces, must be rewritten and replaced with supported implementations when upgrading to DSE 6.7. For help, contact the DataStax Services team.

    • The org.apache.cassandra.triggers.ITrigger interface was modified from augment to augmentNonBlocking for non-blocking internal architecture. Updated trigger implementations must be provided on upgraded nodes. If unsure, drop all existing triggers before upgrading. To check for existing triggers:
      SELECT * FROM system_schema.triggers;
    • The org.apache.cassandra.index.Index interface was modified to comply with the core storage engine changes. Updated implementations are required. If unsure, drop all existing custom secondary indexes before upgrading, except DSE Search indexes, which do not need to be replaced. To check for existing indexes:
      SELECT * FROM system_schema.indexes;
    • The org.apache.cassandra.cql3.QueryHandler, org.apache.cassandra.db.commitlog.CommitLogReadHandler, and other extension points have been changed. See QueryHandlers.
  3. Before upgrading, be sure that each node has ample free disk space.

    The required space depends on the compaction strategy. See Disk space.

  4. Familiarize yourself with the changes and features in this release. See the DSE 6.0 and DSE 6.7 release notes.
  5. Verify that your current product version is DSE 5.1.
    dse -v
    These instructions are valid only for upgrades from DSE 5.1 to DSE 6.7.
  6. Upgrade to the latest patch release on your current version. The latest 5.1.x version of DSE is 5.1.16.

    Always upgrade to the latest patch release on your current version before you upgrade to a higher version. Fixes included in the latest patch release might help smooth the upgrade process.

  7. To prevent potential problems, upgrade the SSTables on each node to ensure that all SSTables are on the current version.
    nodetool upgradesstables

    If the SSTables are already on the current version, the command returns immediately and no action is taken.

  8. Back up the configuration files you use to a folder that is not in the directory where you normally run commands.

    The configuration files are overwritten with default values during installation of the new version.

  9. Verify the Java runtime version and upgrade to the recommended version.
    java -version
    Important: Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8.
  10. Run nodetool repair to ensure that data on each replica is consistent with data on other nodes.
  11. Install the libaio package for optimal performance.
    RHEL platforms:
    sudo yum install libaio
    Debian:
    sudo apt-get install libaio1
  12. DSE Analytics nodes:
    1. Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading, follow the steps to migrate all tables that have COMPACT STORAGE to CQL table format while DSE 5.x.x is running.
      Note: Do not migrate system.* tables; DSE removes COMPACT STORAGE from them internally. Modifying the system keyspace is not supported; modification attempts generate an error.
      For DSE Analytics, drop compact storage from all the tables in the "HiveMetaStore" and PortfolioDemo keyspaces.
      After COMPACT STORAGE is dropped, columns to support migration to CQL-compatible table format are added as described in migrating from compact storage.
      Attention: DSE 6.0 will not start if COMPACT STORAGE tables are present. Creating a COMPACT STORAGE table in a mixed-version cluster is not supported. Driver connections to the latest DSE 5.0.x and DSE 5.1.x run in a "NO_COMPACT" mode that causes compact tables to appear as if the compact flags were already dropped, but only for the current session.
    2. If you programmatically set the shuffle parameter, you must change the code for applications that use conf.set("spark.shuffle.service.port", port). Instead, use dse spark-submit, which automatically sets the correct service port based on the authentication state. See Configuring Spark for more information.
    3. If DSEFS is enabled, copy the CFS hivemetastore directory to DSEFS:
      DSE_HOME/bin/dse hadoop fs -cp cfs://127.0.0.1/user/spark/warehouse/ dsefs://127.0.0.1/user/spark/warehouse/
      After the upgrade is complete, migrate Spark SQL tables (if used) to the new Hive metastore format:
      dse client-tool spark metastore migrate --from 5.0.0 --to 6.0.0
    4. Cassandra File System (CFS) is removed. Remove the cfs and cfs_archive keyspaces before upgrading. See the From CFS to DSEFS blog post and the Copying data from CFS to DSEFS documentation for more information.
    5. Make sure that any use of the SPARK_LOCAL_DIRS and SPARK_EXECUTOR_DIRS environment variables matches the use described in Setting environment variables.
    6. For applications to use the compatible Spark Jobserver API in the DataStax repository, migrate jobs that extend from SparkHiveJob and SparkSqlJob to SparkSessionJob. See the DemoSparkSessionJob example in the demos directory.
      Note: Spark Jobserver is the DSE custom version 0.8.0.45.
      The default location of the demos directory depends on the type of installation:
      • Package installations: /usr/share/dse/demos
      • Tarball installations: installation_location/demos
  13. DSE Search nodes:
    • Ensure that all HTTP writes are changed to CQL commands for updates and inserts.
    • Edit the search index config and make these changes, as needed. See Search index config for valid options to change query behavior for search indexes.
      • Remove the unsupported dataDir option. You can still set the location of search indexes.
      • Remove mergePolicy, maxMergeDocs, and mergeFactor. For example:
        <mergeFactor>25</mergeFactor>
        <maxMergeDocs>...
        <mergePolicy>...
        Use mergePolicyFactory instead, and add mergeScheduler:
        <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
          <int name="maxThreadCount">16</int>
          <int name="maxMergeCount">32</int>
        </mergeScheduler>
        ...
        <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
          <int name="maxMergeAtOnce">10</int>
          <int name="segmentsPerTier">10</int>
        </mergePolicyFactory>
      • Remove any instance of ExtractingRequestHandler.
      • Remove DSENRTCachingDirectoryFactory. Change:
        <directoryFactory name="DirectoryFactory" class="com.datastax.bdp.search.solr.DSENRTCachingDirectoryFactory"/>
        to:
        <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
    • Ensure that the catalina.properties and context.xml files are present in the Tomcat conf directory. DSE will not start after the upgrade if these files are missing.
      The default location of the Tomcat conf directory depends on the type of installation:
      • Package installations: /etc/dse/tomcat/conf
      • Tarball installations: installation_location/resources/tomcat/conf
    • If earlier DSE versions use a custom configuration for the Solr UI web.xml, change:
      <filter-class>com.datastax.bdp.search.solr.auth.DseAuthenticationFilter</filter-class>
      to
      <filter-class>com.datastax.bdp.cassandra.auth.http.DseAuthenticationFilter</filter-class>
    • StallMetrics MBean is removed. Change operators that use the MBean.
  14. DSE Graph nodes:
    • Ensure that edge label names and property key names use only the supported characters. Edge label names and property key names allow only [a-zA-Z0-9], underscore, hyphen, and period. In earlier versions, edge label names and property key names allowed nearly unrestricted Unicode.
      • schema.describe() displays the entire schema, even if it contains illegal names.
      • In-place upgrades allow existing schemas with invalid edge label names and property key names.
      • Schema elements with illegal names cannot be updated or added.
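To find edge label names and property key names that fall outside the supported character set, a check along the following lines may help. This is only a sketch of the character rule stated above ([a-zA-Z0-9], underscore, hyphen, and period). The function name is_valid_graph_name is made up for this example, and obtaining the list of names from your graph schema is left to you.

```shell
# Returns 0 when the name is non-empty and uses only [a-zA-Z0-9], "_", "-", and ".".
# The body runs in a subshell so that the LC_ALL override does not leak to the caller.
is_valid_graph_name() (
  LC_ALL=C                        # byte-wise matching, so non-ASCII characters are rejected
  case $1 in
    '') exit 1 ;;                 # empty names are not valid
    *[!a-zA-Z0-9_.-]*) exit 1 ;;  # contains a character outside the allowed set
  esac
  exit 0
)
```

For example, is_valid_graph_name 'created_by' succeeds, while is_valid_graph_name 'created by' fails because of the space.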

Upgrade steps

To upgrade from DSE 5.1 to DSE 6.7, follow these steps on each node in the recommended order. The upgrade process requires upgrading and restarting one node at a time.
Note: These steps are performed in your upgraded version and use DSE 6.7 documentation.
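As a rough outline of the per-node sequence that the numbered steps describe, the following sketch prints the commands for one node of a service (package) installation without running anything. The function name print_node_upgrade_plan is hypothetical, and sudo service dse start is assumed here as the counterpart of the stop command shown in the steps. The installation and configuration steps are left as a comment because they depend on your installation type.

```shell
# Prints the command sequence for one node; nothing is executed.
# Usage: print_node_upgrade_plan <hostname>
print_node_upgrade_plan() {
  host=$1
  cat <<EOF
# --- $host ---
nodetool -h $host drain
sudo service dse stop
# install DSE 6.7 using the same installation type, then merge your configuration changes
sudo service dse start
nodetool status
EOF
}
```

Calling the function once per host, in the recommended upgrade order, gives a checklist to review before touching the cluster. For a stand-alone process, the stop command would be bin/dse cassandra-stop instead.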
  1. To flush the commit log of the old installation:
    nodetool -h hostname drain
    This step saves time when nodes start up after the upgrade and prevents DSE Search nodes from having to reindex data.
    Important: This step is mandatory when upgrading between major Cassandra versions that change SSTable formats, rendering commit logs from the previous version incompatible with the new version.
  2. Stop the node. See Stopping a DataStax Enterprise node.
    • To stop DataStax Enterprise running as a service:
      sudo service dse stop
    • To stop DataStax Enterprise running as a stand-alone process:
      bin/dse cassandra-stop
  3. Use the appropriate method to install the new product version on a supported platform.
    Note: Install the new product version using the same installation type that is on the system; otherwise, problems might result.
  4. To configure the new version:
    1. Compare your backup configuration files to the new configuration files:
      • Look for any deprecated, removed, or changed settings in cassandra.yaml and dse.yaml.

        After the upgrade and before restarting with DSE 6.7.0, remove deprecated settings and use new settings.

        cassandra.yaml changes

        Memtable settings
        Deprecated cassandra.yaml settings
        memtable_heap_space_in_mb
        memtable_offheap_space_in_mb
        Replace with this setting
        memtable_space_in_mb

        Governs heap and offheap space allocation to set a threshold for automatic memtable flush. The calculated default is 1/4 of the heap size.

        Changed setting
        memtable_allocation_type: offheap_objects

        The default method the database uses to allocate and manage memtable memory is offheap_objects.

        User-defined functions (UDF) settings
        Deprecated cassandra.yaml settings
        user_defined_function_warn_timeout
        user_defined_function_fail_timeout
        Replace with these settings
        user_defined_function_warn_micros: 500
        user_defined_function_fail_micros: 10000
        user_defined_function_warn_heap_mb: 200
        user_defined_function_fail_heap_mb: 500
        user_function_timeout_policy: die

        Settings are in microseconds since Java UDFs run faster. The new timeouts are not equivalent to the deprecated settings.

        Internode encryption settings
        Deprecated cassandra.yaml setting
        server_encryption_options:
            store_type: JKS
        Replace with these settings
        server_encryption_options:
            keystore_type: JKS
            truststore_type: JKS

        Valid type options are JKS, JCEKS, PKCS12, or PKCS11.

        Client-to-node encryption settings
        Deprecated cassandra.yaml setting
        client_encryption_options:
            store_type: JKS
        Replace with these settings
        client_encryption_options:
            keystore_type: JKS
            truststore_type: JKS

        Valid type options are JKS, JCEKS, PKCS12, or PKCS11.

        Credentials cache
        Deprecated cassandra.yaml settings
        credentials_validity_in_ms
        credentials_update_interval_in_ms
        Comment out or remove these credentials cache settings, if they exist. Caches are optimized without these settings.

        dse.yaml changes

        Spark resource and encryption options
        Deprecated dse.yaml setting
        spark_ui_options:
            server_encryption_options:
                store_type: JKS
        Replace with these settings
        spark_ui_options:
            server_encryption_options:
                keystore_type: JKS
                truststore_type: JKS

        Valid options are JKS, JCEKS, PKCS12, or PKCS11.

        DSE Search nodes
        Deprecated dse.yaml settings
        Remove these options:
        cql_solr_query_executor_threads
        enable_back_pressure_adaptive_nrt_commit
        max_solr_concurrency_per_core
        solr_indexing_error_log_options 
        DSE 6.7 will not start with these options present.
    2. Merge the applicable modifications into the new version.
    3. Ensure that keyspace replication factors are correct for your environment.
  5. When upgrading DSE to a version at or earlier than 5.1.16, 6.0.8, or 6.7.4, if any tables are using DSE Tiered Storage, remove all txn_compaction log files from second-level tiers and lower. For example, given the following dse.yaml configuration, remove txn_compaction log files from the /mnt2 and /mnt3 directories:
    tiered_storage_options:
        strategy1:
            tiers:
                - paths:
                    - /mnt1
                - paths:
                    - /mnt2
                - paths:
                    - /mnt3

    The following example removes the files using the find command:

    find /mnt2 -name "*_txn_compaction_*.log" -type f -delete &&
    find /mnt3 -name "*_txn_compaction_*.log" -type f -delete
    Warning: Failure to complete this step may result in data loss.
  6. Start the node.
  7. Verify that the upgraded datacenter names match the datacenter names in the keyspace schema definition:
    nodetool status
  8. Review the logs for warnings, errors, and exceptions.

    Warnings, errors, and exceptions are frequently found in the logs when starting an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.

  9. Repeat the upgrade on each node in the cluster following the recommended order.

    Upgrading and restarting each node is called a rolling restart.
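Step 4 above asks you to look for deprecated or removed settings before restarting. A scan such as the following sketch can flag them in your merged files. The setting names are copied from the cassandra.yaml and dse.yaml change lists in step 4; the function name find_deprecated_settings is an assumption for this example, and the check is a plain text match rather than a YAML parser, so treat its output as a prompt for manual review.

```shell
# Setting names taken from the cassandra.yaml and dse.yaml change lists in step 4.
DEPRECATED_KEYS='memtable_heap_space_in_mb
memtable_offheap_space_in_mb
user_defined_function_warn_timeout
user_defined_function_fail_timeout
store_type
credentials_validity_in_ms
credentials_update_interval_in_ms
cql_solr_query_executor_threads
enable_back_pressure_adaptive_nrt_commit
max_solr_concurrency_per_core
solr_indexing_error_log_options'

# Prints "file:line:text" for every uncommented line that sets a deprecated key.
# Returns 0 when nothing is found and 1 otherwise.
find_deprecated_settings() {
  found=0
  for file in "$@"; do
    for key in $DEPRECATED_KEYS; do
      # /dev/null as a second operand makes grep prefix each match with the file name.
      if grep -n -E "^[[:space:]]*${key}[[:space:]]*:" "$file" /dev/null; then
        found=1
      fi
    done
  done
  [ "$found" -eq 0 ]
}
```

On a package installation this might be run as find_deprecated_settings /etc/dse/cassandra/cassandra.yaml /etc/dse/dse.yaml. Commented-out settings are not reported, and keystore_type or truststore_type lines do not match the store_type entry because the key must start the line.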

Recovery after upgrading to DSE 6.7 without dropping compact storage

Support for Thrift-compatible tables (Compact Storage) is dropped. All tables using Compact Storage must be dropped or migrated to CQL table format before upgrading to DSE 6.7. If a cluster has been upgraded to DSE 6.7 and any Compact Storage tables still exist, follow this procedure to recover and proceed with the upgrade:
  1. Downgrade any nodes that were already upgraded to DSE 6.7 to the latest version in the DSE 5.0 or 5.1 series:
    • DSE 5.0.x, downgrade to 5.0.15 or later
    • DSE 5.1.x, downgrade to 5.1.11 or later
  2. On each node where a start on DSE 6.7 was attempted, start DSE with the -Dcassandra.commitlog.ignorereplayerrors=true option.
  3. On one node (any node) in the cluster, run DROP COMPACT STORAGE on each table that uses it.
  4. Restart DSE to continue the upgrade to DSE 6.7.

After the upgrade

After all nodes are upgraded and running on DSE 6.7:

  1. If you use the OpsCenter Repair Service, turn on the Repair Service.
  2. Remove any previously installed JTS JAR files from the classpaths in your DSE installation. JTS (Java Topology Suite) is distributed with DSE 6.7.
  3. After all nodes are on DSE 6.7 and the required schema change occurs, the new authorization with CassandraAuthorizer enables the use of new columns.
  4. DSE 6.7 introduces, and enables by default, the DSE Metrics Collector, a diagnostics information aggregator used to help facilitate DSE problem resolution. For more information on the DSE Metrics Collector, see DataStax Enterprise Metrics Collector.
  5. DSE Search only:
    • The appender SolrValidationErrorAppender and the logger SolrValidationErrorLogger are no longer used and may safely be removed from logback.xml.
    • In contrast to earlier versions, DataStax recommends accepting the new default value of 1024 for back_pressure_threshold_per_core in dse.yaml. See Configuring and tuning indexing performance.
    • Slow startup on nodes with large encrypted indexes is resolved. However, action is required to realize the performance gains. You must do a full reindex of all encrypted search indexes on each node in your cluster. Plan sufficient time after the upgrade is complete to reindex with deleteAll=true in a rolling fashion. For example:
      dsetool reload_core keyspace_name.table_name distributed=false reindex=true deleteAll=true 
  6. DSE Analytics only:
    • Check the replication factor for the dse_analytics keyspace, a new keyspace that stores all DSE Analytics internal system data. DataStax recommends setting the replication strategy to NetworkTopologyStrategy (NTS) with a replication factor of at least 3 in each DSE Analytics datacenter. If a datacenter has more nodes, consider a larger replication factor.
    • Spark Jobserver uses DSE custom version 0.8.0.45. Applications must use the compatible Spark Jobserver API from the DataStax repository.
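For the encrypted search index reindex described in the DSE Search item above, the per-index commands can be generated from a list so that they are run one node at a time. The sketch below only prints the dsetool reload_core command shown in that item for each index name you supply. The function name reindex_commands and the input format are assumptions for this example.

```shell
# Reads "keyspace_name.table_name" search index names on stdin, one per line,
# and prints the reload_core command from the DSE Search item above for each.
reindex_commands() {
  while IFS= read -r core; do
    [ -n "$core" ] || continue    # ignore blank lines
    printf 'dsetool reload_core %s distributed=false reindex=true deleteAll=true\n' "$core"
  done
}
```

Because distributed=false limits each command to the local node, the printed commands would be repeated on every node in turn to reindex in a rolling fashion.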

Warning messages during and after upgrade

Error messages provide information to help identify problems. You can ignore some log messages that occur during and after an upgrade.
  • Some gremlin_server properties in earlier versions of DSE are no longer required in DSE 6.7. If properties exist in the dse.yaml file after upgrading to DSE 6.7, logs display warnings similar to:
    WARN  [main] 2017-08-31 12:25:30,523 GREMLIN DseWebSocketChannelizer.java:149 - Configuration for the org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0 serializer in dse.yaml overrides the DSE default - typically it is best to allow DSE to configure these.
    You can ignore these warnings or modify dse.yaml so that only the required gremlin server properties are present.