Upgrading from DataStax Enterprise 5.0 to 6.7

Instructions for upgrading from DSE 5.0 to 6.7.

OpsCenter version DSE version
6.7 6.7, 6.0, 5.1
6.5 6.0, 5.1, 5.0
6.1 5.1, 5.0
6.0 5.0

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
Package installations /etc/dse/cassandra/cassandra.yaml
Tarball installations installation_location/resources/cassandra/conf/cassandra.yaml

logback.xml

The location of the logback.xml file depends on the type of installation:
Package installations /etc/dse/cassandra/logback.xml
Tarball installations installation_location/resources/cassandra/conf/logback.xml

server.xml

The default location of the Tomcat server.xml file depends on the installation type:
Package installations /etc/dse/tomcat/conf/server.xml
Tarball installations installation_location/resources/tomcat/conf/server.xml

DataStax Enterprise and Apache Cassandra™ configuration files

Configuration file Installer-Services and package installations Installer-No Services and tarball installations
DataStax Enterprise configuration files
byoh-env.sh /etc/dse/byoh-env.sh install_location/bin/byoh-env.sh
dse.yaml /etc/dse/dse.yaml install_location/resources/dse/conf/dse.yaml
logback.xml /etc/dse/cassandra/logback.xml install_location/resources/cassandra/conf/logback.xml
spark-env.sh /etc/dse/spark/spark-env.sh install_location/resources/spark/conf/spark-env.sh
spark-defaults.conf /etc/dse/spark/spark-defaults.conf install_location/resources/spark/conf/spark-defaults.conf
Cassandra configuration files
cassandra.yaml /etc/cassandra/cassandra.yaml install_location/conf/cassandra.yaml
cassandra.in.sh /usr/share/cassandra/cassandra.in.sh install_location/bin/cassandra.in.sh
cassandra-env.sh /etc/cassandra/cassandra-env.sh install_location/conf/cassandra-env.sh
cassandra-rackdc.properties /etc/cassandra/cassandra-rackdc.properties install_location/conf/cassandra-rackdc.properties
cassandra-topology.properties /etc/cassandra/cassandra-topology.properties install_location/conf/cassandra-topology.properties
jmxremote.password /etc/cassandra/jmxremote.password install_location/conf/jmxremote.password
Tomcat server configuration file
server.xml /etc/dse/resources/tomcat/conf/server.xml install_location/resources/tomcat/conf/server.xml

DataStax driver changes

DataStax drivers come in two types:

  • DataStax drivers for DataStax Enterprise — for use by DSE 4.8 and later
  • DataStax drivers for Apache Cassandra™ — for use by Apache Cassandra™ and DSE 4.7 and earlier
Note: While the DataStax drivers for Apache Cassandra can connect to DSE 5.0 and later clusters, DataStax strongly recommends upgrading to the DSE drivers. The DSE drivers provide functionality for all DataStax Enterprise features.
Table 1. Information for upgrading is included in each driver guide
DataStax drivers for DataStax Enterprise DataStax drivers for Apache Cassandra
C/C++ C/C++
C# C#
Java Java
Node.js Node.js
Python Python
Maintenance mode drivers
Supported by DataStax, but only critical bug fixes will be included in new versions.
PHP PHP
Ruby Ruby
Additional driver documentation
All Drivers Version compatibility

Upgrade order

Upgrade nodes in this order:
  • In multiple datacenter clusters, upgrade every node in one datacenter before upgrading another datacenter.
  • Upgrade the seed nodes within a datacenter first.
  • Upgrade datacenters in this order:
    1. DSE Analytics datacenters
    2. Transactional/DSE Graph datacenters
    3. DSE Search datacenters

dse.yaml

The location of the dse.yaml file depends on the type of installation:
Package installations /etc/dse/dse.yaml
Tarball installations installation_location/resources/dse/conf/dse.yaml

Upgrading major Cassandra version

Upgrading SSTables is required for upgrades that contain major Apache Cassandra releases:
  • DataStax Enterprise 6.7 is compatible with Cassandra 3.11.
  • DataStax Enterprise 6.0 is compatible with Cassandra 3.11.
  • DataStax Enterprise 5.1 uses Cassandra 3.11.
  • DataStax Enterprise 5.0 uses Cassandra 3.0.
  • DataStax Enterprise 4.7 to 4.8 use Cassandra 2.1.
  • DataStax Enterprise 4.0 to 4.6 use Cassandra 2.0.

The upgrade process for DataStax Enterprise provides minimal downtime (ideally zero). During this process, upgrade and restart one node at a time while other nodes continue to operate online. With a few exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.

Follow these instructions to upgrade from DSE 5.0 to 6.7. If you have an earlier version of DSE, upgrade to the latest version of 5.0 before continuing.

Review the DSE 5.1, DSE 6.0, and DSE 6.7 release notes for all changes.

Note: The DataStax Installer is not supported for DSE 6.0 and later. To upgrade from DSE 5.0 that was installed with the DataStax Installer, you must first change from a standalone installer installation to a tarball or package installation for the same DSE version. See Upgrading to DSE 6.0 or DSE 6.7 from DataStax Installer installations.

Always upgrade to the latest patch release on your current version before you upgrade to a higher version. Fixes included in the latest patch release can simplify the upgrade process.

The latest version of DSE 5.0 is 5.0.15.

Attention: Read and understand these instructions before upgrading. Carefully reviewing the planning and upgrading instructions can ensure a smooth upgrade and avoid pitfalls and frustrations.
Important: Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading to DSE 6.7, you must migrate all tables that have COMPACT STORAGE to CQL table format. Use the ALTER TABLE DROP COMPACT STORAGE command to migrate Thrift-compatible tables to CQL table format. This command is available in DSE 5.0.12 and later.
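For example, a migration statement looks like this in cqlsh. The keyspace and table names here are hypothetical; run the statement once for each COMPACT STORAGE table in your cluster:

```sql
-- Hypothetical keyspace/table names; repeat for every COMPACT STORAGE table.
-- Requires DSE 5.0.12 or later.
ALTER TABLE cycling.comments DROP COMPACT STORAGE;
```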

Apache Cassandra™ version change

Upgrading from DataStax Enterprise 5.0 to 6.7 includes a major Cassandra version change.
  • DataStax Enterprise 6.7 is compatible with Cassandra 3.11.
  • DataStax Enterprise 6.0 is compatible with Cassandra 3.11.
  • DataStax Enterprise 5.1 uses Cassandra 3.11.
  • DataStax Enterprise 5.0 uses Cassandra 3.0.
  • DataStax Enterprise 4.7 to 4.8 use Cassandra 2.1.
  • DataStax Enterprise 4.0 to 4.6 use Cassandra 2.0.
Be sure to follow the recommendations for upgrading the SSTables.

General recommendations

DataStax recommends backing up your data prior to any version upgrade, including logs and custom configurations. A backup provides the ability to revert and restore all the data used in the previous version if necessary.

OpsCenter provides a Backup service that manages enterprise-wide backup and restore operations for DataStax Enterprise clusters. OpsCenter 6.5 and later is recommended.

Upgrade restrictions and limitations

Restrictions and limitations apply while a cluster is in a partially upgraded state.

Apart from these restrictions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.

General restrictions and limitations during the upgrade process
  • Do not enable new features.
  • Do not run nodetool repair. If you have the OpsCenter Repair Service configured, turn off the Repair Service.
  • Ensure OpsCenter compatibility. OpsCenter 6.7 is required for managing DSE 6.7 clusters. See the compatibility table.
  • During the upgrade, do not bootstrap or decommission nodes.
  • Do not issue these types of CQL queries during a rolling restart: DDL and TRUNCATE.
  • NodeSync waits to start until all nodes are upgraded.
  • In DSE 5.0, the default number of threads used by performance objects is 1. In DSE 6.7, the default number of threads used by performance objects is 4. Compatible performance objects continue to work during the upgrade process. Incompatible performance objects that require schema changes work in legacy mode or start working after the upgrade is complete. Do not change the configuration of performance objects during the upgrade. If performance objects were disabled before the upgrade, do not enable them during the upgrade. See DSE Performance Service 6.7 | 5.0 | OpsCenter.
  • Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
Note: Nodes on different versions might show a schema disagreement during an upgrade.
Restrictions for DSE Advanced Replication nodes
Upgrades are supported only for DSE Advanced Replication V2.
DSE Search upgrade restrictions and limitations
  • Do not update schemas.
  • Do not reindex DSE Search nodes during upgrade.
  • DSE 6.7 uses a different Lucene codec than DSE 5.0. Segments written with this new codec cannot be read by earlier versions of DSE. To downgrade to earlier versions, the entire data directory for the search index in question must be cleared.
  • DSE Search in DataStax Enterprise 6.7 uses Apache Solr 6.0. This significant change requires advanced planning and specific actions before and after the upgrade.
Important: Before you upgrade DSE Search or SearchAnalytics workloads, you must follow the specific tasks in the advanced preparation for upgrading DSE Search and SearchAnalytics nodes section.
Restrictions for nodes using any kind of security
  • Do not change security credentials or permissions until the upgrade is complete on all nodes.
  • If you are not already using Kerberos, do not set up Kerberos authentication before upgrading. First upgrade the cluster, and then set up Kerberos.
Upgrading drivers and possible impact when driver versions are incompatible
Be sure to check driver compatibility. Depending on the driver version, you might need to recompile your client application code. See DataStax driver changes.
During upgrades, you might experience driver-specific impact when clusters have mixed versions of drivers. If your cluster has mixed versions, the protocol version is negotiated with the first host that the driver connects to. To avoid driver version incompatibility during upgrades, use one of these workarounds:
  • Protocol version: Because some drivers can use different protocol versions, force the protocol version at startup. For example, keep the Java driver at its current protocol version while the driver upgrade is in progress. Switch the Java driver to the new protocol version only after the upgrade is complete on all nodes in the cluster.
  • Initial contact points: Ensure that the list of initial contact points contains only hosts with the oldest driver version. For example, the initial contact points contain only Java driver v2.
For details on protocol version negotiation, see protocol versions with mixed clusters in the Java driver version you're using, for example, Java driver.
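As a sketch of the protocol-version workaround, a Java driver 3.x client can pin the protocol version at startup. The contact point address and the specific version constant are illustrative; confirm the correct protocol version for your driver release and cluster:

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ProtocolVersion;

public class PinnedProtocolClient {
    public static void main(String[] args) {
        // Force protocol v4 (the version spoken by DSE 5.0 / Cassandra 3.0)
        // so that negotiation with already-upgraded 6.7 nodes cannot move
        // the client forward until every node in the cluster is upgraded.
        Cluster cluster = Cluster.builder()
                .addContactPoint("10.0.0.1")              // illustrative address
                .withProtocolVersion(ProtocolVersion.V4)  // pinned explicitly
                .build();
        // ... build sessions and run queries as usual ...
        cluster.close();
    }
}
```

Remove the explicit `withProtocolVersion` call (or raise the version) only after the whole cluster is on DSE 6.7.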

Advanced preparation for upgrading DSE Search and SearchAnalytics nodes

Before starting the Preparing to upgrade steps, complete all the advanced preparation steps on DSE Search and SearchAnalytics nodes while DSE 5.0 is still running.

Plan sufficient time to implement and test the required changes before the upgrade:
  • Schema changes require a full reindex.
  • Configuration changes require reloading the core.
  1. Change HTTP queries to CQL:
    • Delete-by-id is removed, use CQL DELETE by primary key instead.
    • Delete-by-query no longer supports wildcards, use CQL TRUNCATE instead.
  2. If any Solr core was created on DSE 4.6 or earlier and never reindexed after being upgraded to DSE 4.7 or later, you must reindex on DSE 5.0 before upgrading to DSE 6.7.
  3. Ensure that the shard_transport_options in dse.yaml are set only for netty_client_request_timeout:
    shard_transport_options:
        netty_client_request_timeout: 60000
    In DSE 6.7, the shard transport option supports only netty_client_request_timeout. Remove any other shard_transport_options.
  4. If you are using Apache Solr SolrJ, the minimum required version is 6.0.0.
  5. For SpatialRecursivePrefixTreeFieldType (RPT) in search schemas, you must adjust your queries for these changes:
    • IsDisjointTo is no longer supported in queries on SpatialRecursivePrefixTreeFieldType. Replace IsDisjointTo with a NOT Intersects query. For example:
      foo:[0,0 TO 1000,1000] AND -"Intersects(POLYGON((338 211, 338 305, 404 305, 404 211, 338 211)))"
    • The ENVELOPE syntax is now required for WKT-style queries against SpatialRecursivePrefixTreeFieldType fields. You must specify ENVELOPE(10, 15, 15, 10), where queries on earlier releases could specify 10 10 15 15.
    See Spatial Search for details on using distanceUnits in spatial queries.
  6. Edit the solrconfig.xml file and make these changes, as needed:
    • Remove these unsupported Solr requestHandlers:
      • XmlUpdateRequestHandler
      • BinaryUpdateRequestHandler
      • CSVRequestHandler
      • JsonUpdateRequestHandler
      • DataImportHandler

      For example:

      <requestHandler name="/dataimport" class="solr.DataImportHandler"/>

      or

      <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
    • Change the directoryFactory from:
      <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
      to
      <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
    • <unlockOnStartup> is unsupported as a result of LUCENE-6508 and SOLR-7942.
    • Change the updateLog from:
      <updateLog class="solr.FSUpdateLog" force="false">
      to
      <updateLog force="false">
  7. Upgrading DSE search nodes requires replacing unsupported Solr types with supported types.
    Note: Special handling is also required for BCDStrField, addressed in step 9.
    Sorting limitations apply to mixed-version clusters. Because of the way some removed Solr types marshal sort values during distributed queries (combined with the way the suggested new types unmarshal them), you cannot sort on those fields during a rolling upgrade while some nodes use an unsupported type and other nodes use the suggested new type. The following type transitions are problematic:
    Removed Solr field types Supported Solr field types
    ByteField TrieIntField
    DateField TrieDateField
    BCDIntField TrieIntField
    BCDLongField TrieLongField
    Two options are available:
    1. Avoid sorting on removed Solr field types until the upgrade is complete for all nodes in the datacenter being queried.
      Tip: When using two search datacenters, isolate queries to a single datacenter and then change the schema and reindex the other datacenter. Then isolate queries to the newly reindexed datacenter while you change the schema and upgrade the first datacenter.
    2. If you are using BCDIntField or BCDLongField, update the schema to replace BCDIntField and BCDLongField with types that are sort-compatible with the supported Solr types TrieIntField and TrieLongField:
      Removed Solr field types Interim sort-compatible supported Solr field types
      BCDIntField SortableIntField
      BCDLongField SortableLongField
      Change the schema in a distributed fashion, and do not reindex. After the schema is updated on all nodes, go on to 8.
  8. Update the schema and configuration for the Solr field types that are removed from Solr 5.5 and later.
    • Update the schema to replace unsupported Solr field types with supported Solr field types:
      Removed Solr field types Supported Solr field types
      ByteField TrieIntField
      DateField TrieDateField
      DoubleField TrieDoubleField
      FloatField TrieFloatField
      IntField TrieIntField
      LongField TrieLongField
      ShortField TrieIntField
      SortableDoubleField TrieDoubleField
      SortableFloatField TrieFloatField
      SortableIntField TrieIntField
      SortableLongField TrieLongField
      BCDIntField TrieIntField
      BCDLongField TrieLongField
      BCDStrField (see 9 if used) TrieIntField
    • If you are using type mapping version 0, or you do not specify a type mapper, verify or update the solrconfig.xml to use dseTypeMappingVersion 1:
      <dseTypeMappingVersion>1</dseTypeMappingVersion>
      If the Solr core is backed by a CQL table and the type mapping is unspecified, use type mapping version 2.
    • Reload the core:
      dsetool reload_core keyspace_name.table_name schema=filepath solrconfig=filepath
      If you were using the unsupported data types, do a full reindex node-by-node:
      dsetool reload_core keyspace_name.table_name schema=filepath solrconfig=filepath reindex=true deleteAll=true distributed=false
    Note: In DSE 5.1 and later, auto generated schemas use data type mapper 2.
  9. If using BCDStrField: In DSE 5.0 and earlier, DSE mapped Cassandra text columns to BCDStrField. The deprecated BCDStrField is removed.
    The recommended strategy is to upgrade the data type to TrieIntField. However, DSE cannot map text directly to TrieIntField. If you are using BCDStrField, you must complete one of these options before the upgrade:
    • If BCDStrField is no longer used, remove the BCDStrField field from the Solr schema. Reindexing is not required.
    • If you want to index the field as a TrieIntField, and a full reindex is acceptable, change the underlying database column to use the type int.
    • If you want to keep the database column as text and you still want to do simple matching queries on the indexed field, switch from BCDStrField to StrField in the schema. Indexing should not be required, but the field will no longer be appropriate for numeric range queries or sorting because StrField uses a lexicographic order, not a numeric one.
    • Not recommended: If you want to keep the database column as text and still want to perform numeric range queries and sorts on the former BCDStrField, but would rather change your application than perform a full reindex:
      • Change the field to StrField in the Solr schema with indexed=false.
      • Add a new copy field with the type TrieIntField that has its values supplied by the original BCDStrField.
      This solution still requires a reindex to work, because the copy-field target must be populated. This non-recommended option is supplied only to support a sub-optimal data model; for example, a text column with values that would fit only into an int.
    After you make these schema changes, do a rolling, node-by-node reload_core with reindex=true, distributed=false, and deleteAll=true.
    Note: If you have two datacenters and upgrade them one at a time, reload the core with distributed=true and deleteAll=true.
  10. Tune the schema before you upgrade. After the upgrade, all field definitions in the schema are validated and must be DSE Search compatible, even fields that are not indexed, do not have docValues applied, and are not used as a copy-field source. The default behavior of automatic resource generation includes all columns. To improve performance, prevent unneeded fields from being loaded from the database: include only the required fields in the schema by removing or commenting out unused fields.
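A quick grep pass over a copy of solrconfig.xml can flag leftovers from the edits in steps 3 and 6 before you reload or reindex. This is a minimal sketch; the /tmp path and sample file contents are illustrative, so point it at your real config:

```shell
# Illustrative copy of a solrconfig.xml to check; substitute your real file.
cat > /tmp/solrconfig.xml <<'EOF'
<config>
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  <updateLog class="solr.FSUpdateLog" force="false">
</config>
EOF

# Patterns that must be removed or rewritten before upgrading to DSE 6.7.
for p in XmlUpdateRequestHandler BinaryUpdateRequestHandler CSVRequestHandler \
         JsonUpdateRequestHandler DataImportHandler unlockOnStartup \
         'solr.directoryFactory:' 'solr.FSUpdateLog'; do
  grep -q "$p" /tmp/solrconfig.xml && echo "fix before upgrade: $p"
done
true  # a clean file legitimately matches nothing
```

Any `fix before upgrade:` line points at an element that still needs the corresponding edit from the steps above.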

Advanced preparation for upgrading DSE Graph nodes with search indexes

These steps apply to graph nodes that have search indexes.

Before starting the Preparing to upgrade steps, complete these advanced preparation steps while DSE 5.0 is still running.

Note: Upgrading DSE Graph nodes with search indexes requires these edits to the solrconfig file. Configuration changes require reloading the core. Plan sufficient time to implement and test changes that are required before the upgrade.
Edit the solrconfig.xml file and make these changes, as needed:
  • Remove these requestHandlers: XmlUpdateRequestHandler, BinaryUpdateRequestHandler, CSVRequestHandler, JsonUpdateRequestHandler, and DataImportHandler. Solr deprecated and then removed these requestHandlers.

    For example:

    <requestHandler name="/dataimport" class="solr.DataImportHandler"/>

    or

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
  • <unlockOnStartup> is now unsupported as a result of LUCENE-6508 and SOLR-7942.
  • Reload the core so that this configuration change is respected:
    dsetool reload_core keyspace_name.table_name reindex=false

Advanced preparation for upgrading DSE Analytics nodes

Upgrades from DSE 5.0 to DSE 6.7 include a major upgrade to Spark 2.2 with a tighter integration between DSE and Spark. For information on Spark 2.2, see the Spark documentation. Spark 2.2 uses Scala 2.11.

A new Spark resource manager uses the DataStax Java Driver and native CQL protocol for managing communication between DSE Spark nodes. DSE 5.0 used Spark RPC. This resource manager change impacts Spark applications that ran on DSE 5.0.
  1. Spark 2.2 uses Scala 2.11. You must recompile all DSE 5.0 Scala Spark applications against Scala 2.11 and use only Scala 2.11 third-party libraries.

    Changing the dse-spark-dependencies in your build files is not sufficient to change the compilation target. See the example projects for how to set up your build files.

  2. Spark applications should use dse:// URLs instead of spark://spark_master_IP:Spark_RPC_port_number URLs, as described in Specifying Spark URLs.

    You no longer need to specify the Spark master IP address or hostname when using dse:// URLs. Connecting to any Spark node will redirect the request to the master node.

  3. If you have existing Spark application code that uses spark://Spark master IP:Spark RPC port to connect, it will no longer work.

    For example, the following code worked in DSE 5.0 but will not work in DSE 5.1 or later.

    val conf = new SparkConf(true)
      .setMaster("spark://192.168.123.10:7077")
      .setAppName("cassandra-demo")
      .set("cassandra.connection.host", "192.168.123.10") // initial contact
      .set("cassandra.username", "cassandra")
      .set("cassandra.password", "cassandra")
    val sc = new SparkContext(conf)

    To connect to DSE 6.7, you no longer need to call setMaster:

    val conf = new SparkConf(true)
      .setAppName("cassandra-demo")
      .set("cassandra.connection.host", "192.168.123.10") // initial contact
      .set("cassandra.username", "cassandra")
      .set("cassandra.password", "cassandra")
    val sc = new SparkContext(conf)

    To specify the master using setMaster, use the dse:// URL format.

  4. You can restrict Spark jobs to specific database roles. See Managing Spark application permissions.
  5. You can set the Spark executor process owners, as described in Running Spark processes as separate users.
  6. The user submitting the Spark application no longer has to be the same database role. See Specifying Spark URLs to change the master connection submission to use a different user or cluster than the database connection.
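For the Scala 2.11 recompilation in step 1, a build.sbt along these lines illustrates the idea. The artifact name dse-spark-dependencies comes from this guide, but the group ID and version numbers here are assumptions; check the example projects mentioned above for the authoritative setup:

```scala
// Hypothetical build.sbt fragment for recompiling a DSE 5.0 Spark app.
// Coordinates and versions are illustrative, not authoritative.
scalaVersion := "2.11.8"   // Spark 2.2 requires Scala 2.11

libraryDependencies +=
  "com.datastax.dse" % "dse-spark-dependencies" % "6.7.0" % "provided"
```

As the guide notes, changing dse-spark-dependencies alone is not sufficient: every third-party library must also be a Scala 2.11 artifact.
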
Backing up DSEFS data

These steps only apply to nodes that use DSEFS. Before starting the Preparing to upgrade steps, complete these advanced preparation steps.

The DSEFS schema used by the database is improved, but the old schema is still supported and is not modified during the upgrade. To use the new DSEFS schema with existing DSEFS data, back up the current DSEFS data to local storage using the dse hadoop fs -cp command:

dse hadoop fs -cp /* /local_backup_location

Preparing to upgrade

Follow these steps to prepare each node for upgrading from DSE 5.0 to DSE 6.7.
Note: These steps are performed in your current version and use DSE 5.0 documentation.
  1. Upgrade to the latest patch release on your current version. Fixes included in the latest patch release can simplify the upgrade process.
  2. Before upgrading, be sure that each node has ample free disk space.

    The required space depends on the compaction strategy. See Disk space.

  3. Familiarize yourself with the changes and features in this release.
  4. Replace ITriggers and custom interfaces.

    Core storage engine refactoring in DSE 6.7 changed several internal and beta extension points. All custom implementations, including implementations of the following interfaces, must be rewritten and replaced with supported implementations when upgrading to DSE 6.7. For help, contact the DataStax Services team.

    • The org.apache.cassandra.triggers.ITrigger interface was modified from augment to augmentNonBlocking for non-blocking internal architecture. Updated trigger implementations must be provided on upgraded nodes. If unsure, drop all existing triggers before upgrading. To check for existing triggers:
      SELECT * FROM system_schema.triggers;
    • The org.apache.cassandra.index.Index interface was modified to comply with the core storage engine changes. Updated implementations are required. If unsure, drop all existing custom secondary indexes before upgrading, except DSE Search indexes, which do not need to be replaced. To check for existing indexes:
      SELECT * FROM system_schema.indexes;
    • The org.apache.cassandra.cql3.QueryHandler, org.apache.cassandra.db.commitlog.CommitLogReadHandler, and other extension points have been changed. See QueryHandlers.
  5. Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading, follow the steps to migrate all tables that have COMPACT STORAGE to CQL table format while DSE 5.0 is running.
    Note: Do not migrate system.* tables; COMPACT STORAGE is removed from them by DSE internally.
    For DSE Analytics, drop compact storage from all the tables in the "HiveMetaStore" and PortfolioDemo keyspaces.
    After COMPACT STORAGE is dropped, columns to support migration to CQL-compatible table format are added as described in migrating from compact storage.
    Attention: DSE 6.7 will not start if COMPACT STORAGE tables are present. Creating a COMPACT STORAGE table in a mixed-version cluster is not supported.
  6. If audit logging is configured to use CassandraAuditWriter, run these commands as superuser on DSE 5.0 nodes, and then ensure that the entire cluster has schema agreement:
    ALTER TABLE dse_audit.audit_log ADD authenticated text;
    ALTER TABLE dse_audit.audit_log ADD consistency text;
  7. Upgrade the SSTables on each node to ensure that all SSTables are on the current version.
    nodetool upgradesstables
    This step is required for DataStax Enterprise upgrades that include a major Cassandra version change.
    Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage.

    If the SSTables are already on the current version, the command returns immediately and no action is taken.

  8. Verify the Java runtime version and upgrade to the recommended version.
    java -version
    Important: Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8.
  9. Run nodetool repair to ensure that data on each replica is consistent with data on other nodes.
  10. Install the libaio package for optimal performance.
    RHEL platforms:
    sudo yum install libaio
    Debian:
    sudo apt-get install libaio1
  11. DSE Analytics nodes:
    • If you programmatically set the shuffle parameter, you must change the code for applications that use conf.set("spark.shuffle.service.port", port). Instead, use dse spark-submit which automatically sets the correct service port based on the authentication state. See Configuring Spark for more information.
    • If DSEFS is enabled, copy the CFS hivemetastore directory to DSEFS:
      DSE_HOME/bin/dse hadoop fs -cp cfs://127.0.0.1/user/spark/warehouse/ dsefs://127.0.0.1/user/spark/warehouse/
      After the upgrade is complete, migrate Spark SQL tables (if used) to the new Hive metastore format:
      dse client-tool spark metastore migrate --from 5.0.0 --to 6.0.0
  12. DSE Search nodes:
    • DSE Search in DSE 6.7 uses Apache Solr™ 6.0. Complete all of the steps in Advanced preparation for upgrading DSE Search and SearchAnalytics nodes.
    • Ensure all use of HTTP writes are changed to use CQL commands for updates and inserts.
    • Edit the search index config and make these changes, as needed. See Search index config for valid options to change query behavior for search indexes.
      • Remove the unsupported dataDir option. You can still set the location of search indexes.
      • Remove mergePolicy, maxMergeDocs, and mergeFactor. For example:
        <mergeFactor>25</mergeFactor>
        <maxMergeDocs>...
        <mergePolicy>...
        Use mergePolicyFactory instead, and add mergeScheduler:
        <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
            <int name="maxThreadCount">16</int>
            <int name="maxMergeCount">32</int>
        </mergeScheduler>
        ...
        <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
          <int name="maxMergeAtOnce">10</int>
          <int name="segmentsPerTier">10</int>
        </mergePolicyFactory>
      • Remove any instance of ExtractingRequestHandler.
      • Remove DSENRTCachingDirectoryFactory. Change:
        <directoryFactory name="DirectoryFactory" class="com.datastax.bdp.search.solr.DSENRTCachingDirectoryFactory"/>
        to:
        <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
    • Ensure that the catalina.properties and context.xml files are present in the Tomcat conf dir. DSE will not start after upgrade if these files are missing.
      The default location of the Tomcat conf directory depends on the type of installation:
      • Package installations: /etc/dse/tomcat/conf
      • Tarball installations: installation_location/resources/tomcat/conf
    • If earlier DSE versions use a custom configuration for the Solr UI web.xml, change:
      <filter-class>com.datastax.bdp.search.solr.auth.DseAuthenticationFilter</filter-class>
      to
      <filter-class>com.datastax.bdp.cassandra.auth.http.DseAuthenticationFilter</filter-class>
    • StallMetrics MBean is removed. Change operators that use the MBean.
  13. DSE Graph nodes:
    • If your graph nodes have search indexes that you added with gremlin, complete the steps in Advanced preparation for upgrading DSE Graph nodes with search indexes.
    • Ensure that edge label names and property key names use only the supported characters. Edge label names and property key names allow only [a-zA-Z0-9], underscore, hyphen, and period. In earlier versions, edge label names and property key names allowed nearly unrestricted Unicode.
      • schema.describe() displays the entire schema, even if it contains illegal names.
      • In-place upgrades allow existing schemas with invalid edge label names and property key names.
      • Schema elements with illegal names cannot be updated or added.
  14. Back up the configuration files you use to a folder that is not in the directory where you normally run commands.

    The configuration files are overwritten with default values during installation of the new version.
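The backup in step 14 can be scripted. This minimal sketch uses the package-install paths listed earlier in this guide; the backup location is illustrative, and tarball installs use the installation_location/resources paths instead:

```shell
# Copy key DSE 5.0 config files to a folder outside the install tree.
# Paths are the package-install defaults from this guide; /tmp is illustrative.
BACKUP_DIR=/tmp/dse50-config-backup
mkdir -p "$BACKUP_DIR"
for f in /etc/dse/dse.yaml \
         /etc/dse/cassandra/cassandra.yaml \
         /etc/dse/cassandra/logback.xml \
         /etc/dse/tomcat/conf/server.xml; do
  if [ -f "$f" ]; then
    cp -p "$f" "$BACKUP_DIR/"        # preserve timestamps and modes
  else
    echo "not found (skipping): $f"  # harmless on nodes without that file
  fi
done
```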

Upgrade steps

To upgrade from DSE 5.0 to DSE 6.7, follow these steps on each node in the recommended order. The upgrade process requires upgrading and restarting one node at a time.
Note: These steps are performed in your upgraded version and use DSE 6.7 documentation.
  1. DSE Analytics nodes: Kill all Spark worker processes.
  2. To flush the commit log of the old installation:
    nodetool -h hostname drain
    This step saves time when nodes start up after the upgrade and prevents DSE Search nodes from having to reindex data.
    Important: This step is mandatory when upgrading between major Cassandra versions that change SSTable formats, rendering commit logs from the previous version incompatible with the new version.
  3. Stop the node.
  4. Use the appropriate method to install the new product version on a supported platform:
    Note: Install the new product version using the same installation type that is on the system, otherwise problems might result.
  5. To configure the new product version:
    1. Compare your backup configuration files to the new configuration files:
      • Review changes in cassandra.yaml and dse.yaml.

        After the upgrade and before restarting with 6.7.0, remove deprecated settings and use the new settings.

        cassandra.yaml changes

        Memtable settings
        Deprecated cassandra.yaml settings
        memtable_heap_space_in_mb
        memtable_offheap_space_in_mb
        Replace with this setting
        memtable_space_in_mb

        Governs heap and offheap space allocation to set a threshold for automatic memtable flush. The calculated default is 1/4 of the heap size.

        Changed setting
        memtable_allocation_type: offheap_objects

        The default method the database uses to allocate and manage memtable memory is offheap_objects.
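        For illustration only, a DSE 5.0 cassandra.yaml fragment and its DSE 6.7 replacement might look like this (the sizes are example values, not recommendations; if memtable_space_in_mb is unset, the calculated default remains 1/4 of the heap):

        ```yaml
        # DSE 5.0 (deprecated settings)
        memtable_heap_space_in_mb: 2048
        memtable_offheap_space_in_mb: 2048

        # DSE 6.7 (replacement)
        memtable_space_in_mb: 4096
        memtable_allocation_type: offheap_objects
        ```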

        User-defined functions (UDF) settings
        Deprecated cassandra.yaml settings
        user_defined_function_warn_timeout
        user_defined_function_fail_timeout
        Replace with these settings
        user_defined_function_warn_micros: 500
        user_defined_function_fail_micros: 10000
        user_defined_function_warn_heap_mb: 200
        user_defined_function_fail_heap_mb: 500
        user_function_timeout_policy: die

        Settings are in microseconds since Java UDFs run faster. The new timeouts are not equivalent to the deprecated settings.

        Internode encryption settings
        Deprecated cassandra.yaml setting
        server_encryption_options:
            store_type: JKS
        Replace with these settings
        server_encryption_options:
            keystore_type: JKS
            truststore_type: JKS

        Valid type options are JKS, JCEKS, PKCS12, or PKCS11.

        Client-to-node encryption settings
        Deprecated cassandra.yaml setting
        client_encryption_options:
            store_type: JKS
        Replace with these settings
        client_encryption_options:
            keystore_type: JKS
            truststore_type: JKS

        Valid type options are JKS, JCEKS, PKCS12, or PKCS11.

        dse.yaml changes

      • Look for any deprecated, removed, or changed settings.
        Shard transport
        Deprecated dse.yaml settings
        shard_transport_options:
            type: netty
            netty_server_port: 8984
            netty_server_acceptor_threads:
            netty_server_worker_threads:
            netty_client_worker_threads:
            netty_client_max_connections:
            netty_client_request_timeout:
        The http transport type is removed.
        shard_transport_options:
            type: http
            http_shard_client_conn_timeout: 0
            http_shard_client_socket_timeout: 0
        New dse.yaml settings
        shard_transport_options:
            netty_client_request_timeout: 60000
        shard_transport_options now supports only netty_client_request_timeout. Remove any other options under shard_transport_options.
        DSE Analytics nodes: DSEFS settings
        Changed dse.yaml settings
        Although DSEFS is enabled by default, the dsefs.enabled setting is commented out. To enable DSEFS, uncomment all dsefs_options settings.
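        For example, an explicitly enabled dsefs_options section might look like this (the paths are illustrative; use the values from your environment):

        ```yaml
        dsefs_options:
            enabled: true
            keyspace_name: dsefs
            work_dir: /var/lib/dsefs
            data_directories:
                - dir: /var/lib/dsefs/data
        ```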
      • Ensure that keyspace replication factors are correct for your environment.
      • The upgrade installs a new server.xml for Tomcat 8. If your existing server.xml has custom connectors, migrate those connectors to the new server.xml before starting the upgraded nodes.
      • Be sure you are familiar with the Apache Cassandra and DataStax Enterprise changes and features in the new release.
  6. DSE Analytics nodes: If any datacenters in your DSE 5.0 cluster ran in Analytics Hadoop mode with the DseSimpleSnitch, you must use one of these options when starting the nodes in your cluster. Select the option that works best for your environment:
    • For nodes in the datacenters running in Analytics Hadoop mode, start those nodes in Spark mode.
    • Add the special start-up parameter -Dcassandra.ignore_dc=true for each node, then start in cassandra mode. This flag is required only once after upgrading. Subsequent restarts do not use this flag. You can leave the flag in the configuration file or remove it after the first restart of each node.
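    One way to set the flag (a sketch; the file location varies by installation type, for example /etc/dse/cassandra/cassandra-env.sh on package installations) is to append it to the JVM options in cassandra-env.sh, then remove the line after the first successful restart:

    ```shell
    # Temporary: required only for the first restart after the upgrade.
    JVM_OPTS="$JVM_OPTS -Dcassandra.ignore_dc=true"
    ```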
  7. Start the node.
  8. Verify that the upgraded datacenter names match the datacenter names in the keyspace schema definition:
    nodetool status
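    The output lists each datacenter by name; check those names against the datacenters in your keyspace replication settings. An illustrative excerpt (addresses, load, and host IDs are placeholders):

    ```
    Datacenter: Analytics
    =====================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address    Load      Tokens  Owns  Host ID                               Rack
    UN  10.0.0.1   1.2 GiB   64      ?     a1b2c3d4-0000-0000-0000-000000000001  rack1
    ```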
  9. Review the logs for warnings, errors, and exceptions.

    Warnings, errors, and exceptions are frequently found in the logs when starting an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.

  10. Repeat the upgrade on each node in the cluster following the recommended order.

    Upgrading and restarting each node is called a rolling restart.

  11. When the upgrade includes a major Cassandra version, you must upgrade the SSTables. DataStax recommends upgrading the SSTables on one node at a time or, when using racks, one rack at a time.
    Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
    nodetool upgradesstables

    If the SSTables are already on the current version, the command returns immediately and no action is taken. See SSTable compatibility and upgrade version.

    Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads. DataStax recommends running the upgradesstables command on one node at a time or, when using racks, one rack at a time.

    Note: You can run the upgradesstables command before all the nodes are upgraded, as long as you run it on only one node at a time or, when using racks, one rack at a time. Running upgradesstables on too many nodes at once degrades performance.
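    The recommended rolling pattern can be sketched as a dry-run driver script (the host names are hypothetical; the script only prints the command for each node so you can review the order, then substitute your remote-execution tool to actually run it):

    ```shell
    #!/bin/sh
    # Dry run: print the upgradesstables command for each node, one node at a time.
    # Replace 'echo' with ssh (or your orchestration tool) to actually execute.
    NODES="node1.example.com node2.example.com node3.example.com"
    for node in $NODES; do
      # --jobs 2 is the default; shown explicitly for clarity
      echo "nodetool -h $node upgradesstables --jobs 2"
    done
    ```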

Recovery after upgrading to DSE 6.7 without dropping compact storage

Support for Thrift-compatible tables (Compact Storage) is dropped. All tables using Compact Storage must be dropped or migrated to CQL table format before upgrading to DSE 6.7. If a cluster has been upgraded to DSE 6.7 and any Compact Storage tables still exist, follow this procedure to recover and proceed with the upgrade:
  1. Downgrade any nodes which were already upgraded to DSE 6.7 to the latest version in the DSE 5.0 or 5.1 series:
    • DSE 5.0.x, downgrade to 5.0.15 or later
    • DSE 5.1.x, downgrade to 5.1.12 or later
  2. On each node that you attempted to start on DSE 6.7, start DSE with the following option:
    -Dcassandra.commitlog.ignorereplayerrors=true
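    As with other JVM system properties, one way to pass this option (a sketch; the file location varies by installation type) is to add it to the JVM options in cassandra-env.sh for the recovery restart and remove it afterward:

    ```shell
    # Temporary: lets the node start despite unreadable DSE 6.7 commit log segments.
    JVM_OPTS="$JVM_OPTS -Dcassandra.commitlog.ignorereplayerrors=true"
    ```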
  3. On one node (any node) in the cluster, DROP COMPACT STORAGE from tables which use it.
  4. Restart DSE to continue the upgrade to DSE 6.7.

After the upgrade

After all nodes are upgraded and running on DSE 6.7, complete these steps:

  1. If you use the OpsCenter Repair Service, turn on the Repair Service.
  2. Remove any previously installed JTS JAR files from the classpaths in your DSE installation. JTS (Java Topology Suite) is distributed with DSE 6.7.
  3. After all nodes are on DSE 6.7 and the required schema change occurs, the new audit logging feature (CassandraAuditWriter) enables the use of new columns.
  4. Drop the following legacy tables, if they exist: system_auth.users, system_auth.credentials, and system_auth.permissions.

    As described in General upgrade advice, authentication and authorization subsystems now support role-based access control (RBAC).

  5. Review your security configuration. To use security, enable and configure DSE Unified Authentication.

    In cassandra.yaml, the default authenticator is DseAuthenticator and the default authorizer is DseAuthorizer. Other authenticators and authorizers are no longer supported. Security is disabled in dse.yaml by default.

  6. TimeWindowCompactionStrategy (TWCS) is set only on new dse_perf tables. Manually change dse_perf tables that were created in earlier releases to use TWCS. For example:
    ALTER TABLE dse_perf.read_latency_histograms WITH COMPACTION={'class':'TimeWindowCompactionStrategy'};
  7. DSE Search only:
    • The appender SolrValidationErrorAppender and the logger SolrValidationErrorLogger are no longer used and may safely be removed from logback.xml.
    • In contrast to earlier versions, DataStax recommends accepting the new default value of 1024 for back_pressure_threshold_per_core in dse.yaml. See Configuring and tuning indexing performance.
    • If SpatialRecursivePrefixTreeFieldType (RPT) is used in the search schema, replace the units field type with a suitable (degrees, kilometers, or miles) distanceUnits, and then verify that spatial queries behave as expected.
    • This applies only if you use HTTP writes with JSON documents (deprecated): a known issue causes the auto-generated solrconfig.xml to contain an invalid requestHandler for JSON core creations after upgrading to 5.1.0 or later. Change the auto-generated solrconfig.xml from:
      <requestHandler name="/update/json" class="solr.UpdateUpdateRequestHandler" startup="lazy"/>
      to
      <requestHandler name="/update/json" class="solr.UpdateRequestHandler" startup="lazy"/>
    • Slow startup on nodes with large encrypted indexes is resolved. However, action is required to realize the performance gains. You must do a full reindex of all encrypted search indexes on each node in your cluster. Plan sufficient time after the upgrade is complete to reindex with deleteAll=true in a rolling fashion. For example:
      dsetool reload_core keyspace_name.table_name distributed=false reindex=true deleteAll=true 
  8. DSEFS only:
    A new schema is available for DSEFS.
    Warning: Dropping a keyspace is not recoverable without a backup. If you have non-temporary data in DSEFS, do not drop the dsefs keyspace; no action is required, and DSEFS continues to work using the DSE 5.0 schema.
    If you have no data in DSEFS or if you are using DSEFS only for temporary data, follow these steps to use the new schema:
    1. Stop DSEFS on all nodes. In the dsefs_options section of dse.yaml, set enabled: false.
    2. Restart the DSE node.
    3. Drop the dsefs keyspace:
      DROP KEYSPACE dsefs
    4. Clear the dsefs data directories on each node.
      For example, if the dsefs_options section of dse.yaml has data_directories configured as:
      dsefs_options:
           ...
           data_directories:
               - dir: /var/lib/dsefs/data
      this command removes their contents:
      rm -r /var/lib/dsefs/data/*
    5. Start DSEFS with DSE 6.7 to use the new schema.
    6. If you backed up existing DSEFS data before the upgrade, copy the data back into DSEFS from local storage.
  9. DSE Analytics only:
    • Spark Jobserver uses DSE custom version 0.8.0.44. Applications must use the compatible Spark Jobserver API from the DataStax repository.
    • If you are using Spark SQL tables, migrate them to the new Hive metastore format:
      dse client-tool spark metastore migrate --from 5.0.0 --to 6.0.0
  10. Ensure that keyspace replication factors are correct for your environment.

Warning messages during and after upgrade

You can ignore some log messages that occur during and after an upgrade.

  • When upgrading nodes with DSE Advanced Replication, there might be some WriteTimeoutExceptions during a rolling upgrade while mixed versions of nodes exist. Some write consistency limitations apply while mixed versions of nodes exist. The WriteTimeout issue is resolved after all nodes are upgraded.
  • Some gremlin_server properties in earlier versions of DSE are no longer required. If properties exist in the dse.yaml file after upgrading, logs display warnings similar to:
    WARN  [main] 2017-08-31 12:25:30,523 GREMLIN DseWebSocketChannelizer.java:149 - Configuration for the org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0 serializer in dse.yaml overrides the DSE default - typically it is best to allow DSE to configure these.
    You can ignore these warnings or modify dse.yaml so that only the required gremlin server properties are present.
Error messages provide information to help identify problems.
  • If you see an error message like:
    ERROR [main] 2016-07-21 13:52:46,941  CassandraDaemon.java:737 - Cannot start node if snitch's data center (Cassandra) differs from previous data center (Analytics). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
    Follow the instructions in step 6 of the upgrade steps: start the nodes in Spark mode, or add the start-up parameter -Dcassandra.ignore_dc=true for each node.