Upgrading from DataStax Enterprise 5.0 to 6.0

Instructions to upgrade from DSE 5.0 to 6.0.

dse.yaml

The location of the dse.yaml file depends on the type of installation:
  • Package installations: /etc/dse/dse.yaml
  • Tarball installations: installation_location/resources/dse/conf/dse.yaml

DataStax Enterprise and Apache Cassandra™ configuration files

Locations are listed as: Installer-Services and package installations | Installer-No Services and tarball installations.
DataStax Enterprise configuration files:
  • byoh-env.sh: /etc/dse/byoh-env.sh | install_location/bin/byoh-env.sh
  • dse.yaml: /etc/dse/dse.yaml | install_location/resources/dse/conf/dse.yaml
  • logback.xml: /etc/dse/cassandra/logback.xml | install_location/resources/logback.xml
  • spark-env.sh: /etc/dse/spark/spark-env.sh | install_location/resources/spark/conf/spark-env.sh
  • spark-defaults.conf: /etc/dse/spark/spark-defaults.conf | install_location/resources/spark/conf/spark-defaults.conf
Cassandra configuration files:
  • cassandra.yaml: /etc/cassandra/cassandra.yaml | install_location/conf/cassandra.yaml
  • cassandra.in.sh: /usr/share/cassandra/cassandra.in.sh | install_location/bin/cassandra.in.sh
  • cassandra-env.sh: /etc/cassandra/cassandra-env.sh | install_location/conf/cassandra-env.sh
  • cassandra-rackdc.properties: /etc/cassandra/cassandra-rackdc.properties | install_location/conf/cassandra-rackdc.properties
  • cassandra-topology.properties: /etc/cassandra/cassandra-topology.properties | install_location/conf/cassandra-topology.properties
  • jmxremote.password: /etc/cassandra/jmxremote.password | install_location/conf/jmxremote.password
Tomcat server configuration file:
  • server.xml: /etc/dse/resources/tomcat/conf/server.xml | install_location/resources/tomcat/conf/server.xml

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
  • Package installations: /etc/dse/cassandra/cassandra.yaml
  • Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml

Upgrade order

Upgrade nodes in this order:
  • In multiple datacenter clusters, upgrade every node in one datacenter before upgrading another datacenter.
  • Upgrade the seed nodes within a datacenter first.
  • Upgrade datacenters by workload type in this order:
    1. DSE Analytics datacenters
    2. Transactional/DSE Graph datacenters
    3. DSE Search datacenters

server.xml

The default location of the Tomcat server.xml file depends on the installation type:
  • Package installations: /etc/dse/tomcat/conf/server.xml
  • Tarball installations: installation_location/resources/tomcat/conf/server.xml

Follow these instructions to upgrade from DataStax Enterprise 5.0 to DataStax Enterprise 6.0. If you have an earlier version of DSE, upgrade to the latest version of 5.0 before continuing.

Always upgrade to the latest patch release on your current version before you upgrade to a higher version. Fixes included in the latest patch release can simplify and smooth the upgrade process.

The latest version of DSE 5.0 is 5.0.15.

Attention: Read and understand these instructions before upgrading. Carefully reviewing the planning and upgrading instructions can ensure a smooth upgrade and avoid pitfalls and frustrations.
Important: Support for Thrift-compatible tables (COMPACT STORAGE) is dropped in DSE 6.0. Before upgrading to DSE 6.0, you must migrate all tables that have COMPACT STORAGE to CQL table format. The command to migrate Thrift-compatible tables to CQL table format is available in DSE 5.0.12 or later.
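The migration command referred to above is ALTER TABLE … DROP COMPACT STORAGE. A minimal sketch, assuming a hypothetical keyspace ks and table legacy_tbl:

```cql
-- Identify tables still using COMPACT STORAGE: compact tables lack the
-- 'compound' flag (or carry the 'dense' flag) in their flags set.
SELECT keyspace_name, table_name, flags FROM system_schema.tables;

-- Migrate a Thrift-compatible table to CQL table format (DSE 5.0.12 and later):
ALTER TABLE ks.legacy_tbl DROP COMPACT STORAGE;
```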

Apache Cassandra™ version change

Upgrading from DataStax Enterprise 5.0 to 6.0 includes a major Cassandra version change.
  • DataStax Enterprise 6.0 is compatible with Cassandra 3.11.
  • DataStax Enterprise 5.0 uses Cassandra 3.0.
Be sure to follow the recommendations for upgrading the SSTables.

General recommendations

DataStax recommends backing up your data prior to any version upgrade, including logs and custom configurations. A backup provides the ability to revert and restore all the data used in the previous version if necessary.

OpsCenter provides a Backup service that manages enterprise-wide backup and restore operations for DataStax Enterprise clusters. OpsCenter 6.5 and later is recommended.

Upgrade restrictions and limitations

Restrictions and limitations apply while a cluster is in a partially upgraded state.

Apart from these restrictions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.

General upgrade restrictions
  • Do not enable new features.
  • Do not run nodetool repair. If you have the OpsCenter Repair Service configured, turn off the Repair Service.
  • Ensure OpsCenter compatibility. OpsCenter 6.5 is required for managing DSE 6.0 clusters. See DataStax OpsCenter compatibility with DataStax Enterprise.
  • During the upgrade, do not bootstrap or decommission nodes.
  • Do not issue these types of CQL queries during a rolling restart: DDL and TRUNCATE.
  • During the upgrade, the nodes on different versions might show a schema disagreement.
  • Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
  • NodeSync waits to start until all nodes are upgraded.
  • Do not enable Change Data Capture (CDC) on a mixed-version cluster. Upgrade all nodes to DSE 5.1 or later before enabling CDC.
Restrictions for DSE Analytics (Spark) nodes
  • Do not run analytics jobs until all nodes are upgraded.
  • Kill all Spark worker processes before you stop the node and install the new version.
Restrictions for DSE Advanced Replication
  • Support for DSE Advanced Replication V1 in DSE 5.0 is removed. Because DSE Advanced Replication V2 is substantially revised, V1 installations must first upgrade to DSE 5.1.x, migrate DSE Advanced Replication to V2, and then upgrade to DSE 6.0.
Restrictions for DSEFS
Mixed versions of DSEFS are not supported during the upgrade process.
  • Complete the upgrade on all nodes before running fsck. Starting with DSE 5.1.3, all nodes must be able to report proper block status to the node running fsck. If you run fsck on an upgraded node in a mixed-version cluster, nodes on versions earlier than DSE 5.1.3 do not properly report block status, which causes fsck to incorrectly assume that data is corrupt or unavailable and to attempt repairs on it.
  • A protocol change in DSE 5.1.3 improves efficiency of passing JSON arrays between DSEFS server and client. Upgrade all nodes in the cluster before using the DSEFS shell.
Restrictions for DSE Graph
Graph nodes have the same restrictions as the workload they run on. General graph restrictions apply for all nodes, such as not altering graph schema during upgrades. Workload-specific restrictions apply for analytics and search nodes, such as no OLAP queries during upgrades.
Restrictions for DSE Search
  • Do not update schemas.
  • Do not reindex DSE Search nodes during upgrade.
  • DSE 6.0 introduces a new Lucene codec. Segments written with this new codec cannot be read by earlier versions of DSE. To downgrade to earlier versions, the entire data directory for the search index in question must be cleared.
  • DSE Search in DataStax Enterprise 6.0 uses Apache Solr 6.0. This significant change requires advanced planning and specific actions before and after the upgrade.
    Important: Before you upgrade DSE Search or SearchAnalytics workloads, you must follow the specific steps in Advanced preparation for upgrading DSE Search and SearchAnalytics nodes.
Restrictions for nodes using any kind of security
  • Do not change security credentials or permissions until the upgrade is complete on all nodes.
  • If you are not already using Kerberos, do not set up Kerberos authentication before upgrading. First upgrade the cluster, and then set up Kerberos.
Security changes
After upgrading, the default authenticator is DseAuthenticator and the default authorizer is DseAuthorizer in cassandra.yaml. Other authenticators and authorizers are no longer supported; follow the steps in After the upgrade.
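After the upgrade, the relevant cassandra.yaml entries look roughly like this (a sketch; verify the exact class names shipped with your installation):

```yaml
# cassandra.yaml defaults after upgrading to DSE 6.0
authenticator: com.datastax.bdp.cassandra.auth.DseAuthenticator
authorizer: com.datastax.bdp.cassandra.auth.DseAuthorizer
# DseAuthenticator and DseAuthorizer behavior is configured further in dse.yaml
# (authentication_options and authorization_options).
```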
Upgrading drivers and possible impact when driver versions are incompatible
Be sure to check driver compatibility. Depending on the driver version, you might need to recompile your client application code. See Upgrading DataStax drivers.
During upgrades, you might experience driver-specific impact when clusters have mixed versions of drivers. If your cluster has mixed versions, the protocol version is negotiated with the first host that the driver connects to. To avoid driver version incompatibility during upgrades, use one of these workarounds:
  • Protocol version: Because some drivers can use different protocol versions, force the protocol version at startup. For example, keep the Java driver at its current protocol version while the driver upgrade is happening. Switch the Java driver to the new protocol version only after the upgrade is complete on all nodes in the cluster.
  • Initial contact points: Ensure that the list of initial contact points contains only hosts with the oldest driver version. For example, the initial contact points contain only Java driver v2.
For details on protocol version negotiation, see the protocol versions with mixed clusters documentation for the driver version you're using, for example, the Java driver.

Advanced preparation for upgrading DSE Search and SearchAnalytics nodes

Before starting the Preparing to upgrade steps, complete all the advanced preparation steps on DSE Search and SearchAnalytics nodes while DSE 5.0 is still running.

Plan sufficient time to implement and test the required changes before the upgrade:
  • Schema changes require a full reindex.
  • Configuration changes require reloading the core.
  1. Change HTTP queries to CQL:
    • Delete-by-id is removed, use CQL DELETE by primary key instead.
    • Delete-by-query no longer supports wildcards, use CQL TRUNCATE instead.
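    As a sketch of the CQL replacements, using a hypothetical table ks.tbl with primary key id:

    ```cql
    -- Instead of HTTP delete-by-id, delete by primary key:
    DELETE FROM ks.tbl WHERE id = 'doc1';

    -- Instead of HTTP delete-by-query with a wildcard, remove all rows:
    TRUNCATE ks.tbl;
    ```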
  2. If any Solr core was created on DSE 4.6 or earlier and never reindexed after being upgraded to DSE 4.7 or later, you must reindex on DSE 5.0 before upgrading to DSE 5.1 or later.
  3. Ensure that the shard_transport_options in dse.yaml are set only for netty_client_request_timeout:
    shard_transport_options:
        netty_client_request_timeout: 60000
    In DSE 5.1 and later, the shard transport option supports only netty_client_request_timeout. Remove any other shard_transport_options.
  4. If you are using Apache Solr SolrJ, the minimum required version is 6.0.0.
  5. For SpatialRecursivePrefixTreeFieldType (RPT) in search schemas, you must adjust your queries for these changes:
    • IsDisjointTo is no longer supported in queries on SpatialRecursivePrefixTreeFieldType. Replace IsDisjointTo with a NOT Intersects query. For example:
      foo:[0,0 TO 1000,1000] AND -"Intersects(POLYGON((338 211, 338 305, 404 305, 404 211, 338 211)))"
    • The ENVELOPE syntax is now required for WKT-style queries against SpatialRecursivePrefixTreeFieldType fields. You must specify ENVELOPE(10, 15, 15, 10), where queries on earlier releases could specify 10 10 15 15.
    See Spatial Search for details on using distanceUnits in spatial queries.
  6. For upgrades to DSE 6.0.0 and DSE 6.0.1 only: stored=true copy fields are not supported and cause schema validation to fail. The stored=true copyField directive has not been supported since DSE 4.7, so you probably do not have stored=true copy fields. If you do:
    • Change the stored attribute value of all copyField directives from true to false in the schema.xml file and then use dsetool reload_core to reload the modified schema.
    • You must ensure that application designs and implementations recognize this change.
    Note: DSE 6.0.2 and later ignores stored=true.
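    The schema.xml edit above can be sketched as follows (hypothetical field names); reload the modified schema with dsetool reload_core afterward:

    ```xml
    <!-- Before: fails schema validation on DSE 6.0.0/6.0.1 -->
    <copyField source="body" dest="body_copy" stored="true"/>

    <!-- After: change stored to false -->
    <copyField source="body" dest="body_copy" stored="false"/>
    ```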
  7. Edit the solrconfig.xml file and make these changes, as needed:
    • Remove these requestHandlers: XmlUpdateRequestHandler, BinaryUpdateRequestHandler, CSVRequestHandler, JsonUpdateRequestHandler, DataImportHandler. Solr deprecated and then removed these requestHandlers.

      For example:

      <requestHandler name="/dataimport" class="solr.DataImportHandler"/>

      or

      <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
    • Change the directoryFactory from:
      <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
      to
      <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
    • <unlockOnStartup> is now unsupported as a result of LUCENE-6508 and SOLR-7942.
    • Change the updateLog from:
      <updateLog class="solr.FSUpdateLog" force="false">
      to
      <updateLog force="false">
  8. Upgrading DSE search nodes to DSE 5.1 and later requires replacing unsupported Solr types with supported types.
    Note: Special handling is also required for BCDStrField, addressed in step 10.
    Sorting limitations apply to mixed-version clusters. Because of the way some of the removed Solr types marshal sort values during distributed queries (combined with the way the suggested new types unmarshal sort values), you cannot sort on these fields during a rolling upgrade when some nodes use an unsupported type and other nodes use the suggested new type. The following type transitions are problematic:
    Removed Solr field type → Supported Solr field type
      • ByteField → TrieIntField
      • DateField → TrieDateField
      • BCDIntField → TrieIntField
      • BCDLongField → TrieLongField
    Two options are available:
    1. Avoid sorting on removed Solr field types until the upgrade to DSE 5.1 or later is complete for all nodes in the datacenter being queried.
      Tip: When using two search datacenters, isolate queries to a single datacenter and then change the schema and reindex the other datacenter. Then isolate queries to the newly reindexed datacenter while you change the schema and upgrade the first datacenter.
    2. If you are using BCDIntField or BCDLongField, update the schema to replace BCDIntField and BCDLongField with types that are sort-compatible with the supported Solr types TrieIntField and TrieLongField:
      Removed Solr field type → Interim sort-compatible supported Solr field type
        • BCDIntField → SortableIntField
        • BCDLongField → SortableLongField
      Change the schema in a distributed fashion, and do not reindex. After the schema is updated on all nodes, go on to step 9.
  9. Update the schema and configuration for the Solr field types that are removed from Solr 5.5 and later.
    • Update the schema to replace unsupported Solr field types with supported Solr field types:
      Removed Solr field type → Supported Solr field type
        • ByteField → TrieIntField
        • DateField → TrieDateField
        • DoubleField → TrieDoubleField
        • FloatField → TrieFloatField
        • IntField → TrieIntField
        • LongField → TrieLongField
        • ShortField → TrieIntField
        • SortableDoubleField → TrieDoubleField
        • SortableFloatField → TrieFloatField
        • SortableIntField → TrieIntField
        • SortableLongField → TrieLongField
        • BCDIntField → TrieIntField
        • BCDLongField → TrieLongField
        • BCDStrField (see step 10 if used) → TrieIntField
    • If you are using type mapping version 0, or you do not specify a type mapper, verify or update the solrconfig.xml to use dseTypeMappingVersion 1:
      <dseTypeMappingVersion>1</dseTypeMappingVersion>
      If the Solr core is backed by a CQL table and the type mapping is unspecified, use type mapping version 2.
    • Reload the core:
      dsetool reload_core keyspace_name.table_name schema=filepath solrconfig=filepath
      If you were using the unsupported data types, do a full reindex node-by-node:
      dsetool reload_core keyspace_name.table_name schema=filepath solrconfig=filepath reindex=true deleteAll=true distributed=false
    Note: In DSE 5.1 and later, auto generated schemas use data type mapper 2.
  10. If using BCDStrField: In DSE 5.0 and earlier, DSE mapped Cassandra text columns to BCDStrField. The deprecated BCDStrField was removed in DSE 5.1.0.
    The recommended strategy is to upgrade the data type to TrieIntField. However, DSE cannot map text directly to TrieIntField. If you are using BCDStrField, you must complete one of these options before the upgrade.
    1. If BCDStrField is no longer used, remove the BCDStrField field from the Solr schema. Reindexing is not required.
    2. If you want to index the field as a TrieIntField, and a full reindex is acceptable, change the underlying database column to use the type int.
    3. If you want to keep the database column as text and you still want to do simple matching queries on the indexed field, switch from BCDStrField to StrField in the schema. Indexing should not be required, but the field will no longer be appropriate for numeric range queries or sorting, because StrField uses a lexicographic order, not a numeric one.
    4. Not recommended: If you want to keep the database column as text and still want to perform numeric range queries and sorts on the former BCDStrField, but would rather change your application than perform a full reindex:
      • Change the field to StrField in the Solr schema with indexed=false.
      • Add a new copy field with the type TrieIntField that has its values supplied by the original BCDStrField.
      This solution still requires a reindex to work, because the copy field target must be populated. This non-recommended option is supplied only to support a sub-optimal data model; for example, a text column with values that would fit into an int.
    After you make these schema changes, do a rolling, node-by-node reload_core with reindex=true, distributed=false, and deleteAll=true.
    Note: If you have two datacenters and upgrade them one at a time, reload the core with distributed=true and deleteAll=true.
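    Option 4 can be sketched in schema.xml as follows (hypothetical field and type names; the copy field target still requires the reindex described above):

    ```xml
    <types>
      <fieldType name="string" class="solr.StrField"/>
      <fieldType name="int" class="solr.TrieIntField"/>
    </types>
    <fields>
      <!-- Former BCDStrField column, kept as text and no longer indexed -->
      <field name="legacy_num" type="string" indexed="false" stored="true"/>
      <!-- New numeric field for range queries and sorting -->
      <field name="legacy_num_int" type="int" indexed="true" stored="false"/>
    </fields>
    <copyField source="legacy_num" dest="legacy_num_int"/>
    ```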
  11. Tune the schema before you upgrade. For DSE 5.1.4 and later, all field definitions in the schema are validated and must be DSE Search compatible, even if the fields are not indexed, have docValues applied, or are used as a copy-field source. By default, automatic resource generation includes all columns. To improve performance, prevent unneeded fields from being loaded from the database: include only the required fields in the schema by removing or commenting out unused fields.

Advanced preparation for upgrading DSE Graph nodes with search indexes

These steps apply to graph nodes that have search indexes. Before starting the Preparing to upgrade steps, complete these advanced preparation steps while DSE 5.0 is still running.

Upgrading DSE Graph nodes with search indexes requires these edits to the solrconfig file. Configuration changes require reloading the core. Plan sufficient time to implement and test changes that are required before the upgrade.

Edit the solrconfig.xml file and make these changes, as needed:
  • Remove these requestHandlers: XmlUpdateRequestHandler, BinaryUpdateRequestHandler, CSVRequestHandler, JsonUpdateRequestHandler, and DataImportHandler. Solr deprecated and then removed these requestHandlers.

    For example:

    <requestHandler name="/dataimport" class="solr.DataImportHandler"/>

    or

    <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
  • <unlockOnStartup> is now unsupported as a result of LUCENE-6508 and SOLR-7942.
  • Reload the core so that this configuration change is respected:
    dsetool reload_core keyspace_name.table_name reindex=false

Advanced preparation for upgrading DSE Analytics nodes

Upgrades from DSE 5.0 to DSE 6.0 include a major upgrade to Spark 2.2, as well as a tighter integration between DSE and Spark. For information on Spark 2.2, see the Spark documentation. Spark 2.2 uses Scala 2.11.

A new Spark resource manager uses the DataStax Java Driver and the native CQL protocol to manage communication between DSE Spark nodes; DSE 5.0 used Spark RPC. This resource manager change impacts Spark applications that ran on DSE 5.0.
  1. Spark 2.2 uses Scala 2.11. You must recompile all DSE 5.0 Scala Spark applications against Scala 2.11 and use only Scala 2.11 third-party libraries.

    Changing the dse-spark-dependencies in your build files is not sufficient to change the compilation target. See the example projects for how to set up your build files.

  2. Spark applications should use dse:// URLs instead of spark://spark_master_IP:Spark_RPC_port_number URLs, as described in Specifying Spark URLs.

    You no longer need to specify the Spark master IP address or hostname when using dse:// URLs. Connecting to any Spark node will redirect the request to the master node.

  3. If you have existing Spark application code that uses spark://spark_master_IP:Spark_RPC_port_number to connect, it will no longer work.

    For example, the following code worked in DSE 5.0 but will not work in DSE 5.1 or later.

    val conf = new SparkConf(true)
      .setMaster("spark://192.168.123.10:7077")
      .setAppName("cassandra-demo")
      .set("cassandra.connection.host", "192.168.123.10") // initial contact
      .set("cassandra.username", "cassandra")
      .set("cassandra.password", "cassandra")
    val sc = new SparkContext(conf)

    To connect to DSE 5.1 and later, you no longer need to call setMaster. This code will work in DSE 5.1 and later:

    val conf = new SparkConf(true)
      .setAppName("cassandra-demo")
      .set("cassandra.connection.host", "192.168.123.10") // initial contact
      .set("cassandra.username", "cassandra")
      .set("cassandra.password", "cassandra")
    val sc = new SparkContext(conf)

    If you need to specify the master using setMaster, use the dse:// URL format.

  4. Starting in DSE 5.1, you can restrict Spark jobs to specific database roles. See Managing Spark application permissions.
  5. Starting in DSE 5.1, you can set the Spark executor process owners, as described in Running Spark processes as separate users.
  6. The user submitting the Spark application no longer has to be the same database role. See Specifying Spark URLs for information on how to change the master connection submission to use a different user or cluster than the database connection.
Backing up DSEFS data

These steps only apply to nodes that use DSEFS. Before starting the Preparing to upgrade steps, complete these advanced preparation steps.

The DSEFS schema used by the database was improved in DSE 5.1, but the old schema is still supported and is not modified during the upgrade. To use the new DSEFS schema with existing DSEFS data, back up the DSEFS data before upgrading:

Back up the current DSEFS data to local storage using the dse hadoop fs -cp command:

dse hadoop fs -cp /* /local_backup_location

Preparing to upgrade

Follow these steps to prepare each node for upgrading from DataStax Enterprise 5.0 to DataStax Enterprise 6.0:
  1. Upgrade to the latest patch release on your current version. Fixes included in the latest patch release can simplify the upgrade process.
  2. Before upgrading, be sure that each node has ample free disk space.

    The required space depends on the compaction strategy. See Disk space in Planning and testing DataStax Enterprise deployments.

  3. Familiarize yourself with the changes and features in this release.
  4. Replace ITriggers and custom interfaces.

    The core storage engine refactoring modified several internal and beta extension points. All custom implementations, including implementations of the following interfaces, must be replaced with supported implementations when upgrading to DSE 6.0. Because a rewrite of the following interfaces is required for DSE 6.0, DataStax can help you find a solution.

    • The org.apache.cassandra.triggers.ITrigger interface was modified from augment to augmentNonBlocking for non-blocking internal architecture. Updated trigger implementations must be provided on upgraded nodes. If unsure, drop all existing triggers before upgrading. To check for existing triggers:
      SELECT * FROM system_schema.triggers
    • The org.apache.cassandra.index.Index interface was modified to comply with the core storage engine changes. Updated implementations are required. If unsure, drop all existing custom secondary indexes before upgrading, except DSE Search indexes, which do not need to be replaced. To check for existing indexes:
      SELECT * FROM system_schema.indexes
    • The org.apache.cassandra.cql3.QueryHandler, org.apache.cassandra.db.commitlog.CommitLogReadHandler, and other extension points have been changed. See QueryHandlers.
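    A sketch of checking for and dropping the affected objects before the upgrade (hypothetical object names; DSE Search indexes can stay):

    ```cql
    -- Find existing triggers and custom secondary indexes:
    SELECT * FROM system_schema.triggers;
    SELECT * FROM system_schema.indexes;

    -- Drop implementations that rely on the old interfaces:
    DROP TRIGGER my_trigger ON ks.tbl;
    DROP INDEX ks.my_custom_index;
    ```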
  5. Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading, follow the steps to migrate all tables that have COMPACT STORAGE to CQL table format while DSE 5.x.x is running.
    Note: Do not migrate system.* tables; COMPACT STORAGE is removed by DSE internally.
    For DSE Analytics, drop compact storage from all the tables in the "HiveMetaStore" and PortfolioDemo keyspaces.
    After COMPACT STORAGE is dropped, columns to support migration to CQL-compatible table format are added as described in migrating from compact storage.
    Attention: DSE 6.0 will not start if COMPACT STORAGE tables are present. Creating a COMPACT STORAGE table in a mixed-version cluster is not supported. Driver connections to the latest DSE 5.0.x and DSE 5.1.x run in a "NO_COMPACT" mode that causes compact tables to appear as if the compact flags were already dropped, but only for the current session.
  6. If audit logging is configured to use CassandraAuditWriter, run these commands as super user on DSE 5.0 nodes, and then ensure that the entire cluster has schema agreement:
    ALTER TABLE dse_audit.audit_log ADD authenticated text;
    ALTER TABLE dse_audit.audit_log ADD consistency text;
  7. Upgrade the SSTables on each node to ensure that all SSTables are on the current version.
    nodetool upgradesstables
    This step is required for DataStax Enterprise upgrades that include a major Cassandra version change.
    Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage.

    If the SSTables are already on the current version, the command returns immediately and no action is taken.

  8. Verify the Java runtime version and upgrade to the recommended version.
    java -version
    Important: Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8.
  9. Run nodetool repair to ensure that data on each replica is consistent with data on other nodes.
  10. Install the libaio package for optimal performance.
    RHEL platforms:
    sudo yum install libaio
    Debian:
    sudo apt-get install libaio1
  11. DSE Analytics nodes:
    • If you programmatically set the shuffle parameter, you must change the code for applications that use conf.set("spark.shuffle.service.port", port). Instead, use dse spark-submit which automatically sets the correct service port based on the authentication state. See Configuring Spark for more information.
    • If DSEFS is enabled, copy CFS hivemetastore directory to dse:
      DSE_HOME/bin/dse hadoop fs -cp cfs://127.0.0.1/user/spark/warehouse/ dsefs://127.0.0.1/user/spark/warehouse/

      Another step is required after upgrade is complete.

    • Cassandra File System (CFS) is removed. Remove the cfs and cfs_archive keyspaces before upgrading. See the From CFS to DSEFS blog post on the DataStax Developer website and the Copying data from CFS to DSEFS documentation for more information.
  12. DSE Search nodes:
    • DSE Search in DataStax Enterprise 6.0 uses Apache Solr™ 6.0. Complete all of the steps in Advanced preparation for upgrading DSE Search and SearchAnalytics nodes.
    • Ensure that all HTTP writes are changed to CQL commands for updates and inserts.
    • Edit the search index config and make these changes, as needed. See Search index config for valid options to change query behavior for search indexes.
      • Remove the unsupported dataDir option. You can still set the location of search indexes.
      • Remove mergePolicy, maxMergeDocs, and mergeFactor. For example:
        <mergeFactor>25</mergeFactor>
        <maxMergeDocs>...
        <mergePolicy>...
        Use mergePolicyFactory instead, and add mergeScheduler:
        <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
            <int name="maxThreadCount">16</int>
            <int name="maxMergeCount">32</int>
        </mergeScheduler>
        ...
        <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
          <int name="maxMergeAtOnce">10</int>
          <int name="segmentsPerTier">10</int>
        </mergePolicyFactory>
      • Remove any instance of ExtractingRequestHandler.
      • Remove DSENRTCachingDirectoryFactory. Change:
        <directoryFactory name="DirectoryFactory" class="com.datastax.bdp.search.solr.DSENRTCachingDirectoryFactory"/>
        to:
        <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
    • Ensure that the catalina.properties and context.xml files are present in the Tomcat conf dir. DSE will not start after upgrade if these files are missing.
      The default location of the Tomcat conf directory depends on the type of installation:
      • Package installations: /etc/dse/tomcat/conf
      • Tarball installations: installation_location/resources/tomcat/conf
    • If earlier DSE versions use a custom configuration for the Solr UI web.xml, change:
      <filter-class>com.datastax.bdp.search.solr.auth.DseAuthenticationFilter</filter-class>
      to
      <filter-class>com.datastax.bdp.cassandra.auth.http.DseAuthenticationFilter</filter-class>
    • StallMetrics MBean is removed. Change operators that use the MBean.
  13. DSE Graph nodes:
    • If your graph nodes have search indexes that you added with gremlin, complete the steps in Advanced preparation for upgrading DSE Graph nodes with search indexes.
    • Ensure that edge label names and property key names use only the supported characters. Edge label names and property key names allow only [a-zA-Z0-9], underscore, hyphen, and period. In earlier versions, edge label names and property key names allowed nearly unrestricted Unicode.
      • schema.describe() displays the entire schema, even if it contains illegal names.
      • In-place upgrades allow existing schemas with invalid edge label names and property key names.
      • Schema elements with illegal names cannot be updated or added.
  14. Back up the configuration files you use to a folder that is not in the directory where you normally run commands.

    The configuration files are overwritten with default values during installation of the new version.

  15. This step applies only to upgrades from DSE 5.0.0 through 5.0.8, and from DSE 5.1.0 or 5.1.1 to DSE 5.1.2 and later releases.

    The messaging protocol version in DSE 5.1.2 has been changed to VERSION_3014. Schema migrations rely on exact messaging protocol versions. To accommodate schema changes that might occur during the upgrade, force a backward compatible messaging protocol.

    Before you upgrade, restart the node with this start-up parameter:
    -Dcassandra.force_3_0_protocol_version=true
    For example:
    installation_location/bin/dse cassandra -Dcassandra.force_3_0_protocol_version=true
    Note: While mixed versions exist during the upgrade, do not add or remove columns from existing tables.

    After the upgrade is complete on all nodes, restart nodes without this flag.

Upgrade steps

Follow these steps on each node to upgrade from DataStax Enterprise 5.0 to DataStax Enterprise 6.0. Some warning messages are displayed during and after upgrade.

  1. DSE Analytics nodes: Kill all Spark worker processes.
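    One way to locate and stop the worker processes is sketched below. This is a hedged example: the worker process class name and the exact ps output format can vary by Spark version and platform, so verify the matched PIDs before killing them.

    ```shell
    # Find PIDs of Spark worker processes and stop them.
    # The bracketed first character prevents grep from matching itself.
    ps -ef | grep '[o]rg.apache.spark.deploy.worker.Worker' | awk '{print $2}' | xargs kill
    ```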
  2. To flush the commit log of the old installation:
    nodetool -h hostname drain
    This step saves time when nodes start up after the upgrade and prevents DSE Search nodes from having to reindex data.
    Important: This step is mandatory when upgrading between major Cassandra versions that change SSTable formats, rendering commit logs from the previous version incompatible with the new version.
  3. Stop the node. See Stopping a DataStax Enterprise node.
    • To stop DataStax Enterprise running as a service:
      sudo service dse stop
    • To stop DataStax Enterprise running as a stand-alone process:
      bin/dse cassandra-stop
  4. Use the appropriate method to install the new product version on a supported platform:
    Note: Install the new product version using the same installation type that is already on the system. Mixing installation types can cause problems.
  5. To configure the new product version:
    1. Compare your backup configuration files to the new configuration files:
      • Look for any deprecated, removed, or changed settings.
        • DSE Search nodes
          • While the node is down, edit dse.yaml and remove these options:
            • cql_solr_query_executor_threads
            • enable_back_pressure_adaptive_nrt_commit
            • max_solr_concurrency_per_core
            • solr_indexing_error_log_options
            DSE 6.0 will not start with these options present.
        • DSE Analytics nodes
          Note: Although DSEFS is enabled by default in DSE 5.1.0 and later, the dsefs_options.enabled setting is commented out in dse.yaml. To enable DSEFS, uncomment the dsefs_options.enabled setting. (DSP-13310)
      • The upgrade installs a new server.xml for Tomcat 8. If your existing server.xml has custom connectors, migrate those connectors to the new server.xml before starting the upgraded nodes.
      • Be sure you are familiar with the Apache Cassandra and DataStax Enterprise changes and features in the new release.
      • Ensure that keyspace replication factors are correct for your environment.
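    A simple way to spot deprecated, removed, or changed settings is to diff each backup file against the newly installed default. The paths below are illustrative (package installation layout) and assume you backed up to /path/to/backup:

    ```shell
    diff /path/to/backup/cassandra.yaml /etc/dse/cassandra/cassandra.yaml
    diff /path/to/backup/dse.yaml /etc/dse/dse.yaml
    ```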
  6. DSE Analytics nodes: If your DSE 5.0 cluster had any datacenters running in Analytics Hadoop mode and those datacenters used DseSimpleSnitch, you must do one of the following:
    • For nodes in the datacenters running in Analytics Hadoop mode, start those nodes in Spark mode.
    • Add the start-up parameter -Dcassandra.ignore_dc=true for each node, then start in Cassandra mode. This flag is required only for the first restart after upgrading; subsequent restarts do not need it. You can leave the flag in the configuration file or remove it after the first restart of each node.
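    For example, to start a stand-alone node in Cassandra mode with the flag (mirroring the start-up pattern used earlier in this document):

    ```shell
    installation_location/bin/dse cassandra -Dcassandra.ignore_dc=true
    ```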
  7. Start the node.
  8. Verify that the upgraded datacenter names match the datacenter names in the keyspace schema definition:
    nodetool status
  9. Review the logs for warnings, errors, and exceptions.

    Warnings, errors, and exceptions are frequently found in the logs when starting an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.

  10. Repeat the upgrade on each node in the cluster following the recommended order.
  11. When the upgrade includes a major Cassandra version, you must upgrade the SSTables. DataStax recommends upgrading the SSTables on one node at a time or when using racks, one rack at a time.
    Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
    nodetool upgradesstables

    If the SSTables are already on the current version, the command returns immediately and no action is taken. See SSTable compatibility and upgrade version.

    Use the --jobs option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads. DataStax recommends running the upgradesstables command on one node at a time or when using racks, one rack at a time.

    Note: You can run the upgradesstables command before all the nodes are upgraded as long as you run this command on only one node at a time or when using racks, one rack at a time. Running upgradesstables on too many nodes will degrade performance.
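    For example, to upgrade SSTables using four concurrent compaction threads (the value 4 is illustrative; tune it to your hardware and load):

    ```shell
    nodetool upgradesstables --jobs 4
    ```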

Recovery after upgrading to DSE 6.0 without dropping compact storage

DSE 6.0 removed support for Thrift-compatible tables (Compact Storage). All tables using Compact Storage must be dropped or migrated to CQL table format before upgrading to DSE 6.0. If a cluster has been upgraded to DSE 6.0 and any Compact Storage tables still exist, follow this procedure to recover and proceed with the upgrade:
  1. Downgrade any nodes that were already upgraded to DSE 6.0 to the latest version in the DSE 5.0 or 5.1 series:
    • DSE 5.0.x, downgrade to 5.0.15 or later
    • DSE 5.1.x, downgrade to 5.1.11 or later
  2. On each node that was started on DSE 6.0, start DSE with the -Dcassandra.commitlog.ignorereplayerrors=true option.
  3. On one node (any node) in the cluster, DROP COMPACT STORAGE from tables which use it.
  4. Restart DSE to continue the upgrade to DSE 6.0.
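Step 3 can be performed in cqlsh with an ALTER TABLE statement for each affected table. The keyspace and table names below are placeholders:

```sql
ALTER TABLE keyspace_name.table_name DROP COMPACT STORAGE;
```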

After the upgrade

After all nodes are upgraded and running on DSE 6.0, complete these steps:

  1. If you use the OpsCenter Repair Service, turn on the Repair Service.
  2. After all nodes are on DSE 6.0 and the required schema change has occurred, the audit logging feature (CassandraAuditWriter) begins using the new columns.
  3. Drop the following legacy tables, if they exist: system_auth.users, system_auth.credentials, and system_auth.permissions.

    As described in General upgrade advice, authentication and authorization subsystems now support role-based access control (RBAC).

  4. Review your security configuration. To use security, enable and configure DSE Unified Authentication.

    In cassandra.yaml, the default authenticator is DseAuthenticator and the default authorizer is DseAuthorizer. Other authenticators and authorizers are no longer supported. Security is disabled in dse.yaml by default.

  5. TimeWindowCompactionStrategy (TWCS) is set only on new dse_perf tables. Manually change dse_perf tables that were created in earlier releases to use TWCS. For example:
    ALTER TABLE dse_perf.read_latency_histograms WITH COMPACTION={'class':'TimeWindowCompactionStrategy'};
  6. DSE Search only:
    • The appender SolrValidationErrorAppender and the logger SolrValidationErrorLogger are no longer used and may safely be removed from logback.xml.
    • In contrast to earlier versions, DataStax recommends accepting the new default value of 1024 for back_pressure_threshold_per_core in dse.yaml. See Configuring and tuning indexing performance.
    • If SpatialRecursivePrefixTreeFieldType (RPT) is used in the search schema, replace the units field type with a suitable (degrees, kilometers, or miles) distanceUnits, and then verify that spatial queries behave as expected.
    • Applies only if you are using HTTP writes with JSON documents (deprecated): a known issue causes the auto-generated solrconfig.xml to contain an invalid requestHandler for JSON core creations after the upgrade. Change the auto-generated solrconfig.xml from:
      <requestHandler name="/update/json" class="solr.UpdateUpdateRequestHandler" startup="lazy"/>
      to
      <requestHandler name="/update/json" class="solr.UpdateRequestHandler" startup="lazy"/>
    • Slow startup on nodes with large encrypted indexes is resolved. However, action is required to realize the performance gains. You must do a full reindex of all encrypted search indexes on each node in your cluster. Plan sufficient time after the upgrade is complete to reindex with deleteAll=true in a rolling fashion. For example:
      dsetool reload_core keyspace_name.table_name distributed=false reindex=true deleteAll=true 
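    As a sketch of the RPT field type change described above (the field type name and attribute values are examples only, not taken from your schema), replace the deprecated units attribute with distanceUnits in the search schema:

    ```xml
    <!-- Before (deprecated units attribute): -->
    <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
               geo="true" units="degrees"/>
    <!-- After (distanceUnits may be degrees, kilometers, or miles): -->
    <fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
               geo="true" distanceUnits="degrees"/>
    ```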
  7. DSEFS only: A new schema is available for DSEFS.
    Warning: Dropping a keyspace is not recoverable without a backup. If you have non-temporary data in DSEFS, do not drop the dsefs keyspace; no action is required, and DSEFS continues to work using the DSE 5.0 schema.
    If you have no data in DSEFS or if you are using DSEFS only for temporary data, follow these steps to use the new schema:
    1. Stop DSEFS on all nodes. In the dsefs_options section of dse.yaml, set enabled: false.
    2. Restart the DSE node.
    3. Drop the dsefs keyspace:
      DROP KEYSPACE dsefs
    4. Clear the dsefs data directories on each node.
      For example, if the dsefs_options section of dse.yaml has data_directories configured as:
      dsefs_options:
           ...
           data_directories:
               - dir: /var/lib/dsefs/data
      This command removes the directories:
      rm -r /var/lib/dsefs/data/*
    5. Start DSEFS with DSE 6.0 to use the new schema.
    6. If you backed up existing DSEFS data before the upgrade, copy the data back into DSEFS from local storage.
  8. DSE Analytics only:
    • Spark Jobserver uses DSE custom version 0.8.0.44. Applications must use the compatible Spark Jobserver API from the DataStax repository.
    • If you are using Spark SQL tables, migrate them to the new Hive metastore format:
      dse client-tool spark metastore migrate --from 5.0.0 --to 6.0.0
  9. Ensure that keyspace replication factors are correct for your environment.

Warning messages during and after upgrade

You can ignore some log messages that occur during and after an upgrade.

  • When upgrading nodes with DSE Advanced Replication, there might be some WriteTimeoutExceptions during a rolling upgrade while mixed versions of nodes exist. Some write consistency limitations apply while mixed versions of nodes exist. The WriteTimeout issue is resolved after all nodes are upgraded.
  • Some gremlin_server properties in earlier versions of DSE are no longer required. If properties exist in the dse.yaml file after upgrading, logs display warnings similar to:
    WARN  [main] 2017-08-31 12:25:30,523 GREMLIN DseWebSocketChannelizer.java:149 - Configuration for the org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0 serializer in dse.yaml overrides the DSE default - typically it is best to allow DSE to configure these.
    You can ignore these warnings or modify dse.yaml so that only the required gremlin server properties are present.
Error messages provide information to help identify problems.
  • If you see an error message like:
    ERROR [main] 2016-07-21 13:52:46,941  CassandraDaemon.java:737 - Cannot start node if snitch's data center (Cassandra) differs from previous data center (Analytics). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
    Follow the upgrade instructions in step 6 of the upgrade steps: start the node in Spark mode, or add the start-up parameter -Dcassandra.ignore_dc=true for each node.