Upgrading from DataStax Enterprise 5.0 to 6.0
Instructions for upgrading from DSE 5.0 to 6.0.
Upgrading major Cassandra version
Upgrading SSTables is required for upgrades that include a major Apache Cassandra release:
- DataStax Enterprise 6.7 is compatible with Cassandra 3.11.
- DataStax Enterprise 6.0 is compatible with Cassandra 3.11.
- DataStax Enterprise 5.1 uses Cassandra 3.11.
- DataStax Enterprise 5.0 uses Cassandra 3.0.
- DataStax Enterprise 4.7 to 4.8 use Cassandra 2.1.
- DataStax Enterprise 4.0 to 4.6 use Cassandra 2.0.
server.xml
The default location of the Tomcat server.xml file depends on the installation type:
Package installations | /etc/dse/tomcat/conf/server.xml |
Tarball installations | installation_location/resources/tomcat/conf/server.xml |
Upgrade order
Upgrade nodes in this order:
- In multiple datacenter clusters, upgrade every node in one datacenter before upgrading another datacenter.
- Upgrade the seed nodes within a datacenter first.
- Upgrade nodes in this order:
- DSE Analytics datacenters
- Transactional/DSE Graph datacenters
- DSE Search datacenters
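To plan this order, it helps to see which nodes belong to which datacenter. A minimal sketch follows; the `nodetool status` output below is a canned sample for illustration (in a real cluster, capture it with `nodetool status > status.txt`):

```shell
# Sample stand-in for real `nodetool status` output.
cat > status.txt <<'EOF'
Datacenter: Analytics
=====================
UN  10.0.1.1  1.2 GiB  256  ?  rack1
UN  10.0.1.2  1.1 GiB  256  ?  rack1
Datacenter: Cassandra
=====================
UN  10.0.2.1  2.3 GiB  256  ?  rack1
EOF
# Group up/normal nodes by datacenter to build the rolling-upgrade plan.
awk '/^Datacenter:/ {dc=$2} /^UN/ {print dc, $2}' status.txt > plan.txt
cat plan.txt
```

This lists each node prefixed by its datacenter, so you can walk one datacenter at a time, seeds first.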
logback.xml
The location of the logback.xml file depends on the type of installation:
Package installations | /etc/dse/cassandra/logback.xml |
Tarball installations | installation_location/resources/cassandra/conf/logback.xml |
DataStax driver changes
DataStax drivers come in two types:
- DataStax drivers for DataStax Enterprise — for use by DSE 4.8 and later
- DataStax drivers for Apache Cassandra™ — for use by Apache Cassandra™ and DSE 4.7 and earlier
DataStax drivers for DataStax Enterprise | DataStax drivers for Apache Cassandra |
---|---|
C/C++ | C/C++ |
C# | C# |
Java | Java |
Node.js | Node.js |
Python | Python |
Maintenance mode drivers | |
Supported by DataStax, but only critical bug fixes will be included in new versions. | |
PHP | PHP |
Ruby | Ruby |
Additional driver documentation | |
All Drivers | Version compatibility |
DataStax Enterprise and Apache Cassandra™ configuration files
Configuration file | Installer-Services and package installations | Installer-No Services and tarball installations |
---|---|---|
DataStax Enterprise configuration files | ||
byoh-env.sh | /etc/dse/byoh-env.sh | install_location/bin/byoh-env.sh |
dse.yaml | /etc/dse/dse.yaml | install_location/resources/dse/conf/dse.yaml |
logback.xml | /etc/dse/cassandra/logback.xml | install_location/resources/cassandra/conf/logback.xml |
spark-env.sh | /etc/dse/spark/spark-env.sh | install_location/resources/spark/conf/spark-env.sh |
spark-defaults.conf | /etc/dse/spark/spark-defaults.conf | install_location/resources/spark/conf/spark-defaults.conf |
Cassandra configuration files | ||
cassandra.yaml | /etc/dse/cassandra/cassandra.yaml | install_location/resources/cassandra/conf/cassandra.yaml |
cassandra.in.sh | /usr/share/dse/cassandra/cassandra.in.sh | install_location/resources/cassandra/bin/cassandra.in.sh |
cassandra-env.sh | /etc/dse/cassandra/cassandra-env.sh | install_location/resources/cassandra/conf/cassandra-env.sh |
cassandra-rackdc.properties | /etc/dse/cassandra/cassandra-rackdc.properties | install_location/resources/cassandra/conf/cassandra-rackdc.properties |
cassandra-topology.properties | /etc/dse/cassandra/cassandra-topology.properties | install_location/resources/cassandra/conf/cassandra-topology.properties |
jmxremote.password | /etc/dse/cassandra/jmxremote.password | install_location/resources/cassandra/conf/jmxremote.password |
Tomcat server configuration file | ||
server.xml | /etc/dse/tomcat/conf/server.xml | install_location/resources/tomcat/conf/server.xml |
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:
Package installations | /etc/dse/cassandra/cassandra.yaml |
Tarball installations | installation_location/resources/cassandra/conf/cassandra.yaml |
The upgrade process for DataStax Enterprise provides minimal downtime (ideally zero). During this process, upgrade and restart one node at a time while other nodes continue to operate online. With a few exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.
Follow these instructions to upgrade from DataStax Enterprise 5.0 to DataStax Enterprise 6.0.
If you have DSE 4.8 or earlier, upgrade to the latest version of 5.0 before continuing.
Always upgrade to the latest patch release on your current version before you upgrade to a higher version. Fixes included in the latest patch release can simplify the upgrade process.
The latest version of DSE 5.0 is 5.0.15.
Apache Cassandra™ version change
- DataStax Enterprise 6.0 is compatible with Cassandra 3.11.
- DataStax Enterprise 5.0 uses Cassandra 3.0.
General recommendations
DataStax recommends backing up your data prior to any version upgrade, including logs and custom configurations. A backup provides the ability to revert and restore all the data used in the previous version if necessary.
General restrictions and limitations during the upgrade process
Restrictions and limitations apply while a cluster is in a partially upgraded state.
With these exceptions, the cluster continues to work as though it were on the earlier version of DataStax Enterprise until all of the nodes in the cluster are upgraded.
- General upgrade restrictions
- Do not enable new features.
- Do not run nodetool repair. If you have the OpsCenter Repair Service configured, turn off the Repair Service.
- Ensure OpsCenter compatibility. OpsCenter 6.5 is required for managing DSE 6.0 clusters. See DataStax OpsCenter compatibility with DataStax Enterprise.
- During the upgrade, do not bootstrap or decommission nodes.
- Do not issue these types of CQL queries during a rolling restart: DDL and TRUNCATE.
- NodeSync waits to start until all nodes are upgraded.
- Do not enable Change Data Capture (CDC) on a mixed-version cluster. Upgrade all nodes to DSE 5.1 or later before enabling CDC.
- Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
Note: Nodes on different versions might show a schema disagreement during an upgrade.
- Restrictions for DSE Analytics (Spark) nodes
- Do not run analytics jobs until all nodes are upgraded.
- Kill all Spark worker processes before you stop the node and install the new version.
- Restrictions for DSE Advanced Replication
- Support for DSE Advanced Replication V1, used in DSE 5.0, is removed. Because DSE Advanced Replication V2 is substantially revised, V1 installations must first upgrade to DSE 5.1.x, migrate DSE Advanced Replication to V2, and then upgrade to DSE 6.0.
- Restrictions for DSEFS
- Mixed versions of DSEFS are not supported during the upgrade process.
- Complete the upgrade on all nodes before running fsck. Starting with DSE 5.1.3, all nodes must be able to report proper block status to the node running fsck. If you run fsck on an upgraded node in a mixed version cluster, nodes with versions earlier than DSE 5.1.3 do not properly report block status and cause the fsck to incorrectly assume that data is corrupt or unavailable. The fsck will incorrectly try to repair them.
- A protocol change in DSE 5.1.3 improves efficiency of passing JSON arrays between DSEFS server and client. Upgrade all nodes in the cluster before using the DSEFS shell.
- Restrictions for DSE Graph
- Graph nodes have the same restrictions as the workload they run on. General graph restrictions apply for all nodes, such as not altering graph schema during upgrades. Workload-specific restrictions apply for analytics and search nodes, such as no OLAP queries during upgrades.
- Restrictions for DSE Search
- Do not update schemas.
- Do not reindex DSE Search nodes during upgrade.
- DSE 6.0 introduces a new Lucene codec. Segments written with this new codec cannot be read by earlier versions of DSE. To downgrade to earlier versions, the entire data directory for the search index in question must be cleared.
- DSE Search in DataStax Enterprise 6.0 uses Apache Solr 6.0. This significant change requires advanced planning and specific actions before and after the upgrade. Important: Before you upgrade DSE Search or SearchAnalytics workloads, you must follow the specific steps in Advanced preparation for upgrading DSE Search and SearchAnalytics nodes.
- Restrictions for nodes using any kind of security
- Do not change security credentials or permissions until the upgrade is complete on all nodes.
- If you are not already using Kerberos, do not set up Kerberos authentication before upgrading. First upgrade the cluster, and then set up Kerberos.
- Security changes
- After upgrading, the default authenticator is DseAuthenticator and the default authorizer is DseAuthorizer in cassandra.yaml. Other authenticators and authorizers are no longer supported; follow the steps in After the upgrade.
- Upgrading drivers and possible impact when driver versions are incompatible
- Be sure to check driver compatibility. Depending on the driver version, you might need to recompile your client application code. See DataStax driver changes.
Advanced preparation for upgrading DSE Search and SearchAnalytics nodes
Before starting the Preparing to upgrade steps, complete all the advanced preparation steps on DSE Search and SearchAnalytics nodes while DSE 5.0 is still running.
- Schema changes require a full reindex.
- Configuration changes require reloading the core.
- Change HTTP queries to CQL:
- Delete-by-id is removed, use CQL DELETE by primary key instead.
- Delete-by-query no longer supports wildcards, use CQL TRUNCATE instead.
- If any Solr core was created on DSE 4.6 or earlier and never reindexed after being upgraded to DSE 4.7 or later, you must reindex on DSE 5.0 before upgrading to DSE 5.1 or later.
- Ensure that the shard_transport_options in dse.yaml are set only for netty_client_request_timeout:
  shard_transport_options:
      netty_client_request_timeout: 60000
  In DSE 5.1 and later, the shard transport option supports only netty_client_request_timeout. Remove any other shard_transport_options.
- If you are using Apache Solr SolrJ, the minimum required version is 6.0.0.
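The check above can be scripted. A minimal sketch that lists any shard_transport_options entries other than netty_client_request_timeout so they can be removed; the heredoc is a stand-in for your real dse.yaml:

```shell
# Sample dse.yaml fragment (stand-in for the real file); the extra
# netty_server_port entry is deliberately included to be flagged.
cat > dse-sample.yaml <<'EOF'
shard_transport_options:
    netty_client_request_timeout: 60000
    netty_server_port: 8984
EOF
# Walk the shard_transport_options block and report anything other than
# the one supported option.
awk '/^shard_transport_options:/ {inblock=1; next}
     inblock && /^[^[:space:]]/ {inblock=0}
     inblock && NF && $1 != "netty_client_request_timeout:" {print "remove:", $1}' \
  dse-sample.yaml > extra-options.txt
cat extra-options.txt
```

Run the same awk against your real dse.yaml; every reported key must be deleted before the upgrade.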
- For SpatialRecursivePrefixTreeFieldType (RPT) in search schemas, you must adjust your queries for these changes:
  - IsDisjointTo is no longer supported in queries on SpatialRecursivePrefixTreeFieldType. Replace IsDisjointTo with a NOT Intersects query. For example:
    foo:[0,0 TO 1000,1000] AND -"Intersects(POLYGON((338 211, 338 305, 404 305, 404 211, 338 211)))"
  - The ENVELOPE syntax is now required for WKT-style queries against SpatialRecursivePrefixTreeFieldType fields. You must specify ENVELOPE(10, 15, 15, 10), where queries on earlier releases could specify 10 10 15 15.
- For upgrades to DSE 6.0.0 and DSE 6.0.1 only: stored=true copy fields are not supported and cause schema validation to fail. The stored=true copyField directive has not been supported since DSE 4.7, so you probably do not have stored=true copy fields. If you do:
  - Change the stored attribute value of all copyField directives from true to false in the schema.xml file, and then use dsetool reload_core to reload the modified schema.
  - Ensure that application designs and implementations recognize this change.
  Note: DSE 6.0.2 and later ignores stored=true.
- Edit the solrconfig.xml file and make these changes, as needed:
- Remove these requestHandlers: XmlUpdateRequestHandler, BinaryUpdateRequestHandler, CSVRequestHandler, JsonUpdateRequestHandler, and DataImportHandler. Solr deprecated and then removed these requestHandlers. For example:
  <requestHandler name="/dataimport" class="solr.DataImportHandler"/>
  or
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
- Change the directoryFactory from:
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  to:
  <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
- <unlockOnStartup> is now unsupported as a result of LUCENE-6508 and SOLR-7942.
- Change the updateLog from:
  <updateLog class="solr.FSUpdateLog" force="false">
  to:
  <updateLog force="false">
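Before editing, you can scan solrconfig.xml for entries that the steps above say must be removed or replaced. A minimal sketch; the heredoc sample stands in for your real solrconfig.xml:

```shell
# Sample solrconfig.xml fragment containing three items that need attention:
# a removed requestHandler, the old directoryFactory form, and unlockOnStartup.
cat > solrconfig-sample.xml <<'EOF'
<config>
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
  <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
  <unlockOnStartup>true</unlockOnStartup>
</config>
EOF
# Flag every line that mentions a removed handler or unsupported directive.
grep -nE 'XmlUpdateRequestHandler|BinaryUpdateRequestHandler|CSVRequestHandler|JsonUpdateRequestHandler|DataImportHandler|unlockOnStartup|\$\{solr\.directoryFactory' \
  solrconfig-sample.xml > flagged.txt
cat flagged.txt
```

Each flagged line number points at a directive to delete or rewrite per the list above.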
- Upgrading DSE Search nodes to DSE 5.1 and later requires replacing unsupported Solr types with supported types. Note: Special handling is also required for BCDStrField, addressed in 10.
  Sorting limitations apply to mixed-version clusters. Some of the removed Solr types, due to the way they marshal sort values during distributed queries (combined with the way the suggested new types unmarshal sort values), cannot be sorted on during rolling upgrades when some nodes use an unsupported type and other nodes use the suggested new type. The following type transitions are problematic:
  Removed Solr field types | Supported Solr field types |
  ---|---|
  ByteField | TrieIntField |
  DateField | TrieDateField |
  BCDIntField | TrieIntField |
  BCDLongField | TrieLongField |
  Two options are available:
  - Avoid sorting on removed Solr field types until the upgrade to DSE 5.1 or later is complete for all nodes in the datacenter being queried. Tip: When using two search datacenters, isolate queries to a single datacenter and then change the schema and reindex the other datacenter. Then isolate queries to the newly reindexed datacenter while you change the schema and upgrade the first datacenter.
  - If you are using BCDIntField or BCDLongField, update the schema to replace BCDIntField and BCDLongField with types that are sort-compatible with the supported Solr types TrieIntField and TrieLongField:
    Removed Solr field types | Interim sort-compatible supported Solr field types |
    ---|---|
    BCDIntField | SortableIntField |
    BCDLongField | SortableLongField |
    Change the schema in a distributed fashion, and do not reindex. After the schema is updated on all nodes, go on to 9.
- Update the schema and configuration for the Solr field types that are removed from Solr 5.5 and later.
  - Update the schema to replace unsupported Solr field types with supported Solr field types:
    Removed Solr field types | Supported Solr field types |
    ---|---|
    ByteField | TrieIntField |
    DateField | TrieDateField |
    DoubleField | TrieDoubleField |
    FloatField | TrieFloatField |
    IntField | TrieIntField |
    LongField | TrieLongField |
    ShortField | TrieIntField |
    SortableDoubleField | TrieDoubleField |
    SortableFloatField | TrieFloatField |
    SortableIntField | TrieIntField |
    SortableLongField | TrieLongField |
    BCDIntField | TrieIntField |
    BCDLongField | TrieLongField |
    BCDStrField (see 10 if used) | TrieIntField |
  - If you are using type mapping version 0, or you do not specify a type mapper, verify or update the solrconfig.xml to use dseTypeMappingVersion 1:
    <dseTypeMappingVersion>1</dseTypeMappingVersion>
    If the Solr core is backed by a CQL table and the type mapping is unspecified, use type mapping version 2.
  - Reload the core:
    dsetool reload_core keyspace_name.table_name schema=filepath solrconfig=filepath
    If you were using the unsupported data types, do a full reindex node-by-node:
    dsetool reload_core keyspace_name.table_name schema=filepath solrconfig=filepath reindex=true deleteAll=true distributed=false
    Note: In DSE 5.1 and later, auto-generated schemas use data type mapper 2.
- If using BCDStrField: In DSE 5.0 and earlier, DSE mapped Cassandra text columns to BCDStrField. The deprecated BCDStrField was removed in DSE 5.1.0. The recommended strategy is to upgrade the data type to TrieIntField. However, DSE cannot map text directly to TrieIntField. If you are using BCDStrField, you must complete one of these options before the upgrade:
  - If BCDStrField is no longer used, remove the BCDStrField field from the Solr schema. Reindexing is not required.
  - If you want to index the field as a TrieIntField, and a full reindex is acceptable, change the underlying database column to use the type int.
  - If you want to keep the database column as text and you still want to do simple matching queries on the indexed field, switch from BCDStrField to StrField in the schema. Reindexing should not be required, but the field will no longer be appropriate for numeric range queries or sorting, because StrField uses a lexicographic order, not a numeric one.
  - Not recommended: If you want to keep the database column as text and still want to perform numeric range queries and sorts on the former BCDStrField, but would rather change your application than perform a full reindex:
    - Change the field to StrField in the Solr schema with indexed=false.
    - Add a new copy field with the type TrieIntField that has its values supplied by the original BCDStrField.
    After you make these schema changes, do a rolling, node-by-node reload_core with reindex=true, distributed=false, and deleteAll=true. Note: If you have two datacenters and upgrade them one at a time, reload the core with distributed=true and deleteAll=true.
- Tune the schema before you upgrade. For DSE 5.1.4 and later, all field definitions in the schema are validated and must be DSE Search compatible, even if the fields are not indexed, have docValues applied, or are used for copy-field source. The default behavior of automatic resource generation includes all columns. To improve performance, prevent unneeded fields from being loaded from the database: include only the required fields in the schema by removing or commenting out unused fields.
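The field-type replacement table above can be applied mechanically: scan schema.xml for removed types and print the supported replacement for each one found. A sketch under illustrative assumptions (the sample schema file and field names are hypothetical):

```shell
# Sample schema.xml fragment (stand-in for your real schema).
cat > schema-sample.xml <<'EOF'
<fieldType name="age" class="solr.IntField"/>
<fieldType name="created" class="solr.DateField"/>
<fieldType name="title" class="solr.TextField"/>
EOF
# Removed-type -> supported-type mapping, from the table above.
cat > removed-types.txt <<'EOF'
ByteField TrieIntField
DateField TrieDateField
DoubleField TrieDoubleField
FloatField TrieFloatField
IntField TrieIntField
LongField TrieLongField
ShortField TrieIntField
EOF
# Report each removed type present in the schema with its replacement.
while read -r removed supported; do
  if grep -q "solr\.$removed\"" schema-sample.xml; then
    echo "$removed -> $supported"
  fi
done < removed-types.txt > replacements.txt
cat replacements.txt
```

After editing the schema accordingly, reload the core (with a full reindex if any type actually changed), as described in the steps above.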
Advanced preparation for upgrading DSE Graph nodes with search indexes
These steps apply to graph nodes that have search indexes. Before starting the Preparing to upgrade steps, complete these advanced preparation steps while DSE 5.0 is still running.
Upgrading DSE Graph nodes with search indexes requires these edits to the solrconfig file. Configuration changes require reloading the core. Plan sufficient time to implement and test changes that are required before the upgrade.
- Remove these requestHandlers: XmlUpdateRequestHandler, BinaryUpdateRequestHandler, CSVRequestHandler, JsonUpdateRequestHandler, and DataImportHandler. Solr deprecated and then removed these requestHandlers. For example:
  <requestHandler name="/dataimport" class="solr.DataImportHandler"/>
  or
  <requestHandler name="/update" class="solr.XmlUpdateRequestHandler"/>
- <unlockOnStartup> is now unsupported as a result of LUCENE-6508 and SOLR-7942.
- Reload the core so that this configuration change is respected:
  dsetool reload_core keyspace_name.table_name reindex=false
Advanced preparation for upgrading DSE Analytics nodes
Upgrades from DSE 5.0 to DSE 6.0 include a major upgrade to Spark 2.2, as well as tighter integration between DSE and Spark. For information on Spark 2.2, see the Spark documentation.
- Spark 2.2 uses Scala 2.11. You must recompile all DSE 5.0 Scala Spark applications against Scala 2.11 and use only Scala 2.11 third-party libraries. Changing the dse-spark-dependencies in your build files is not sufficient to change the compilation target. See the example projects for how to set up your build files.
- Spark applications should use dse:// URLs instead of spark://spark_master_IP:Spark_RPC_port_number URLs, as described in Specifying Spark URLs. You no longer need to specify the Spark master IP address or hostname when using dse:// URLs. Connecting to any Spark node will redirect the request to the master node.
- If you have existing Spark application code that uses spark://Spark_master_IP:Spark_RPC_port to connect, it will no longer work. For example, the following code worked in DSE 5.0 but will not work in DSE 5.1 or later:
  val conf = new SparkConf(true)
    .setMaster("spark://192.168.123.10:7077")
    .setAppName("cassandra-demo")
    .set("cassandra.connection.host", "192.168.123.10") // initial contact
    .set("cassandra.username", "cassandra")
    .set("cassandra.password", "cassandra")
  val sc = new SparkContext(conf)
  To connect to DSE 5.1 and later, you no longer need to call setMaster. This code will work in DSE 5.1 and later:
  val conf = new SparkConf(true)
    .setAppName("cassandra-demo")
    .set("cassandra.connection.host", "192.168.123.10") // initial contact
    .set("cassandra.username", "cassandra")
    .set("cassandra.password", "cassandra")
  val sc = new SparkContext(conf)
  If you need to specify the master using setMaster, use the dse:// URL format.
- Starting in DSE 5.1, you can restrict Spark jobs to specific database roles. See Managing Spark application permissions.
- Starting in DSE 5.1, you can set the Spark executor process owners, as described in Running Spark processes as separate users.
- The user submitting the Spark application no longer has to be the same database role. See Specifying Spark URLs for information on how to change the master connection submission to use a different user or cluster than the database connection.
These steps only apply to nodes that use DSEFS. Before starting the Preparing to upgrade steps, complete these advanced preparation steps.
The DSEFS schema used by the database was improved in DSE 5.1, but the old schema is still supported and will not be modified during the upgrade. To use the new DSEFS schema with existing DSEFS data, backup the DSEFS data before upgrading:
Back up the current DSEFS data to local storage using the dse hadoop fs -cp command:
dse hadoop fs -cp /* /local_backup_location
Preparing to upgrade
- Upgrade to the latest patch release on your current version. Fixes included in the latest patch release can simplify the upgrade process.
- Before upgrading, be sure that each node has ample free disk space. The required space depends on the compaction strategy. See Disk space.
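A quick way to gauge headroom is to compare the node's current data size against free space on the same filesystem, since compaction (and the upgradesstables step later) can temporarily need substantial extra space. A sketch; the data directory path is the common default and should be adjusted to your data_file_directories setting:

```shell
# Default Cassandra data directory; falls back to the current directory
# when the default is not accessible, purely so the illustration runs.
data_dir=${DATA_DIR:-/var/lib/cassandra/data}
[ -d "$data_dir" ] && [ -r "$data_dir" ] || data_dir=.
data_kb=$(du -sk "$data_dir" | awk '{print $1}')       # data size in KB
free_kb=$(df -Pk "$data_dir" | awk 'NR==2 {print $4}') # free space in KB
if [ "$free_kb" -lt "$data_kb" ]; then
  echo "WARNING: only ${free_kb} KB free for ${data_kb} KB of data"
else
  echo "OK: ${free_kb} KB free, ${data_kb} KB of data"
fi
```

The exact threshold depends on your compaction strategy; for size-tiered compaction, plan for up to the size of the largest tables as temporary overhead.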
- Familiarize yourself with the changes and features in this release:
- DataStax Enterprise 6.0 release notes.
- General upgrading advice for any version. Be sure to read NEWS.txt all the way back to your current version.
- DataStax Enterprise changes in CHANGES.txt.
- DataStax driver changes.
- Replace ITriggers and custom interfaces. The core storage engine refactoring in DSE 6.0 modified several internal and beta extension points. All custom implementations of the following interfaces must be replaced with supported implementations when upgrading to DSE 6.0. (For help, contact the DataStax Services team.)
  - The org.apache.cassandra.triggers.ITrigger interface was modified from augment to augmentNonBlocking for the non-blocking internal architecture. Updated trigger implementations must be provided on upgraded nodes. If unsure, drop all existing triggers before upgrading. To check for existing triggers:
    SELECT * FROM system_schema.triggers;
  - The org.apache.cassandra.index.Index interface was modified to comply with the core storage engine changes. Updated implementations are required. If unsure, drop all existing custom secondary indexes before upgrading, except DSE Search indexes, which do not need to be replaced. To check for existing indexes:
    SELECT * FROM system_schema.indexes;
  - The org.apache.cassandra.cql3.QueryHandler, org.apache.cassandra.db.commitlog.CommitLogReadHandler, and other extension points have been changed. See QueryHandlers.
- Support for Thrift-compatible tables (COMPACT STORAGE) is dropped. Before upgrading, follow the steps to migrate all tables that have COMPACT STORAGE to CQL table format while DSE 5.x.x is running. Note: Do not migrate system.* tables; COMPACT STORAGE is removed by DSE internally.
  For DSE Analytics, drop compact storage from all the tables in the HiveMetaStore and PortfolioDemo keyspaces.
  After COMPACT STORAGE is dropped, columns to support migration to CQL-compatible table format are added as described in migrating from compact storage.
  Attention: DSE 6.0 will not start if COMPACT STORAGE tables are present. Creating a COMPACT STORAGE table in a mixed-version cluster is not supported. Driver connections to the latest DSE 5.0.x and DSE 5.1.x run in a "NO_COMPACT" mode that causes compact tables to appear as if the compact flags were already dropped, but only for the current session.
- If audit logging is configured to use CassandraAuditWriter, run these commands as a superuser on DSE 5.0 nodes, and then ensure that the entire cluster has schema agreement:
  ALTER TABLE dse_audit.audit_log ADD authenticated text;
  ALTER TABLE dse_audit.audit_log ADD consistency text;
- Upgrade the SSTables on each node to ensure that all SSTables are on the current version:
  nodetool upgradesstables
  This step is required for DataStax Enterprise upgrades that include a major Cassandra version change. Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. If the SSTables are already on the current version, the command returns immediately and no action is taken.
- Verify the Java runtime version and upgrade to the recommended version:
  java -version
  - Recommended: OpenJDK 8 (1.8.0_151 minimum). Note: The recommendation changed due to the end of public updates for Oracle JRE/JDK 8. See the Oracle Java SE Support Roadmap.
  - Supported: Oracle Java SE 8 (JRE or JDK) (1.8.0_151 minimum). Important: Although Oracle JRE/JDK 8 is supported, DataStax does more extensive testing on OpenJDK 8.
- Run nodetool repair to ensure that data on each replica is consistent with data on other nodes.
- Install the libaio package for optimal performance.
  RHEL platforms:
  sudo yum install libaio
  Debian platforms:
  sudo apt-get install libaio1
- DSE Analytics nodes:
  - If you programmatically set the shuffle parameter, you must change the code for applications that use conf.set("spark.shuffle.service.port", port). Instead, use dse spark-submit, which automatically sets the correct service port based on the authentication state. See Configuring Spark for more information.
  - If DSEFS is enabled, copy the CFS hivemetastore directory to DSEFS:
    DSE_HOME/bin/dse hadoop fs -cp cfs://127.0.0.1/user/spark/warehouse/ dsefs://127.0.0.1/user/spark/warehouse/
    After the upgrade is complete, migrate Spark SQL tables (if used) to the new Hive metastore format:
    dse client-tool spark metastore migrate --from 5.0.0 --to 6.0.0
  - Cassandra File System (CFS) is removed. Remove the cfs and cfs_archive keyspaces before upgrading. See the From CFS to DSEFS blog post and the Copying data from CFS to DSEFS documentation for more information.
- DSE Search nodes:
- DSE Search in DataStax Enterprise 6.0 uses Apache Solr™ 6.0. Complete all of the steps in Advanced preparation for upgrading DSE Search and SearchAnalytics nodes.
- Ensure that all HTTP writes are changed to use CQL commands for updates and inserts.
- Edit the search index config and make these changes, as needed. See Search index config for valid options to change query behavior for search indexes.
- Remove the unsupported dataDir option. You can still set the location of search indexes.
  - Remove mergePolicy, maxMergeDocs, and mergeFactor. For example, remove:
    <mergeFactor>25</mergeFactor>
    <maxMergeDocs>...
    <mergePolicy>...
    Use mergePolicyFactory instead, and add mergeScheduler:
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxThreadCount">16</int>
      <int name="maxMergeCount">32</int>
    </mergeScheduler>
    ...
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
      <int name="maxMergeAtOnce">10</int>
      <int name="segmentsPerTier">10</int>
    </mergePolicyFactory>
- Remove any instance of ExtractingRequestHandler.
  - Remove DSENRTCachingDirectoryFactory. Change:
    <directoryFactory name="DirectoryFactory" class="com.datastax.bdp.search.solr.DSENRTCachingDirectoryFactory"/>
    to:
    <directoryFactory name="DirectoryFactory" class="solr.StandardDirectoryFactory"/>
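This substitution is mechanical enough to script. A sketch against a sample file (back up the real solrconfig.xml first; sed writes a .bak copy here):

```shell
# Sample solrconfig.xml line using the removed factory (stand-in for the
# real file).
cat > solrconfig-df.xml <<'EOF'
<directoryFactory name="DirectoryFactory" class="com.datastax.bdp.search.solr.DSENRTCachingDirectoryFactory"/>
EOF
# Swap the removed DSENRTCachingDirectoryFactory for StandardDirectoryFactory;
# -i.bak keeps the original alongside the edited file.
sed -i.bak \
  's/com\.datastax\.bdp\.search\.solr\.DSENRTCachingDirectoryFactory/solr.StandardDirectoryFactory/' \
  solrconfig-df.xml
cat solrconfig-df.xml
```

After editing the real file, reload the core so the change takes effect.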
- Ensure that the catalina.properties and context.xml files are present in the Tomcat conf directory. DSE will not start after upgrade if these files are missing. The default location of the Tomcat conf directory depends on the type of installation:
- Package installations: /etc/dse/tomcat/conf
- Tarball installations: installation_location/resources/tomcat/conf
- If earlier DSE versions use a custom configuration for the Solr UI web.xml, change:
  <filter-class>com.datastax.bdp.search.solr.auth.DseAuthenticationFilter</filter-class>
  to:
  <filter-class>com.datastax.bdp.cassandra.auth.http.DseAuthenticationFilter</filter-class>
- StallMetrics MBean is removed. Change operators that use the MBean.
- DSE Graph nodes:
- If your graph nodes have search indexes that you added with gremlin, complete the steps in Advanced preparation for upgrading DSE Graph nodes with search indexes.
- Ensure that edge label names and property key names use only the supported characters. Edge label names and property key names allow only [a-zA-Z0-9], underscore, hyphen, and period. In earlier versions, edge label names and property key names allowed nearly unrestricted Unicode.
- schema.describe() displays the entire schema, even if it contains illegal names.
- In-place upgrades allow existing schemas with invalid edge label names and property key names.
- Schema elements with illegal names cannot be updated or added.
- Back up the configuration files you use to a folder that is not in the directory where you normally run commands. The configuration files are overwritten with default values during installation of the new version.
- Upgrades from 5.0.0 to 5.0.8 and from DSE 5.1.0 and 5.1.1 to DSE 5.1.2 and later releases: The messaging protocol version in DSE 5.1.2 has been changed to VERSION_3014. Schema migrations rely on exact messaging protocol versions. To accommodate schema changes that might occur during the upgrade, force a backward-compatible messaging protocol. Before you upgrade, restart the node with this start-up parameter:
  -Dcassandra.force_3_0_protocol_version=true
  For example:
  installation_location/bin/dse cassandra -Dcassandra.force_3_0_protocol_version=true
  Note: While mixed versions exist during the upgrade, do not add or remove columns from existing tables. After the upgrade is complete on all nodes, restart nodes without this flag.
Upgrade steps
- DSE Analytics nodes: Kill all Spark worker processes.
- To flush the commit log of the old installation:
  nodetool -h hostname drain
  This step saves time when nodes start up after the upgrade and prevents DSE Search nodes from having to reindex data. Important: This step is mandatory when upgrading between major Cassandra versions that change SSTable formats, rendering commit logs from the previous version incompatible with the new version.
- Stop the node. See Stopping a DataStax Enterprise node.
- To stop DataStax Enterprise running as a
service:
sudo service dse stop
- To stop DataStax Enterprise running as a stand-alone
process:
bin/dse cassandra-stop
- Use the appropriate method to install the new product
version on a supported platform. Note: Install the new product version using the same installation type that is already on the system; otherwise, problems might result.
- To configure
the new product version:
- Compare your backup configuration files to the
new configuration files:
- Look for any deprecated, removed, or changed settings.
- DSE Search nodes
- While the
node is down, edit dse.yaml and remove
these options:
- cql_solr_query_executor_threads
- enable_back_pressure_adaptive_nrt_commit
- max_solr_concurrency_per_core
- solr_indexing_error_log_options
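A quick way to spot leftovers is to grep dse.yaml for the removed option names. This is a sketch; the dse.yaml path in the example call is an assumption (package installation):

```shell
# Sketch: flag dse.yaml options that were removed in DSE 6.0 so they can be
# deleted by hand. The caller supplies the path to dse.yaml.
check_removed_search_opts() {
  for opt in cql_solr_query_executor_threads \
             enable_back_pressure_adaptive_nrt_commit \
             max_solr_concurrency_per_core \
             solr_indexing_error_log_options; do
    grep -q "$opt" "$1" 2>/dev/null && echo "remove: $opt"
  done
  return 0
}

# Example: check_removed_search_opts /etc/dse/dse.yaml
```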
- DSE Analytics nodes
Note: Although DSEFS is enabled by default in DSE 5.1.0 and later, the enabled setting under dsefs_options is commented out in dse.yaml. To enable DSEFS, uncomment the dsefs_options.enabled setting. (DSP-13310)
- DSE Search nodes
- The upgrade installs a new server.xml for Tomcat 8. If your existing server.xml has custom connectors, migrate those connectors to the new server.xml before starting the upgraded nodes.
- Be sure you are familiar with the Apache Cassandra and DataStax Enterprise changes and features in the new release.
- Ensure that keyspace replication factors are
correct for your environment:
- Check the keyspace replication factor for analytics keyspaces.
- Check the keyspace replication factor for system_auth and dse_security keyspaces.
- DSE Analytics nodes: If your DSE 5.0 clusters had any
datacenters running in Analytics Hadoop mode and if the DseSimpleSnitch was used, you
must do one of these:
- For nodes in the datacenters running in Analytics Hadoop mode, start those nodes in Spark mode.
- Add the special start-up parameter
-Dcassandra.ignore_dc=true
for each node, then start in cassandra mode. This flag is required only once after upgrading. Subsequent restarts do not use this flag. You can leave the flag in the configuration file or remove it after the first restart of each node.
- Start the node.
- Package installations: See Starting DataStax Enterprise as a service.
- Tarball installations: See Starting DataStax Enterprise as a stand-alone process.
- Verify that the upgraded datacenter names match the
datacenter names in the keyspace schema
definition:
nodetool status
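One way to do the comparison is to pull the datacenter names out of the nodetool status output and check them against the datacenters named in the keyspace replication settings. This is a sketch that assumes the standard status output format:

```shell
# Sketch: extract the unique datacenter names from `nodetool status` output
# so they can be compared with the keyspace schema definition.
extract_datacenters() {
  awk '/^Datacenter:/ {print $2}' | sort -u
}

# On a live node:
#   nodetool status | extract_datacenters
```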
- Review the logs for
warnings, errors, and exceptions.
Warnings, errors, and exceptions are frequently found in the logs when starting an upgraded node. Some of these log entries are informational to help you execute specific upgrade-related steps. If you find unexpected warnings, errors, or exceptions, contact DataStax Support.
- Repeat the upgrade on each node in the cluster following the recommended order.
- When the upgrade includes a major Cassandra version,
you must upgrade the SSTables. DataStax recommends upgrading the SSTables on one
node at a time or when using racks, one rack at a time. Warning: Failure to upgrade SSTables when required results in a significant performance impact and increased disk usage. Upgrading is not complete until the SSTables are upgraded.
nodetool upgradesstables
If the SSTables are already on the current version, the command returns immediately and no action is taken. See SSTable compatibility and upgrade version.
Use the
--jobs
option to set the number of SSTables that upgrade simultaneously. The default setting is 2, which minimizes impact on the cluster. Set to 0 to use all available compaction threads. DataStax recommends running the upgradesstables
command on one node at a time or when using racks, one rack at a time. Note: You can run the upgradesstables
command before all the nodes are upgraded as long as you run this command on only one node at a time or when using racks, one rack at a time. Running upgradesstables
on too many nodes will degrade performance.
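One way to keep the one-node-at-a-time discipline is to generate the commands per node and run them strictly in sequence. This is a sketch; the hostnames in the example are illustrative:

```shell
# Sketch: emit one upgradesstables command per node so they can be run
# sequentially, never in parallel. Hostnames are illustrative.
build_upgradesstables_cmds() {
  for host in "$@"; do
    echo "nodetool -h $host upgradesstables --jobs 2"
  done
}

# Example: build_upgradesstables_cmds node1 node2 | sh   # runs them in order
```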
Recovery after upgrading to DSE 6.0 without dropping compact storage
- Downgrade any nodes which were already upgraded to DSE 6.0 to the latest version in
the DSE 5.0 or 5.1 series:
- DSE 5.0.x, downgrade to 5.0.15 or later
- DSE 5.1.x, downgrade to 5.1.12 or later
- On each node where a start on DSE 6.0 was attempted, start DSE with the
-Dcassandra.commitlog.ignorereplayerrors=true
option.
- On one node (any node) in the cluster, DROP COMPACT STORAGE from tables which use it.
- Restart DSE to continue the upgrade to DSE 6.0.
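The DROP COMPACT STORAGE step can be scripted by generating one ALTER TABLE statement per affected table. This is a sketch; the keyspace and table names are illustrative:

```shell
# Sketch: build the DROP COMPACT STORAGE statements for a list of tables.
# Keyspace/table names are illustrative; pipe the output to cqlsh on one node.
build_drop_compact_stmts() {
  for tbl in "$@"; do
    echo "ALTER TABLE $tbl DROP COMPACT STORAGE;"
  done
}

# Example: build_drop_compact_stmts ks1.users ks1.events | cqlsh
```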
After the upgrade
After all nodes are upgraded and running on DSE 6.0, complete these steps:
- If you use the OpsCenter Repair Service, turn on the Repair Service.
- After all nodes are on DSE 6.0 and the required schema change has occurred, the new audit logging feature (CassandraAuditWriter) begins using the new columns.
- Drop the following legacy tables, if they exist: system_auth.users,
system_auth.credentials, and system_auth.permissions.
As described in General upgrade advice, authentication and authorization subsystems now support role-based access control (RBAC).
- Review your security configuration. To use security,
enable and configure DSE Unified Authentication.
In cassandra.yaml, the default authenticator is DseAuthenticator and the default authorizer is DseAuthorizer. Other authenticators and authorizers are no longer supported. Security is disabled in dse.yaml by default.
- TimeWindowCompactionStrategy (TWCS) is set only on new dse_perf tables. Manually
change dse_perf tables that were created in earlier releases to use TWCS. For
example:
ALTER TABLE dse_perf.read_latency_histograms WITH COMPACTION={'class':'TimeWindowCompactionStrategy'};
- DSE Search only:
- The appender SolrValidationErrorAppender and the logger SolrValidationErrorLogger are no longer used and may safely be removed from logback.xml.
- In contrast to earlier versions, DataStax recommends accepting the new default value of 1024 for back_pressure_threshold_per_core in dse.yaml. See Configuring and tuning indexing performance.
- If
SpatialRecursivePrefixTreeFieldType
(RPT) is used in the search schema, replace the units attribute with a suitable distanceUnits value (degrees, kilometers, or miles), and then verify that spatial queries behave as expected.
- Applies only if you are using HTTP writes with JSON documents (deprecated): a known
issue causes the auto-generated solrconfig.xml to have an invalid
requestHandler for JSON core creations after upgrade to 5.1.0. Change the auto-generated
solrconfig.xml from:
<requestHandler name="/update/json" class="solr.UpdateUpdateRequestHandler" startup="lazy"/>
to:
<requestHandler name="/update/json" class="solr.UpdateRequestHandler" startup="lazy"/>
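Because this is a single class-name substitution, a sed one-liner can apply it. This is a sketch; the caller supplies whatever path the auto-generated solrconfig.xml resolves to:

```shell
# Sketch: replace the invalid requestHandler class with the correct one in an
# auto-generated solrconfig.xml. Requires GNU sed for the -i flag.
fix_json_request_handler() {
  sed -i 's/solr\.UpdateUpdateRequestHandler/solr.UpdateRequestHandler/g' "$1"
}

# Example: fix_json_request_handler /path/to/solrconfig.xml
```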
- Slow startup on nodes with large
encrypted indexes is resolved. However, action is required to realize the performance
gains. You must do a full reindex of all encrypted search indexes on each node in your
cluster. Plan sufficient time after the upgrade is complete to reindex with
deleteAll=true in a rolling fashion. For
example:
dsetool reload_core keyspace_name.table_name distributed=false reindex=true deleteAll=true
- DSEFS only:
A new schema is available for DSEFS.
Warning: Dropping a keyspace is not recoverable without a backup. If you have non-temporary data, do not drop the dsefs keyspace. No action is required; DSEFS will continue to work using the DSE 5.0 schema.
If you have no data in DSEFS, or if you are using DSEFS only for temporary data, follow these steps to use the new schema:
- Stop DSEFS on all nodes. In the dsefs_options section of dse.yaml, set enabled: false.
- Restart the DSE node.
- Drop the dsefs
keyspace:
DROP KEYSPACE dsefs
- Clear the dsefs data directories on each node. For example, if the dsefs_options section of dse.yaml has data_directories configured as:
dsefs_options:
  ...
  data_directories:
    - dir: /var/lib/dsefs/data
This command removes the directories:
rm -r /var/lib/dsefs/data/*
- Start DSEFS with DSE 6.0 to use the new schema.
- If you backed up existing DSEFS data before the upgrade, copy the data back into DSEFS from local storage.
- DSE Analytics only:
- Spark Jobserver uses DSE custom version 0.8.0.44. Applications must use the compatible Spark Jobserver API from the DataStax repository.
- If you are using Spark SQL tables, migrate them to the new
Hive metastore
format:
dse client-tool spark metastore migrate --from 5.0.0 --to 6.0.0
- Ensure that keyspace replication factors are
correct for your environment:
- Check the keyspace replication factor for analytics keyspaces.
- Check the keyspace replication factor for system_auth and dse_security keyspaces.
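The replication settings for the security keyspaces can be read from system_schema. This is a sketch; the query text is standard system_schema CQL, and the cqlsh invocation is shown as a comment:

```shell
# Sketch: the CQL to inspect replication for the security keyspaces.
RF_QUERY="SELECT keyspace_name, replication FROM system_schema.keyspaces WHERE keyspace_name IN ('system_auth', 'dse_security');"
echo "$RF_QUERY"
# On a live node: cqlsh -e "$RF_QUERY"
```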
Warning messages during and after upgrade
You can ignore some log messages that occur during and after an upgrade.
- When upgrading nodes with DSE Advanced Replication, there might be some WriteTimeoutExceptions during a rolling upgrade while mixed versions of nodes exist. Some write consistency limitations apply while mixed versions of nodes exist. The WriteTimeout issue is resolved after all nodes are upgraded.
- Some
gremlin_server properties in earlier versions of DSE are no longer required. If properties
exist in the dse.yaml file after upgrading, logs display
warnings similar
to:
WARN  [main] 2017-08-31 12:25:30,523  GREMLIN DseWebSocketChannelizer.java:149 - Configuration for the org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0 serializer in dse.yaml overrides the DSE default - typically it is best to allow DSE to configure these.
You can ignore these warnings or modify dse.yaml so that only the required gremlin server properties are present.
- If you see an error message
like:
ERROR [main] 2016-07-21 13:52:46,941 CassandraDaemon.java:737 - Cannot start node if snitch's data center (Cassandra) differs from previous data center (Analytics). Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.
Follow the upgrade instructions above for DSE Analytics nodes: you must start in Spark mode or add the special start-up parameter
-Dcassandra.ignore_dc=true
for each node.