DataStax Enterprise 6.8 release notes

DataStax Enterprise release notes include cluster requirements, upgrade advice, components, security updates, changes and enhancements, issues, and resolved issues for DataStax Enterprise 6.8.x.

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
Package installations /etc/dse/cassandra/cassandra.yaml
Tarball installations installation_location/resources/cassandra/conf/cassandra.yaml

Requirement for Uniform Licensing

All nodes in each DataStax licensed cluster must be uniformly licensed to use the same subscription. For example, if a cluster contains 5 nodes, all 5 nodes within that cluster must be either DataStax Distribution of Apache Cassandra™, or all 5 nodes must be DataStax Enterprise. Mixing different subscriptions within a cluster is not permitted. The DataStax Advanced Workloads Pack may be added to any DataStax Enterprise (not DataStax Distribution of Apache Cassandra) cluster in an incremental fashion. For example, a 10-node DSE cluster may be extended to include 3 nodes of the Advanced Workloads Pack. “Cluster” means a collection of nodes running the software which communicate with one another using gossip. See Enterprise Terms.

Upgrade information

Upgrade advice Compatibility
Before you upgrade to a later major version, upgrade to the latest patch release on your current version. Be sure to read the relevant upgrade documentation. Upgrades to DSE 6.8 are supported from DSE 5.1, DSE 6.0, and DSE 6.7.
Check the compatibility page for your products. See Product compatibility.
See Upgrading DataStax drivers.
Note: Starting January 2020, you can use the same DataStax driver for Apache Cassandra (OSS), DataStax Enterprise, and DataStax Distribution of Apache Cassandra. DataStax has unified the DSE and OSS drivers to avoid user confusion and enhance the OSS drivers with some of the features in the DSE drivers. For more information, see the Better Drivers for Cassandra blog.
DataStax Drivers: You may need to recompile your client application code.
Use DataStax Bulk Loader for loading and unloading data. It loads data into DSE 5.0 or later and unloads data from any Apache Cassandra™ 2.1 or later data source.

What's new in DataStax Enterprise 6.8

  • Four times faster streaming – what took hours now takes minutes. Based on zero copy streaming from Apache Cassandra 4.0, DSE 6.8 tackles and tames the increasing complexity and chaos of cloud infrastructure.
    • Four times faster recovery from node failure.
    • Four times faster addition of new nodes to the cluster.
    • Ability to stream whole and partial SSTables.

      By allowing partial SSTables, DSE’s zero copy streaming applies to more use cases, especially STCS or TWCS, where partial SSTable streaming is the most common use case.

  • DataStax Graph
    • Graph-optimized data model. Graph implemented as a native extension of Cassandra’s data model.

    • Graph-specific API. Enables developers to more easily join, explore, match, and traverse distributed, large-scale data sets.

  • Other Enhancements
    • Anti-Entropy: Incremental NodeSync.

      Enabled by default when creating a new table.

    • Security enhancements:
      • Allow setting of pre-hashed passwords using CQL.
      • New TRUNCATE and UPDATE Permissions.
      • Encryption on the SSTable Partition Index.
    • User productivity:
      • Allow filtering using IN restrictions
      • Faster DSE Tools startup.
    • Guardrails. This optional feature helps you avoid mistakes and prevents implementing known anti-patterns. Each guardrail is enabled individually. For a complete list of guardrails and a detailed description, see Guardrails in the cassandra.yaml file.
    • DSEFS:
      • Provides more reliable startup and shutdown.
      • Improved DSEFS directory delete performance; skips the recursive delete check on DSEFS.
    • DSE Analytics:
      • Spark 2.4.
      • Ability to supply TTL and WriteTime based on Column in a DataFrame.
    • DSE Search. Removes legacy Solr join syntax for non-partition-key joins.
    • Performance improvement: Reduces chunk cache heap overhead.
  • Deprecated or removed functionality

    Deprecated functionality will be removed in a future version.

    • Classic DSE Graph, replaced by DataStax Graph.
    • DSE Graph Loader. DataStax Bulk Loader 1.5 and later loads graph data.
    • In Memory (deprecated).
    • Multi-Instance (deprecated).
    • Tiered Storage (deprecated).

DSE 6.8.0 release notes

Components, changes and enhancements, resolved issues, and known issues for DSE 6.8.0.

dse.yaml

The location of the dse.yaml file depends on the type of installation:
Package installations /etc/dse/dse.yaml
Tarball installations installation_location/resources/dse/conf/dse.yaml

30 March 2020

Attention: DSE Search and DSE Graph performance variability can result after upgrades from DSE 5.1 to DSE 6.0 and later, especially if environments do not adhere to the DataStax hardware layout recommendations.

The DSE Advanced Performance feature introduced in DSE 6.0 included a fundamental architecture change. Performance is highly dependent on data access patterns and varies from customer to customer. This upgrade impact affects only DataStax customers using DSE Search and/or DSE Graph.

In response to this scenario:

  • DataStax has extended DSE 5.1 end of life (EOL) support to April 18, 2024.
  • DataStax is offering a free half-day Upgrade Assessment. This assessment is a DataStax Services engagement designed to assess the upgrade compatibility of your DSE 5.1 deployment. If you are using DSE 5.1 and plan to upgrade to DSE 6.0 or later, contact the DataStax Services Team to schedule your complimentary assessment.
  • DataStax continues to investigate performance differences related to DSE Search and DSE Graph that occur after some upgrades to DSE 6.0 and later. Additional details have been and will continue to be included in DSE release notes.

6.8.0 Components

  • Apache Solr™ 6.0.1.4.2718
  • Apache Spark™ 2.4.0.13
  • Apache TinkerPop™ 3.4.5 with additional production-certified changes
  • Apache Tomcat® 8.0.53
  • DSE Java Driver 1.10.0-dse+20200217
  • Netty 4.1.25.6.dse
  • Spark Jobserver 0.8.0.49 (DSE custom version)
  • DSE 6.8 is based on Apache Cassandra™ 3.11 with additional production-certified changes.

DSE 6.8.0 is compatible with Apache Cassandra® 3.11 and adds additional production-certified changes.

Along with Cassandra 3.11.6, DSE 6.8.0 is supported by a new product offering, DataStax Kubernetes Operator for Apache Cassandra.

Experimental features

DataStax Labs provides the Apache Cassandra and DataStax communities with non-supported previews of potential production software enhancements, tools, aids, and partner software designed to increase productivity. See DataStax Labs and DSE OpsCenter Labs features.

6.8.0 DSE database

Changes and enhancements:

  • Beta version of Storage-Attached Indexing (SAI), which provides traditional relational database indexing capabilities for DSE, without complexity or operational challenges. Includes the revised CREATE CUSTOM INDEX command with a new USING 'StorageAttachedIndex' clause. See the SAI topics and the CREATE CUSTOM INDEX command.
  • Beta version of the DSE Backup and Restore Service, which enables cluster-wide backup and restore operations. Intended for users of DataStax Kubernetes Operator for Apache Cassandra® (Cassandra Operator).
  • Performance improvement: Better memtable memory usage. (DB-2164)
  • DataStax recommends enabling NodeSync on base tables with materialized views. See Materialized views maintenance guidelines. (DB-2831)
  • Simplified maintenance of materialized views by requiring periodic repair of only base tables. Updates on columns indexed in materialized views might see a performance regression of up to 20% in throughput and latency. (DB-2816)
  • New client requests rate and internode message rate usage statistics are output from nodetool tpstats. (DB-3619)
  • Fixed a rare race during streaming that could have resulted in aborted SSTable files being left on disk. (DB-2733)
  • Compaction performance improvement with new cassandra.yaml pick_level_on_streaming option. (DB-1658)
  • Incremental NodeSync service on newly created tables is now on by default. (DB-3934)
    • Existing tables that did not have NodeSync service enabled in DSE 6.7 and earlier are not affected after upgrade.
    • Existing tables with NodeSync service enabled in DSE 6.7 and earlier retain incremental NodeSync service enabled after upgrade.
    • To disable NodeSync service for all new tables, use the -Ddse.nodesync.disable_on_new_tables system parameter.
    • To disable NodeSync for a single table, use the table property:
      ALTER TABLE table_name WITH nodesync = { 'enabled' : 'false' }
  • Introduce new TRUNCATE permission. (DB-74)
    • Deprecate MODIFY permission.
    • New UPDATE permission allows DML statements INSERT, UPDATE, and DELETE.
    • New TRUNCATE permission allows execution of the TRUNCATE statement.
    DataStax recommends migrating from MODIFY permission. Thoroughly plan and implement this migration with care in case a ROLE needs the TRUNCATE permission.
  • Create new settings system view by merging CASSANDRA-14573. Allows select properties to be modified. (DB-37)
    Column Name | CQL Type | Column Type   | Description
    name        | text     | partition key | The property name
    value       | text     | regular       | The property value as text
    writeable   | boolean  | regular       | true if the property can be updated, false otherwise
  • In cassandra.yaml, memtable_flush_writers now defaults to 8 and memtable_cleanup_threshold now defaults to max(0.2, 1 / (memtable_flush_writers + 1)) to establish a lower bound. PendingTasks metric attached to all JMXEnabledThreadPoolExecutor instances, including MemtableFlushWriter, now excludes currently executing tasks. (DB-2871)
  • Fix error that occurs when initialCapacity overflows during continuous paging. (DB-2852)
  • dsetool ring shows in-progress search index building during bootstrap. (DSP-15281)
  • Improved output of nodetool tablestats to accurately report when no data is available. (DB-2854)
  • New cqlsh command line option to set consistency-level to serial. (DB-4225)
  • Removes the serialization header partition/clustering key validation for secondary-index SSTables and changes the validation to consider the element type. (DB-4111)
  • Keyspace and table names now allow 222 characters. (DB-2128)
  • Fix viewbuildstatus error that returns incorrect value. (DB-2397)
  • nodetool stop has anticompaction option. (DB-3821)
  • New ALL TABLES permission allows permissions on all of the tables of a keyspace, but not the keyspace itself. (DB-3823)
  • Log messages for leak detection now specify if the leaked resource has negative consequences. (DB-3827)
  • Fix potentially incorrect dropped messages in case of time drifts. (DB-3891)
  • Change in validation method for initial_token and num_tokens for cassandra.yaml. (DB-1618)
  • Fix a bug to correctly estimate the size of the on-heap memtable metadata. (DB-2086)
  • Starting with DSE 6.8.0, if nodes need to communicate through an interface other than the one configured as the listen_address, you must configure the additional interface as the broadcast_address. If no routing exists between the two interfaces, you might also need to set listen_on_broadcast_address: true. (DB-4142)
  • Fix bug to allow LIST ROLES and LIST USERS to work with system-keyspace-filtering enabled. (DB-4221)
  • nodetool compactionstats now accurately shows the pending tasks for TimeWindowCompactionStrategy (TWCS). (DB-3495)
  • DROP KEYSPACE now waits for interrupted compactions to finish before dropping the keyspace. (DB-3575)
  • During the execution of CQL queries, guardrail checks are ignored for superusers and for internal system queries. (DB-3654)
  • Replaced stream_entire_sstables for Cassandra with zerocopy_streaming_enabled for DataStax Enterprise, along with the related options. (DB-3832)
  • Improved handling of SSTable min and max clustering. (DB-3728)
  • For cassandra.yaml configuration, only TLS is allowed for the protocol option for client_encryption_options and server_encryption_options for security. (DB-2786)
  • New counter metrics for submitted traversals and throughput are under com.datastax.bdp.metrics.graph. (DSP-17009)
  • Improved performance of dsetool, dse client-tool, dse fs, and nodetool commands. (DSP-17586)
  • DataStax Bulk Loader is not included with DataStax Enterprise installations, but can be installed separately. (DSP-19469)
  • The application_name and application_version attributes set by the Python driver flow through to driver event reporting and Insights. cqlsh sets these attributes to appropriate values so connected clients appear neatly in Insights and other reporting. (DSP-20119)
  • Incremental NodeSync is disabled when an ordering partitioner is configured. (DB-4024)
  • Added hostname_verification to ldap_options in dse.yaml. (DSP-20302)
  • Add the ability to add the reason for re-indexing to the dsetool core_indexing_status command. (DSP-20264)
  • Make the search reference visible in the error message for LDAP connections. (DSP-20578)
  • Security updates:
    • Allow setting of pre-hashed passwords via CQL. (DB-3293)
    • Fix min/max clustering keys being stored in plain text in SSTable statistics. (DB-3845)
    • Upgrade Apache Solr to address CVE-2018-8026. (DSP-16653)
    • Upgrade Jackson Databind to address CVE-2018-11307 and CVE-2019-14540 (DSP-18099, DB-2911, DSP-17964)
    • Apache Spark local privilege escalation vulnerability: CVE-2018-11760. (DSP-18225)
    • Upgrade spray-json to prevent Denial Of Service (DoS) vulnerability CVE-2018-18854 and CVE-2018-18853. (DSP-19208)
    • Upgrade Apache MINA Core library to 2.0.21 to prevent a security issue where Apache MINA Core was vulnerable to information disclosure. (DSP-19213)
    • Upgrade Jackson Databind to address CVE-2019-16942. (DSP-19896)
    • Remove Jodd Core dependency that created vulnerability to Arbitrary File Writes. (DSP-19206)
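
The memtable_cleanup_threshold default described above (DB-2871) can be worked through numerically. The following is a minimal sketch of the formula exactly as stated in these notes; the function name is hypothetical and is not part of DSE, and the positive-value check is an assumption added for the example.

```python
def default_memtable_cleanup_threshold(memtable_flush_writers: int) -> float:
    """Default stated in the release note:
    max(0.2, 1 / (memtable_flush_writers + 1)).

    The 0.2 term is the lower bound mentioned in the note.
    """
    # Assumption for this sketch: the writer count is a positive integer.
    if memtable_flush_writers < 1:
        raise ValueError("memtable_flush_writers must be a positive integer")
    return max(0.2, 1.0 / (memtable_flush_writers + 1))


if __name__ == "__main__":
    # Show how the lower bound takes over as the writer count grows.
    for writers in (1, 2, 4, 8):
        threshold = default_memtable_cleanup_threshold(writers)
        print(f"memtable_flush_writers={writers} -> {threshold:.3f}")
```

With the new default of 8 flush writers, 1 / (8 + 1) is roughly 0.111, which is below the floor, so the effective default threshold is 0.2.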
Known issues:
  • Streaming throughput (bootstrapping, decommissioning nodes) may be slower on networks with high latency. (DB-4041)

    Workaround: Expand the streaming buffers by setting the following in cassandra.yaml:

    internode_recv_buff_size_in_bytes=174760

  • Certain workloads may cause higher context switches and increased latencies. (DSP-20499)
    Workaround: Set system properties during startup:
    -Ddse.tpc.work_stealing_max_unparks=1
  • NoSuchMethod error returned when creating SASI index. (DSP-20720)

6.8.0 DSE Analytics

Changes and enhancements:
  • Improved reliability of no-space-left-on-device detection. The DSEFS min_free_space default value in dse.yaml is reduced from 5 GB to 256 MB. (DSP-16873)
  • Apache Spark™ 2.4 runs with Scala 2.11.12 by default. Upgrade the compile time dependencies for structured streaming and other experimental Spark features. Even though most Spark jobs from earlier Spark 2.x builds can run on Spark 2.4 without recompiling, DataStax recommends that you recompile your applications against Spark 2.4 to guarantee compatibility. (DSP-17823)
  • Add the ability to set time-to-live (TTL) and WriteTime in DseGraphFrames and Spark DataFrames. (DSP-17044)
  • Bring-Your-Own-Spark (BYOS) builds include dependencies for Joda and Commons-Configuration. (DSP-20512)
  • Fixed an issue during Spark application startup where Exception: java.lang.ExceptionInInitializerError thrown from the UncaughtExceptionHandler in thread "main" was sometimes logged instead of a meaningful error. (DSP-20474)
  • New spark.cassandra.query.consistency.level parameter sets the default consistency level for sessions accessed by Spark Connector. The default consistency level for HiveMetaStore is LOCAL_QUORUM. (DSP-19982)
  • Changes to IN clauses. (DSP-15203)
    • Multiple IN clauses on partition and clustering keys can be pushed down to Cassandra.
    • If the cross product of values in IN clauses exceeds spark.sql.dse.inClauseToJoinConversionThreshold, a JoinWithCassandraTable is performed instead.
    • If the cross product of values in IN clauses exceeds spark.sql.dse.inClauseToFullScanConversionThreshold, a full table scan is performed instead.
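
The IN clause rules above (DSP-15203) amount to a decision on the size of the cross product of the IN value lists. The sketch below is an illustration only, not DSE code: the function name and the threshold values in the example are hypothetical, and it assumes that the full-scan threshold is larger than the join threshold and is checked first.

```python
from math import prod
from typing import Sequence


def choose_in_clause_strategy(
    in_value_counts: Sequence[int],
    join_threshold: int,
    full_scan_threshold: int,
) -> str:
    """Pick an execution strategy from the number of values in each IN clause.

    join_threshold stands in for spark.sql.dse.inClauseToJoinConversionThreshold
    and full_scan_threshold for spark.sql.dse.inClauseToFullScanConversionThreshold.
    """
    # The cross product is the number of key combinations to look up.
    cross_product = prod(in_value_counts)
    # Assumption: the larger (full scan) threshold is evaluated first.
    if cross_product > full_scan_threshold:
        return "full table scan"
    if cross_product > join_threshold:
        return "JoinWithCassandraTable"
    return "IN pushdown"


if __name__ == "__main__":
    # Illustrative thresholds, not the DSE defaults.
    print(choose_in_clause_strategy([3, 4], 100, 1000))
    print(choose_in_clause_strategy([10, 20], 100, 1000))
    print(choose_in_clause_strategy([100, 100], 100, 1000))
```

For example, two IN clauses with 10 and 20 values produce 200 combinations, which exceeds the illustrative join threshold of 100 but not the full-scan threshold of 1000.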

6.8.0 DSEFS

Changes and enhancements:

  • Running Spark applications with large number of partitions creates many tombstones and may cause tombstone warnings or in extreme cases a job failure. DSE 6.8 reduces the number of tombstones created during Spark job commit and improves performance of some Spark jobs up to 60%. (DSP-15762)
  • When creating a file through WebHDFS API, DSEFS does not verify WX permissions of parent's parent when the parent exists. (DSP-20355)
  • Allow DSEFS to use mixed case keyspaces to connect directly to the dsefs keyspace. (DSP-20354)
  • DSEFS node identifiers are now the same as DSE node identifiers. The NODE_ID file in the dsefs working directory is no longer needed. (DSP-18009)
  • Add support for multiple contact points for DSEFS implementation of the Hadoop FileSystem. Provides FileSystem URI with dsefs://host0[:port][,host1[:port]]/. (DSP-19704)
  • DSEFS local file system implementation now returns alphabetically sorted directories and files when using wildcards and listing command. (DSP-20057)
  • DSEFS now stores information about data usage in the local storage directory instead of Cassandra. This change improves reliability if some nodes are down. (DSP-15349)
  • Improve reliability of DSEFS internode connections. Fix error for missing session key when cluster nodes were down. (DSP-15347)
  • Improved DSEFS node health reported by dsefs df for consistency with dsetool status and nodetool status. (DSP-15346)
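
The multi-contact-point URI form dsefs://host0[:port][,host1[:port]]/ listed above (DSP-19704) can be illustrated with a small parser. This is a sketch for reading the syntax only: the function name is hypothetical, the host names and port in the example are placeholders, and IPv6 literals are not handled. It is not how the DSEFS Hadoop FileSystem implementation parses URIs.

```python
from typing import List, Optional, Tuple


def parse_dsefs_contact_points(uri: str) -> List[Tuple[str, Optional[int]]]:
    """Split a dsefs:// URI into (host, port) pairs; port is None when omitted."""
    scheme = "dsefs://"
    if not uri.startswith(scheme):
        raise ValueError("expected a URI starting with dsefs://")
    # The authority is everything between the scheme and the first slash.
    authority = uri[len(scheme):].split("/", 1)[0]
    if not authority:
        raise ValueError("no contact points given")
    contact_points: List[Tuple[str, Optional[int]]] = []
    for entry in authority.split(","):
        host, separator, port = entry.partition(":")
        if not host:
            raise ValueError(f"empty host in {entry!r}")
        # int() raises ValueError for a missing or non-numeric port.
        contact_points.append((host, int(port) if separator else None))
    return contact_points


if __name__ == "__main__":
    print(parse_dsefs_contact_points("dsefs://host0:5598,host1/"))
```

Each comma-separated entry is an independent contact point, so a port can be given for some hosts and omitted for others.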

6.8.0 DSE Graph

Changes and enhancements:

  • Traversal length is limited to a hard-coded maximum of 90 steps. Traversals with more steps fail with an error message advising you to split them into multiple smaller traversals. (DSP-17657)
  • Enhanced Graph OLAP Spark configuration. (DSP-17832)
    • New Spark configuration properties in resources/graph/conf/olap.properties.
    • New dse client-tool graph-olap commands.
  • DataStax Graph (core) changes:
    • Changed delimiters and a checksum make IDs more readable and remove the need to manually construct IDs. (DSP-15963)
    • Support for user-defined types (UDTs) is added. (DSP-16030)
    • System is now available while aliased. Alias for a missing graph now allows a user to issue commands. Exceptions occur when accessing 'g' (or the alias), 'graph', or 'schema'. (DSP-16682)
    • .withReplication is mandatory. Classic is no longer a valid engine to create a graph using the new syntax. (DSP-16698)
    • Timeouts from dse.yaml are still valid. However, there is no Gremlin-exposed traversal source or graph configuration. To change timeouts, set them on the driver connection. Timeouts cannot exceed limits set in dse.yaml. (DSP-16758)
    • Improved user experience when authorization errors occur. (DSP-18125)
    • Fixed Graph cleanUp failures due to duplicate properties on dangling edges. (DSP-20460)
  • Track latencies for unaliased, global, and per-graph traversals. (DSP-16455)
  • Upgraded dependency for Graph prototyping to TinkerPop 3.4.0. (DSP-16452)
    • Introduces with() step modulator which will enable DSE Graph to modify step behaviors as in g.addV('person').with(ttl, 1000)
    • Removes deprecated rebindings option; older drivers going back to 3.1.x can no longer connect.
    • min() and max() work on any Comparable to allow for g.V().values('name').min().
    • Reduces barrier steps (min(), max(), mean(), sum()) so no result is returned if there is no input rather than NaN or 0, which could lead to unintuitive results.
    • Changes the order of select() scopes to make it easier to select a specific map entry if a side-effect existed with the same name.
  • The AndStep, OrStep, DedupGlobalStep, RangeGlobalStep, NotStep, SelectStep, SelectOneStep, OrderGlobalStep, and WherePredicateStep steps are valid when determining whether to route OLAP traversals to DseGraphFrames. (DSP-16233)
  • Allow range queries using part of the Custom Vertex ID. This change requires the partition key components to be specified before the range query may be specified on the clustering key components. (DSP-12501)
  • Add read/write support for TTL and WriteTime into DseGraphFrames (DGF). (DSP-19304)
  • Expose configuration and metrics for Gremlin query cache. (DSP-20240)
  • Change classic Graph query so vertices are read from _p tables in Cassandra using SELECT ... WHERE <vertex primary key columns> statement. The search predicate is applied in memory. (DSP-20230)
  • The TinkerPop version bump changes the following settings when SSL configuration options are used for Gremlin Console and TinkerPop drivers:
    • Added:
      • keyStore
      • keyStorePassword
      • trustStore
      • trustStorePassword
      • keyStoreType
      • sslEnabledProtocols
      • sslCipherSuites
      • sslSkipCertValidation
    • Deprecated:
      • trustCertChainFile
      • keyCertChainFile
      • keyFile
      • keyPassword
    (DSP-17552)
Known issues:
  • Server could get slow or unresponsive if lots of long-running traversals are canceled. (DSP-20425)

    Workaround: Run with shorter traversal timeouts, if traversal allows.

6.8.0 DSE Search

Changes and enhancements:
  • Unbounded facet searches are no longer allowed. (DSP-18693)
    • facet.limit < 0 is no longer supported. Override the default facet.limit of 20000 with the -Dsolr.max.facet.limit.size system property.
    • This change adds guardrails that can cause misconfigured faceting queries to fail. Before upgrading, set an explicit facet.limit.
  • The dsetool stop_core_reindex command now mentions the node in the output message. (DSP-17090)
  • Legacy Solr join queries are no longer valid. The to, from, and force parameters are invalid. Joins can no longer be performed on non-partition key columns or on different keyspaces. (DSP-17431)
  • The dsetool core_indexing_status command now mentions the indexing reason in the output message.
  • The recommendation to enable live indexing on only one search core per cluster was too conservative. See Tuning search for maximum indexing throughput and Capacity planning for DSE Search. Be sure to follow the DataStax recommendations for your environment. (DSP-17939)
  • DSE Management API is available for enhancing operation with Kubernetes. (DSP-18785)
  • Improved real-time search to fix a docValues bug. (DSP-20300)
  • Passing TextField Solr fields with docValues to facet.field, facet.pivot, group.field, and sort (including native CQL Solr queries that use a TextField with docValues with ORDER BY) is now illegal and will fail the query in question. (DSP-18238)
  • Improved guidance with warnings when index rebuild is required for ALTER SEARCH INDEX, RELOAD SEARCH INDEX, and dsetool reload_core commands. (DSP-19347)
  • Replicas with non-queryable indexes will be skipped by the coordinator node to improve availability for index read. During nodetool rebuild_index, Storage-Attached Indexing (SAI) will be marked as non-queryable until the index build finishes, while 2i will remain queryable. (DSP-19543) For related information, see What is SAI?.
  • Error messages related to Solr errors contain better descriptions of the root cause. (DSP-13792)
Known issues:
  • Mixed workloads with very wide partitions could see diminished performance. (DSP-20386)
    Workaround: Set system property in startup:
    -Dnetty.eventloop.tasks_processing_time_limit_ms=100

DataStax Studio

DataStax Bulk Loader

Cassandra enhancements for DSE 6.8.0

DataStax Enterprise 6.8.0 is compatible with Apache Cassandra™ 3.11 and adds production-certified enhancements.

General upgrade advice for DSE 6.8.0

DataStax Enterprise 6.8.0 is compatible with Apache Cassandra™ 3.11. All upgrade advice from previous versions applies. Carefully reviewing the DataStax Enterprise upgrade planning and upgrade instructions can ensure a smooth upgrade and avoid pitfalls and frustrations.

Additional advice for upgrading between versions of Apache Cassandra™ includes:

Cassandra 4.0 changes

Cassandra 3.11.2 changes

  • Cassandra now relies on JVM options to shut down properly on OutOfMemoryError. By default it relies on the OnOutOfMemoryError option, because the ExitOnOutOfMemoryError and CrashOnOutOfMemoryError options are not supported by the older 1.7 and 1.8 JVMs. A warning is logged at startup if none of those JVM options is used. See CASSANDRA-13006 for more details.

Cassandra 3.11.2 upgrade considerations

  • Creating a materialized view with filtering on a non-primary-key base column (added in CASSANDRA-10368) is disabled, because the liveness of a view row depends on multiple filtered base non-key columns and on the base non-key column used in the view primary key. These semantics cannot be supported without a storage format change; see CASSANDRA-13826. For append-only use cases, you can still use this feature with a startup flag: "-Dcassandra.mv.allow_filtering_nonkey_columns_unsafe=true"
  • The NativeAccessMBean isAvailable method will only return true if the native library has been successfully linked. Previously it was returning true if JNA could be found but was not taking into account link failures.
  • Primary ranges in the system.size_estimates table are now based on the keyspace replication settings and adjacent ranges are no longer merged (CASSANDRA-9639).
  • In 2.1, the default for otc_coalescing_strategy was 'DISABLED'. In 2.2 and 3.0, it was changed to 'TIMEHORIZON', but that value was shown to be a performance regression. The default for 3.11.0 and newer has been reverted to 'DISABLED'. Users upgrading from Cassandra 2.2 or 3.0 should be aware that the default has changed.
  • The StorageHook interface has been modified to allow retrieving read information from SSTableReader (CASSANDRA-13120).
  • Materialized Views for upgrades from DSE 5.1.1 or 5.1.2 or any version DSE 5.0.10 or later:
    • Cassandra will no longer allow dropping columns on tables with Materialized Views.
    • A change was made in the way the Materialized View timestamp is computed, which may cause an old deletion to a base column which is view primary key (PK) column to not be reflected in the view when repairing the base table post-upgrade. This condition is only possible when a column deletion to an MV primary key (PK) column not present in the base table PK (via UPDATE base SET view_pk_col = null or DELETE view_pk_col FROM base) is missed before the upgrade and received by repair after the upgrade. If such column deletions are done on a view PK column which is not a base PK, it's advisable to run repair on the base table of all nodes prior to the upgrade. Alternatively it's possible to fix potential inconsistencies by running repair on the views after upgrade or drop and re-create the views. See CASSANDRA-11500 for more details.
    • Removal of columns not selected in the Materialized View (via UPDATE base SET unselected_column = null or DELETE unselected_column FROM base) may not be properly reflected in the view in some situations so we advise against doing deletions on base columns not selected in views until this is fixed on CASSANDRA-13826.

Cassandra 3.10 changes

  • Runtime modification of concurrent_compactors is now available via nodetool concurrent_compactors.
  • Support for the assignment operators +=/-= has been added for update queries.
  • An Index implementation may now provide a task which runs prior to joining the ring. See CASSANDRA-12039
  • Filtering on partition key columns is now also supported for queries without secondary indexes.
  • A slow query log has been added: slow queries will be logged at DEBUG level. For more details refer to CASSANDRA-12403 and slow_query_log_timeout_in_ms in cassandra.yaml.
  • Support for GROUP BY queries has been added.
  • A new compaction-stress tool has been added to test the throughput of compaction for any cassandra-stress user schema. See compaction-stress help for usage.
  • Prepared statements are now persisted in the table prepared_statements in the system keyspace. Upon startup, this table is used to preload all previously prepared statements - i.e. in many cases clients do not need to re-prepare statements against restarted nodes.
  • cqlsh can now connect to older Cassandra versions by downgrading the native protocol version. Please note that this is currently not part of our release testing and, as a consequence, it is not guaranteed to work in all cases. See CASSANDRA-12150 for more details.
  • Snapshots that are automatically taken before a table is dropped or truncated will have a "dropped" or "truncated" prefix on their snapshot tag name.
  • Metrics are exposed for successful and failed authentication attempts. These can be located using the object names org.apache.cassandra.metrics:type=Client,name=AuthSuccess and org.apache.cassandra.metrics:type=Client,name=AuthFailure respectively.
  • Add support to "unset" JSON fields in prepared statements by specifying DEFAULT UNSET. See CASSANDRA-11424 for details
  • Allow TTL with null value on insert and update. It will be treated as equivalent to inserting a 0.
  • Removed outboundBindAny configuration property. See CASSANDRA-12673 for details.

Cassandra 3.10 upgrade considerations

  • Support for altering the types of columns in already defined tables and of UDT fields has been disabled. If it is necessary to return a different type, use casting instead. See CASSANDRA-12443 for more details.
  • Specifying the default_time_to_live option when creating or altering a materialized view was erroneously accepted (and ignored). It is now properly rejected.
  • Only Java and JavaScript are now supported UDF languages. The sandbox in 3.0 already prevented the use of script languages except Java and JavaScript.
  • Compaction now correctly drops sstables out of CompactionTask when there isn't enough disk space to perform the full compaction. This should reduce pending compaction tasks on systems with little remaining disk space.
  • Request timeouts in cassandra.yaml (read_request_timeout_in_ms, etc) now apply to the "full" request time on the coordinator. Previously, they only covered the time from when the coordinator sent a message to a replica until the time that the replica responded. Additionally, the previous behavior was to reset the timeout when performing a read repair, making a second read to fix a short read, and when subranges were read as part of a range scan or secondary index query. In 3.10 and higher, the timeout is no longer reset for these "subqueries". The entire request must complete within the specified timeout. As a consequence, your timeouts may need to be adjusted to account for this. See CASSANDRA-12256 for more details.
  • Logs written to stdout are now consistent with logs written to files. Time is now local (it was UTC on the console and local in files). Date, thread, file, and line info were added to stdout. (See CASSANDRA-12004.)
  • The 'clientutil' jar, which has been somewhat broken on the 3.x branch, is no longer provided. The features provided by that jar are provided by any good Java driver, and we advise relying on drivers rather than on that jar. If you need that jar for backward compatibility in the meantime, use the version provided on a previous Cassandra branch, such as the 3.0 branch (by design, the functionality provided by that jar is stable across versions, so using the 3.0 jar for a client connecting to 3.x should work without issues).
  • (Tools development) DatabaseDescriptor no longer implicitly starts up components or services such as commit log replay. This may break existing third-party tools and clients. To start up a standalone tool or client application, use the DatabaseDescriptor.toolInitialization() or DatabaseDescriptor.clientInitialization() methods. Tool initialization sets up the partitioner, snitch, and encryption context. Client initialization only applies the configuration and does not set up anything. Instead of using Config.setClientMode() or Config.isClientMode(), which are now deprecated, use one of the appropriate new methods in DatabaseDescriptor.
  • Application layer keep-alives were added to the streaming protocol to prevent idle incoming connections from timing out and failing the stream session (CASSANDRA-11839). This effectively deprecates the streaming_socket_timeout_in_ms property in favor of streaming_keep_alive_period_in_secs. See cassandra.yaml for more details about this property.
  • Duration literals support the ISO 8601 format. As a consequence, identifiers matching that format (for example, P2Y or P1MT6H) are no longer supported (CASSANDRA-11873).
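
To make the request timeout change in this list concrete, the following Python sketch compares the two accounting models for one coordinator request that issues several sequential subqueries (for example, an initial read followed by a read repair). It is an illustrative model only, not Cassandra code: the function names and the list-of-durations input are hypothetical, and the concurrency of real requests is ignored.

```python
from typing import Sequence


def times_out_per_subquery(durations_ms: Sequence[float], timeout_ms: float) -> bool:
    """Simplified pre-3.10 model: the timer restarts for every subquery,
    so the request fails only if a single subquery exceeds the timeout."""
    return any(d > timeout_ms for d in durations_ms)


def times_out_full_request(durations_ms: Sequence[float], timeout_ms: float) -> bool:
    """Simplified 3.10+ model: one budget covers the whole request,
    so the sum of all sequential subqueries must fit within the timeout."""
    return sum(durations_ms) > timeout_ms


# Hypothetical request: initial read, read repair, short-read retry.
subqueries_ms = [3000.0, 2500.0, 1500.0]
read_request_timeout_in_ms = 5000.0

old_result = times_out_per_subquery(subqueries_ms, read_request_timeout_in_ms)
new_result = times_out_full_request(subqueries_ms, read_request_timeout_in_ms)
print(f"per-subquery model times out: {old_result}")
print(f"full-request model times out: {new_result}")
```

In this made-up case no single subquery exceeds 5000 ms, but their total of 7000 ms does, which is the kind of request that may need a larger timeout after upgrading.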

Cassandra 3.8 changes

  • Shared pool threads are now named according to the stage they are executing tasks for. Thread names mentioned in traced queries change accordingly.
  • A new option has been added to cassandra-stress, "-rate fixed={number}/s", that forces a scheduled rate of operations per second over time. Using this option, stress can accurately account for coordinated omission from the stress process.
  • The cassandra-stress "-rate limit=" option has been renamed to "-rate throttle=".
  • HDR histograms have been added to stress runs. Their output can be saved to disk using the "-log hdrfile=" option. The histogram includes response, service, and wait times when used with the fixed or throttle rate options. The histogram file can be plotted at http://hdrhistogram.github.io/HdrHistogram/plotFiles.html
  • TimeWindowCompactionStrategy has been added. This has proven to be a better approach to time series compaction and new tables should use this instead of DTCS. See CASSANDRA-9666 for details.
  • DateTieredCompactionStrategy has been deprecated - new tables should use TimeWindowCompactionStrategy. Note that migrating an existing DTCS-table to TWCS might cause increased compaction load for a while after the migration so make sure you run tests before migrating. Read CASSANDRA-9666 for background on this.
  • Change-Data-Capture (CDC) is now available. See cassandra.yaml for CDC-specific flags and a brief explanation of on-disk locations for archived data in CommitLog form. CDC can be enabled per table via ALTER TABLE ... WITH cdc=true. Upon flush, CommitLogSegments containing data for CDC-enabled tables are moved to the data/cdc_raw directory until removed by the user. Writes to CDC-enabled tables are rejected with a WriteTimeoutException once the combined size of unflushed CommitLogSegments and cdc_raw reaches cdc_total_space_in_mb. NOTE: CDC is disabled by default in the .yaml file. Do not enable CDC on a mixed-version cluster, as it will lead to exceptions that can interrupt traffic. Once all nodes have been upgraded to 3.8, it is safe to enable this feature and restart the cluster.
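
The relationship between the fixed-rate option and the response, service, and wait times mentioned in this list can be sketched in Python. This is a simplified, single-worker model written for illustration and is not how cassandra-stress is implemented; the Timing type, the function name, and the sample numbers are all hypothetical.

```python
from typing import List, NamedTuple


class Timing(NamedTuple):
    wait_ms: float      # delay between the intended and the actual start
    service_ms: float   # time the operation itself took
    response_ms: float  # wait + service, measured from the intended start


def fixed_rate_timings(service_times_ms: List[float], ops_per_sec: float) -> List[Timing]:
    """Model one worker that is scheduled to start an operation every 1000/ops_per_sec ms."""
    interval_ms = 1000.0 / ops_per_sec
    timings = []
    free_at = 0.0  # time at which the worker can start its next operation
    for i, service in enumerate(service_times_ms):
        intended_start = i * interval_ms
        # A slow earlier operation delays this one past its scheduled start.
        actual_start = max(intended_start, free_at)
        free_at = actual_start + service
        timings.append(Timing(actual_start - intended_start, service, free_at - intended_start))
    return timings


# Hypothetical run at 10 ops/s where the second operation stalls for 350 ms.
for t in fixed_rate_timings([50.0, 350.0, 50.0, 50.0], ops_per_sec=10.0):
    print(t)
```

In this model the third and fourth operations each take only 50 ms of service time, yet their response times are 300 ms and 250 ms because they started late. Reporting service time alone would hide that delay, which is the coordinated omission effect that a fixed schedule makes visible.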

Cassandra 3.8 upgrade considerations

  • The ReversedType behaviour has been corrected for clustering columns of BYTES type containing empty value. Scrub should be run on the existing SSTables containing a descending clustering column of BYTES type to correct their ordering. See CASSANDRA-12127 for more details.
  • Ec2MultiRegionSnitch will no longer automatically set broadcast_rpc_address to the public instance IP if this property is defined in cassandra.yaml.
  • The names "json" and "distinct" are no longer valid as user-defined function names (they are still valid as column names, however). In the unlikely case that you have defined functions with such names, you must recreate them under a different name, change your code to use the new names, and drop the old versions before upgrading (see CASSANDRA-10783 for more details).

Cassandra 3.7 upgrade considerations

  • A maximum size for SSTable values has been introduced to prevent out-of-memory exceptions when reading corrupt SSTables. This maximum size can be set via max_value_size_in_mb in cassandra.yaml. The default is 256MB, which matches the default value of native_transport_max_frame_size_in_mb. SSTables will be considered corrupt if they contain values whose size exceeds this limit. See CASSANDRA-9530 for more details.
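
The size check described above can be illustrated with a short Python sketch. The function name is hypothetical, and the conversion of one megabyte to 1024 * 1024 bytes is an assumption made for this example rather than something stated in these notes.

```python
BYTES_PER_MB = 1024 * 1024  # assumed conversion factor for this sketch


def exceeds_max_value_size(value_size_bytes: int, max_value_size_in_mb: int = 256) -> bool:
    """Return True if a value is larger than the configured limit,
    in which case the SSTable containing it would be considered corrupt."""
    return value_size_bytes > max_value_size_in_mb * BYTES_PER_MB


print(exceeds_max_value_size(10 * BYTES_PER_MB))       # well under the default limit
print(exceeds_max_value_size(256 * BYTES_PER_MB + 1))  # one byte over the default limit
```

If a table legitimately stores values near the default limit, the sketch suggests reviewing max_value_size_in_mb before upgrading so that valid SSTables are not flagged.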

Cassandra 3.6 changes

  • JMX connections can now use the same auth mechanisms as CQL clients. New options in cassandra-env.(sh|ps1) enable JMX authentication and authorization to be delegated to the IAuthenticator and IAuthorizer configured in cassandra.yaml. The default settings still only expose JMX locally, and use the JVM's own security mechanisms when remote connections are permitted. For more details on how to enable the new options, see the comments in cassandra-env.sh. A new class of IResource, JMXResource, is provided for the purposes of GRANT/REVOKE via CQL. See CASSANDRA-10091 for more details. Also, directly setting JMX remote port via the com.sun.management.jmxremote.port system property at startup is deprecated. See CASSANDRA-11725 for more details.
  • JSON timestamps are now in UTC and contain the timezone information. See CASSANDRA-11137 for more details.
  • Collision checks are performed when joining the token ring, regardless of whether the node should bootstrap. Additionally, replace_address can legitimately be used without bootstrapping to help with recovery of nodes with partially failed disks. See CASSANDRA-10134 for more details.
  • Key cache will only hold indexed entries up to the size configured by column_index_cache_size_in_kb in cassandra.yaml in memory. Larger indexed entries will never go into memory. See CASSANDRA-11206 for more details.
  • For tables that have a default_time_to_live, specifying a TTL of 0 removes the TTL from the inserted or updated values.
  • Startup is now aborted if corrupted transaction log files are found. The details of the affected log files are now logged, allowing the operator to decide how to resolve the situation.
  • Filtering expressions are more pluggable and can be added programmatically via a QueryHandler implementation. See CASSANDRA-11295 for more details.
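
The TTL rule in this list can be sketched as a small decision function. This Python model is an illustration, not Cassandra's implementation: the UNSET marker and the function name are hypothetical, and the handling of a null TTL follows the earlier note in this document that a null TTL is treated as 0.

```python
from typing import Optional

UNSET = object()  # hypothetical marker: the statement has no USING TTL clause


def effective_ttl(default_time_to_live: int, requested_ttl=UNSET) -> Optional[int]:
    """Return the TTL in seconds applied to the written values, or None for no expiry."""
    if requested_ttl is UNSET:
        ttl = default_time_to_live  # fall back to the table default
    elif requested_ttl is None:
        ttl = 0                     # a null TTL is treated like 0
    else:
        ttl = requested_ttl
    # A TTL of 0 means the values never expire, even if the table has a default.
    return ttl if ttl > 0 else None


print(effective_ttl(3600))        # no USING TTL: table default applies
print(effective_ttl(3600, 0))     # explicit 0: TTL removed
print(effective_ttl(3600, 60))    # explicit value overrides the default
```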

Cassandra 3.4 changes

  • Internal authentication now supports caching of encrypted credentials. See credentials_validity_in_ms in cassandra.yaml.
  • Remote configuration of auth caches via JMX can be disabled using the system property cassandra.disable_auth_caches_remote_configuration.
  • The sstabledump tool has been added as the 3.0 version of the former sstable2json. The tool only supports v3.0+ SSTables. See the tool's help for more detail.
  • The mbean interfaces org.apache.cassandra.auth.PermissionsCacheMBean and org.apache.cassandra.auth.RolesCacheMBean are deprecated in favor of org.apache.cassandra.auth.AuthCacheMBean. This generalized interface is common across all caches in the auth subsystem. The specific mbean interfaces for each individual cache will be removed in a subsequent major version.

Cassandra 3.2 changes

  • We now make sure that a token does not exist in several data directories. This means that we run one compaction strategy per data_file_directory and we use one thread per directory to flush. Use nodetool relocatesstables to make sure your tokens are in the correct place, or just wait and compaction will handle it. See CASSANDRA-6696 for more details.
  • The maximum number of in-flight commit log replay mutation bytes is now bounded to 64 megabytes, tunable via cassandra.commitlog_max_outstanding_replay_bytes.
  • Support for type casting has been added to the selection clause.
  • Hinted handoff now supports compression. See hints_compression in cassandra.yaml. Note: hints compression is currently disabled by default.
  • The Thrift API is deprecated and will be removed in Cassandra 4.0.
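
The idea that each token belongs to exactly one data directory can be pictured with a boundary lookup. The Python sketch below is a loose illustration under stated assumptions: the boundary values, directory paths, and function name are hypothetical, and the way Cassandra actually computes per-directory boundaries is not described in these notes.

```python
from bisect import bisect_left
from typing import List


def directory_for_token(token: int, upper_bounds: List[int], directories: List[str]) -> str:
    """Pick the single directory that owns a token.

    upper_bounds[i] is the inclusive upper token bound of directories[i].
    The list must be sorted, and its last entry must cover the largest token.
    """
    if len(upper_bounds) != len(directories):
        raise ValueError("need exactly one upper bound per directory")
    # bisect_left finds the first bound that is >= token.
    index = bisect_left(upper_bounds, token)
    if index == len(directories):
        raise ValueError("token is above the last boundary")
    return directories[index]


# Hypothetical layout with three data directories.
bounds = [-3000, 3000, 9000]
dirs = ["/data1", "/data2", "/data3"]
print(directory_for_token(-3000, bounds, dirs))
print(directory_for_token(42, bounds, dirs))
print(directory_for_token(9000, bounds, dirs))
```

Because every token maps to one directory in such a scheme, flushing and compaction can work per directory, which matches the one-strategy and one-flush-thread-per-directory behavior described in the list.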

Cassandra 3.2 upgrade considerations

  • The compression ratio metrics computation has been modified to be more accurate.
  • Running Cassandra as root is prevented by default.
  • JVM options are moved from cassandra-env.(sh|ps1) to jvm.options.

Cassandra 3.1 upgrade considerations

  • The return value of SelectStatement::getLimit has been changed from DataLimits to int.
  • Custom index implementations should be aware that the method Indexer::indexes() has been removed, as its contract was misleading and all custom implementations should almost surely have returned true unconditionally for that method.
  • GC logging is now enabled by default (you can disable it in the jvm.options file if you prefer).

TinkerPop changes for DSE 6.8.0

A list of DataStax Enterprise 6.8.0 production-certified changes in addition to Apache TinkerPop.

DataStax Enterprise (DSE) 6.8.0 includes all changes from previous DSE releases plus these production-certified changes that are in addition to Apache TinkerPop™ 3.4.5:

  • Added a toString() serializer for GraphBinary.
  • Configured the Gremlin Console to use GraphBinary by default.
  • Fixed transaction management for empty iterators in Gremlin Server.
  • Deprecated MessageSerializer implementations for Gryo in Gremlin Server.
  • Deprecated Serializers enum values of GRYO_V1D0 and GRYO_V3D0.
  • Deprecated SerTokens values of MIME_GRYO_V1D0 and MIME_GRYO_V3D0.
  • Added a Docker command to start Gremlin Server with the standard GLV test configurations.
  • Added aggregate(Scope,String) and deprecated store() in favor of aggregate(local).
  • Modified NumberHelper to better ignore Double.NaN in min() and max() comparisons.
  • Bump to Netty 4.1.36.
  • Added userAgent to RequestOptions.
  • Gremlin Console sends Gremlin Console/version as the userAgent.
  • Fixed DriverRemoteConnection ignoring "with" Token options when multiple were set.
  • Added :set warnings true|false to Gremlin Console.
  • Provided support for withComputer() in gremlin-javascript.
  • Deprecated remote traversal side-effect retrieval and related infrastructure.
  • Bump to Jackson Databind 2.9.9.1.
  • Fixed bug with Python in g:Date of GraphSON where local time zone was being used during serialization/deserialization.
  • Deprecated multi/meta-property support in Neo4jGraph.
  • Improved exception and messaging for gt/gte/lt/lte when one of the objects isn’t a Comparable.
  • Added test infrastructure to check for storage iterator leak.
  • Fixed multiple iterator leaks in query processor.
  • Fixed optional() so that the child traversal is treated as local.
  • Changed default keep-alive time for driver to 3 minutes.
  • Fixed bug where server-side keep-alive was not always disabled when its setting was zero.
  • Added support for hasNext() in JavaScript and .NET.
  • Improved error messaging for invalid inputs to the TinkerGraph IdManager instances.
  • Forced replacement of connections in Java driver for certain exception types that seem to ultimately kill the connection.
  • Changed the reverse() of desc and asc on Order to not use the deprecated decr and incr.
  • Fixed bug in MatchStep where the correct was not properly determined.
  • Fixed bug where client and server exceptions mismatched when the server threw StackOverflowError.
  • Added underscore suffixed steps and tokens in Gremlin-Python that conflict with global function names.
  • Prevent exception when closing a session that doesn’t exist.
  • Allow predicates and traversals to be used as options in BranchStep.
  • Ensure only a single final response is sent to the client with Gremlin Server.
  • Deprecated ResponseHandlerContext with related infrastructure and folded its functionality into Context in Gremlin Server.
  • Improved performance of aggregate() by avoiding excessive calls to hasNext() when the barrier is empty.
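
The NumberHelper change in this list concerns ignoring NaN in min() and max() comparisons. The following Python sketch shows what NaN-ignoring comparisons can look like and why they matter. It is an analogy in Python, not TinkerPop's Java code, and both function names are hypothetical.

```python
import math


def nan_ignoring_min(a: float, b: float) -> float:
    """Return the smaller value, skipping NaN when the other operand is a number."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return min(a, b)


def nan_ignoring_max(a: float, b: float) -> float:
    """Return the larger value, skipping NaN when the other operand is a number."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return max(a, b)


nan = float("nan")
# Python's built-in min is order-dependent with NaN because every comparison with NaN is False.
print(min(nan, 1.0), min(1.0, nan))
# The NaN-ignoring variants give the same answer regardless of argument order.
print(nan_ignoring_min(nan, 1.0), nan_ignoring_min(1.0, nan))
print(nan_ignoring_max(nan, 1.0), nan_ignoring_max(1.0, nan))
```

If both operands are NaN, these helpers still return NaN, since there is no number to prefer.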