DSE 6.8.0 release notes
DataStax Enterprise 6.8.x release notes are now hosted here: DSE 6.8.4 and later release notes.
30 March 2020
6.8.0 Components
-
Apache Solr™ 6.0.1.4.2718
-
Apache Spark™ 2.4.0.13
-
Apache TinkerPop™ 3.4.5 with additional production-certified changes
-
Apache Tomcat® 8.0.53
-
DSE Java Driver 1.10.0-dse+20200217
-
Netty 4.1.25.6.dse
-
Spark Jobserver 0.8.0.49 (DSE custom version)
-
DSE 6.8 is based on Apache Cassandra® 3.11 with additional production-certified changes.
DSE 6.8.0 is compatible with Apache Cassandra 3.11 and adds production-certified changes.
Along with Cassandra 3.11.6, DSE 6.8.0 is supported by a new product offering, DataStax Kubernetes Operator for Apache Cassandra.
Experimental features
DataStax Labs provides the Apache Cassandra and DataStax communities with non-supported previews of potential production software enhancements, tools, aids, and partner software designed to increase productivity. See DataStax Labs and DSE OpsCenter Labs features.
6.8.0 DSE database
Changes and enhancements:
-
Beta version of Storage-Attached Indexing (SAI), which provides traditional relational database indexing capabilities for DSE, without complexity or operational challenges. Includes the revised CREATE CUSTOM INDEX command with a new USING 'StorageAttachedIndex' clause. See the SAI topics and the CREATE CUSTOM INDEX command.
-
Beta version of the DSE Backup and Restore Service, which enables cluster-wide backup and restore operations. Intended for users of DataStax Kubernetes Operator for Apache Cassandra® (Cassandra Operator).
-
Performance improvement: Better memtable memory usage. (DB-2164)
-
DataStax recommends enabling NodeSync on base tables with materialized views. See Materialized views maintenance guidelines. (DB-2831)
-
Simplified maintenance of materialized views by requiring periodic repair of only base tables. Updates on columns indexed in materialized views might see a performance regression of up to 20% in throughput and latency. (DB-2816)
-
New client requests rate and internode message rate usage statistics are output from nodetool tpstats. (DB-3619)
-
Fixed a rare race during streaming that could have resulted in aborted SSTable files being left on disk. (DB-2733)
-
Compaction performance improvement with the new cassandra.yaml pick_level_on_streaming option. (DB-1658)
-
Incremental NodeSync service on newly created tables is now on by default. (DB-3934)
-
Existing tables that did not have NodeSync service enabled in DSE 6.7 and earlier are not affected after upgrade.
-
Existing tables with NodeSync service enabled in DSE 6.7 and earlier retain incremental NodeSync service enabled after upgrade.
-
To disable NodeSync service for all new tables, use the -Ddse.nodesync.disable_on_new_tables system parameter.
-
To disable NodeSync for a single table, use the table property:
ALTER TABLE table_name WITH nodesync = { 'enabled' : 'false' }
-
-
Introduce new TRUNCATE permission. (DB-74)
-
Deprecate MODIFY permission.
-
New UPDATE permission allows the DML statements INSERT, UPDATE, and DELETE.
-
New TRUNCATE permission allows execution of the TRUNCATE statement. DataStax recommends migrating from the MODIFY permission. Plan and implement this migration carefully in case a role needs the TRUNCATE permission.
-
-
Create new settings system view by merging CASSANDRA-14573. Allows select properties to be modified. (DB-37)
Column Name | CQL Type | Column Type | Description
name | text | partition key | The property name
value | text | regular | The property value as text
writeable | boolean | regular | true if the property can be updated, false otherwise
-
In cassandra.yaml, memtable_flush_writers now defaults to 8 and memtable_cleanup_threshold now defaults to max(0.2, 1 / (memtable_flush_writers + 1)) to establish a lower bound. The PendingTasks metric attached to all JMXEnabledThreadPoolExecutor instances, including MemtableFlushWriter, now excludes currently executing tasks. (DB-2871)
-
Fix error that occurs when initialCapacity overflows during continuous paging. (DB-2852)
-
dsetool ring shows in-progress search index building during bootstrap. (DSP-15281)
-
Improved output of nodetool tablestats to accurately report when no data is available. (DB-2854)
-
New cqlsh command line option to set consistency-level to serial. (DB-4225)
-
Remove the serialization header partition/clustering key validation for secondary-index SSTables and change the validation to consider the element type. (DB-4111)
-
Keyspace and table names now allow 222 characters. (DB-2128)
-
Fix viewbuildstatus error that returns incorrect value. (DB-2397)
-
nodetool stop has anticompaction option. (DB-3821)
-
New ALL TABLES permission allows permissions on all of the tables of a keyspace, but not the keyspace itself. (DB-3823)
-
Log messages for leak detection now specify if the leaked resource has negative consequences. (DB-3827)
-
Fix potentially incorrect dropped messages in case of time drifts. (DB-3891)
-
Change in validation method for initial_token and num_tokens for cassandra.yaml. (DB-1618)
-
Fix a bug to correctly estimate the size of the on-heap memtable metadata. (DB-2086)
-
Starting from DSE 6.8.0, if nodes need to communicate via a different interface than the one configured as the listen_address, you must configure the additional interface as the broadcast_address. If no routing exists between the two interfaces, you must also set listen_on_broadcast_address: true. (DB-4142)
-
Fix bug to allow LIST ROLES and LIST USERS to work with system-keyspace-filtering enabled. (DB-4221)
-
nodetool compactionstats now accurately shows the pending tasks for TimeWindowCompactionStrategy (TWCS). (DB-3495)
-
DROP KEYSPACE now waits for interrupted compactions to finish before dropping the keyspace. (DB-3575)
-
During the execution of CQL queries, guardrail checks are skipped for superusers and for internal system queries. (DB-3654)
-
Replaced stream_entire_sstables for Cassandra with zerocopy_streaming_enabled for DataStax Enterprise, along with the related options. (DB-3832)
-
Improved handling of SSTable min and max clustering. (DB-3728)
-
For cassandra.yaml configuration, only TLS is allowed for the protocol option for client_encryption_options and server_encryption_options for security. (DB-2786)
-
New counter metrics for submitted traversals and throughput are under com.datastax.bdp.metrics.graph. (DSP-17009)
-
Improved performance of dsetool, dse client-tool, dse fs, and nodetool commands. (DSP-17586)
-
DataStax Bulk Loader is not included with DataStax Enterprise installations, but can be installed separately. (DSP-19469)
-
The application_name and application_version attributes set by the Python driver flow through to driver event reporting and Insights. cqlsh sets these attributes to appropriate values so connected clients appear neatly in Insights and other reporting. (DSP-20119)
-
Incremental NodeSync is disabled when an ordering partitioner is configured. (DB-4024)
-
Added hostname_verification to ldap_options in dse.yaml. (DSP-20302)
-
Add the ability to add the reason for re-indexing to the dsetool core_indexing_status command. (DSP-20264)
-
Make the search reference visible in the error message for LDAP connections. (DSP-20578)
-
The jvm.options file is now named jvm-server.options. (DSP-20769)
-
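As a worked example of the new memtable_cleanup_threshold default (DB-2871), the following Python sketch evaluates max(0.2, 1 / (memtable_flush_writers + 1)) for a few writer counts. The function name and the check that rejects values below 1 are illustrative assumptions, not part of DSE.

```python
def default_memtable_cleanup_threshold(memtable_flush_writers: int) -> float:
    """Evaluate the default from the release note:
    max(0.2, 1 / (memtable_flush_writers + 1)).
    """
    # Assumption for this sketch: at least one flush writer is configured.
    if memtable_flush_writers < 1:
        raise ValueError("memtable_flush_writers must be at least 1")
    return max(0.2, 1 / (memtable_flush_writers + 1))


if __name__ == "__main__":
    for writers in (1, 2, 4, 8):
        threshold = default_memtable_cleanup_threshold(writers)
        print(f"memtable_flush_writers={writers} -> {threshold:.3f}")
```

With the new default of 8 flush writers, 1 / 9 is about 0.111, so the 0.2 lower bound applies.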
Security updates:
-
Allow setting of pre-hashed passwords via CQL. (DB-3293)
-
Fix min/max clustering keys being stored in plain text in SSTable statistics. (DB-3845)
-
Upgrade Apache Solr to address CVE-2018-8026. (DSP-16653)
-
Upgrade Jackson Databind to address CVE-2018-11307 and CVE-2019-14540. (DSP-18099, DB-2911, DSP-17964)
-
Apache Spark local privilege escalation vulnerability: CVE-2018-11760. (DSP-18225)
-
Upgrade spray-json to prevent Denial of Service (DoS) vulnerabilities CVE-2018-18854 and CVE-2018-18853. (DSP-19208)
-
Upgrade Apache MINA Core library to 2.0.21 to prevent a security issue where Apache MINA Core was vulnerable to information disclosure. (DSP-19213)
-
Upgrade Jackson Databind to address CVE-2019-16942. (DSP-19896)
-
Remove Jodd Core dependency that created vulnerability to Arbitrary File Writes. (DSP-19206)
-
Known issues:
-
Streaming throughput (bootstrapping, decommissioning nodes) may be slower on networks with high latency. (DB-4041)
Workaround: Expand streaming buffers by setting the following in cassandra.yaml:
-
Certain workloads may cause higher context switches and increased latencies. (DSP-20499)
Workaround: Set system properties during startup:
-Ddse.tpc.work_stealing_max_unparks=1
-
NoSuchMethod error returned when creating SASI index. (DSP-20720)
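The permission changes in DB-74, listed under the changes for this section, imply that a role that relied on MODIFY needs UPDATE for INSERT, UPDATE, and DELETE, plus TRUNCATE only if it truncates tables. The Python sketch below generates candidate GRANT statements for such a migration. The role and resource names are hypothetical, and the statements assume the standard GRANT permission ON resource TO role form; check them against the GRANT reference for your DSE version before running them.

```python
from typing import List


def migration_grants(role: str, resource: str, needs_truncate: bool) -> List[str]:
    """Build GRANT statements that replace a deprecated MODIFY grant.

    `resource` is a CQL resource such as "KEYSPACE app_ks" or
    "TABLE app_ks.events" (hypothetical names used for illustration).
    """
    # UPDATE covers the DML statements INSERT, UPDATE, and DELETE.
    grants = [f"GRANT UPDATE ON {resource} TO {role};"]
    # TRUNCATE is a separate permission, so grant it only where it is needed.
    if needs_truncate:
        grants.append(f"GRANT TRUNCATE ON {resource} TO {role};")
    return grants


if __name__ == "__main__":
    for statement in migration_grants("app_writer", "KEYSPACE app_ks", True):
        print(statement)
```

Keeping TRUNCATE as an explicit choice per role reflects the release note's advice to plan the migration carefully.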
6.8.0 DSE Analytics
Changes and enhancements:
-
Improved reliability of no-space-left-on-device detection. The DSEFS min_free_space default value in dse.yaml is reduced from 5 GB to 256 MB. (DSP-16873)
-
Apache Spark™ 2.4 runs with Scala 2.11.12 by default. Upgrade the compile time dependencies for structured streaming and other experimental Spark features. Even though most Spark jobs from earlier Spark 2.x builds can run on Spark 2.4 without recompiling, DataStax recommends that you recompile your applications against Spark 2.4 to guarantee compatibility. (DSP-17823)
-
Add the ability to set time-to-live (TTL) and WriteTime in DseGraphFrames and Spark DataFrames. (DSP-17044)
-
Bring-Your-Own-Spark (BYOS) builds include dependencies for Joda and Commons-Configuration. (DSP-20512)
-
During Spark application startup, Exception: java.lang.ExceptionInInitializerError thrown from the UncaughtExceptionHandler in thread "main" was sometimes logged instead of a meaningful error. (DSP-20474)
-
New spark.cassandra.query.consistency.level parameter sets the default consistency level for sessions accessed by Spark Connector. The default consistency level for HiveMetaStore is LOCAL_QUORUM. (DSP-19982)
-
Changes to IN clauses. (DSP-15203)
-
Multiple IN clauses on partition and clustering keys can be pushed down to Cassandra.
-
If the cross product of values in IN clauses exceeds spark.sql.dse.inClauseToJoinConversionThreshold, then JoinWithCassandraTable is performed instead.
-
If the cross product of values in IN clauses exceeds spark.sql.dse.inClauseToFullScanConversionThreshold, then a full table scan is performed instead.
-
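The IN clause rules above (DSP-15203) can be sketched as a small decision function in Python. This is only an illustration of the described thresholds, not the connector's implementation. The function name and the threshold values in the usage lines are hypothetical, and the sketch assumes the full-scan threshold is the larger of the two and is checked first.

```python
from math import prod
from typing import Sequence


def choose_in_clause_strategy(
    in_value_lists: Sequence[Sequence[object]],
    join_threshold: int,
    full_scan_threshold: int,
) -> str:
    """Pick a strategy from the cross product of IN clause value counts."""
    # The cross product size is the product of the lengths of all IN lists.
    combinations = prod(len(values) for values in in_value_lists)
    # Assumption: full_scan_threshold >= join_threshold, so test it first.
    if combinations > full_scan_threshold:
        return "full table scan"
    if combinations > join_threshold:
        return "JoinWithCassandraTable"
    return "IN clause pushdown"


if __name__ == "__main__":
    # Hypothetical thresholds chosen only to make the example readable.
    print(choose_in_clause_strategy([[1, 2], [1, 2, 3]], 10, 100))
    print(choose_in_clause_strategy([range(5), range(5)], 10, 100))
    print(choose_in_clause_strategy([range(20), range(20)], 10, 100))
```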
6.8.0 DSEFS
Changes and enhancements:
-
Running Spark applications with large number of partitions creates many tombstones and may cause tombstone warnings or in extreme cases a job failure. DSE 6.8 reduces the number of tombstones created during Spark job commit and improves performance of some Spark jobs up to 60%. (DSP-15762)
-
When creating a file through WebHDFS API, DSEFS does not verify WX permissions of parent’s parent when the parent exists. (DSP-20355)
-
Allow DSEFS to use mixed case keyspaces to connect directly to the dsefs keyspace. (DSP-20354)
-
DSEFS node identifiers are now the same as DSE node identifiers. The NODE_ID file in the dsefs working directory is no longer needed. (DSP-18009)
-
Add support for multiple contact points for DSEFS implementation of the Hadoop FileSystem. Provides FileSystem URI with dsefs://host0[:port][,host1[:port]]/. (DSP-19704)
-
DSEFS local file system implementation now returns alphabetically sorted directories and files when using wildcards and listing command. (DSP-20057)
-
DSEFS now stores information about data usage in the local storage directory instead of Cassandra. This change improves reliability if some nodes are down. (DSP-15349)
-
Improve reliability of DSEFS internode connections. Fix error for missing session key when cluster nodes were down. (DSP-15347)
-
Improved DSEFS node health reported by dsefs df for consistency with dsetool status and nodetool status. (DSP-15346)
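To illustrate the multi-contact-point URI form dsefs://host0[:port][,host1[:port]]/ introduced in DSP-19704, here is a minimal Python sketch that assembles such a URI from a list of hosts with optional ports. The helper name, the addresses, and the port number in the usage lines are placeholders chosen for the example.

```python
from typing import Optional, Sequence, Tuple


def build_dsefs_uri(contact_points: Sequence[Tuple[str, Optional[int]]]) -> str:
    """Join (host, port) pairs into dsefs://host0[:port][,host1[:port]]/."""
    if not contact_points:
        raise ValueError("at least one contact point is required")
    parts = []
    for host, port in contact_points:
        # The port is optional for every contact point.
        parts.append(host if port is None else f"{host}:{port}")
    return "dsefs://" + ",".join(parts) + "/"


if __name__ == "__main__":
    # Placeholder addresses and port, for illustration only.
    print(build_dsefs_uri([("10.0.0.1", 5598), ("10.0.0.2", None)]))
```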
6.8.0 DSE Graph
Changes and enhancements:
-
Traversal length is limited to a hard-coded 90 steps. Traversals with more steps fail with an error message that advises splitting them into multiple smaller traversals. (DSP-17657)
-
Enhanced Graph OLAP Spark configuration. (DSP-17832)
-
New Spark configuration properties in resources/graph/conf/olap.properties.
-
New dse client-tool graph-olap commands.
-
-
DataStax Graph (core) changes:
-
Changed delimiters and a checksum make IDs more readable and remove the need to manually construct IDs. (DSP-15963)
-
Support for user-defined types (UDTs) is added. (DSP-16030)
-
System is now available while aliased. Alias for a missing graph now allows a user to issue commands. Exceptions occur when accessing 'g' (or the alias), 'graph', or 'schema'. (DSP-16682)
-
.withReplication is mandatory. Classic is no longer a valid engine to create a graph using the new syntax. (DSP-16698)
-
Timeouts from dse.yaml are still valid. However, there is no Gremlin-exposed traversal source or graph configuration. To change timeouts, set them on the driver connection. Timeouts cannot exceed limits set in dse.yaml. (DSP-16758)
-
Improved user experience when authorization errors occur. (DSP-18125)
-
Fixed Graph cleanUp failures due to duplicate properties on dangling edges. (DSP-20460)
-
-
Track latencies for unaliased, global, and per-graph traversals. (DSP-16455)
-
Upgraded dependency for Graph prototyping to TinkerPop 3.4.0. (DSP-16452)
-
Introduces the with() step modulator, which will enable DSE Graph to modify step behaviors, as in g.addV('person').with(ttl, 1000).
-
Removes deprecated rebindings option; older drivers going back to 3.1.x can no longer connect.
-
min() and max() work on any Comparable to allow for g.V().values('name').min().
-
Changes reducing barrier steps (min(), max(), mean(), sum()) so that no result is returned if there is no input, rather than NaN or 0, which could lead to unintuitive results.
-
Changes the order of select() scopes to make it easier to select a specific map entry if a side-effect existed with the same name.
-
-
The AndStep, OrStep, DedupGlobalStep, RangeGlobalStep, NotStep, SelectStep, SelectOneStep, OrderGlobalStep, and WherePredicateStep steps are valid when determining whether to route OLAP traversals to DseGraphFrames. (DSP-16233)
-
Allow range queries using part of the Custom Vertex ID. This change requires the partition key components to be specified before the range query may be specified on the clustering key components. (DSP-12501)
-
Add read/write support for TTL and WriteTime into DataGraphFrames (DGF). (DSP-19304)
-
Expose configuration and metrics for Gremlin query cache. (DSP-20240)
-
Change classic Graph query so vertices are read from _p tables in Cassandra using SELECT ... WHERE <vertex primary key columns> statement. The search predicate is applied in memory. (DSP-20230)
-
Updated the TinkerPop version, which changes the following settings if SSL configuration options are used for Gremlin Console and TinkerPop drivers:
-
Added:
-
keyStore
-
keyStorePassword
-
trustStore
-
trustStorePassword
-
keyStoreType
-
sslEnabledProtocols
-
sslCipherSuites
-
sslSkipCertValidation
-
-
Deprecated:
-
trustCertChainFile
-
keyCertChainFile
-
keyFile
-
keyPassword (DSP-17552)
-
-
Known issues:
-
Server could get slow or unresponsive if lots of long-running traversals are canceled. (DSP-20425)
Workaround: Run with shorter traversal timeouts, if traversal allows.
6.8.0 DSE Search
Changes and enhancements:
-
Unbounded facet searches are no longer allowed. (DSP-18693)
-
facet.limit < 0 is no longer supported. Override the default facet.limit of 20000 with the -Dsolr.max.facet.limit.size system property.
-
This change adds guardrails that can cause misconfigured faceting queries to fail. Before upgrading, set an explicit facet.limit.
-
-
The dsetool stop_core_reindex command now mentions the node in the output message. (DSP-17090)
-
Legacy Solr join queries are no longer valid. The to, from, and force parameters are invalid. Joins can no longer be performed on non-partition key columns or on different keyspaces. (DSP-17431)
-
The dsetool core_indexing_status command now mentions the indexing reason in the output message.
-
The recommendation to enable live indexing on only one search core per cluster was too conservative. See Tuning search for maximum indexing throughput and Capacity planning for DSE Search. Be sure to follow the DataStax recommendations for your environment. (DSP-17939)
-
DSE Management API is available for enhancing operation with Kubernetes. (DSP-18785)
-
Improved real-time search to fix a docValues bug. (DSP-20300)
-
Passing TextField Solr fields with docValues to facet.field, facet.pivot, group.field, and sort (including native CQL Solr queries that use a TextField with docValues with ORDER BY) is now illegal and will fail the query in question. (DSP-18238)
-
Improved guidance with warnings when index rebuild is required for ALTER SEARCH INDEX, RELOAD SEARCH INDEX, and dsetool reload_core commands. (DSP-19347)
-
Replicas with non-queryable indexes will be skipped by the coordinator node to improve availability for index read. During nodetool rebuild_index, Storage-Attached Indexing (SAI) will be marked as non-queryable until the index build finishes, while 2i will remain queryable. (DSP-19543) For related information, see What is SAI?
-
Error messages related to Solr errors contain better descriptions of the root cause. (DSP-13792)
Known issues:
-
Mixed workloads with very wide partitions could see diminished performance. (DSP-20386)
Workaround: Set system property in startup:
-Dnetty.eventloop.tasks_processing_time_limit_ms=100
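For the facet guardrail described earlier in this section (DSP-18693), the Python sketch below shows one way a client could build faceting parameters with an explicit facet.limit before upgrading. It is a client-side pre-check only. It assumes that the 20000 value acts as an upper bound, which the -Dsolr.max.facet.limit.size property name suggests but the note does not state outright. The function and constant names are hypothetical.

```python
from typing import Dict

# Assumed upper bound, taken from the default mentioned in the release note.
ASSUMED_MAX_FACET_LIMIT = 20000


def facet_params(
    field: str, limit: int, max_limit: int = ASSUMED_MAX_FACET_LIMIT
) -> Dict[str, str]:
    """Build Solr faceting parameters with an explicit, bounded facet.limit."""
    # Unbounded faceting (facet.limit < 0) is no longer supported.
    if limit < 0:
        raise ValueError("facet.limit must not be negative")
    # Assumption: limits above the configured maximum would be rejected.
    if limit > max_limit:
        raise ValueError(f"facet.limit {limit} exceeds assumed maximum {max_limit}")
    return {"facet": "true", "facet.field": field, "facet.limit": str(limit)}


if __name__ == "__main__":
    print(facet_params("category", 100))
```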
Cassandra enhancements for DSE 6.8.0
DataStax Enterprise 6.8.0 is compatible with Apache Cassandra® 3.11 and adds these production-certified enhancements:
-
Add ability to encrypt sstables (CASSANDRA-9633)
-
Catch non-IOException in FileUtils.close to make sure that all resources are closed (CASSANDRA-15225)
-
Nodetool import row cache invalidation races with adding SSTables to tracker (CASSANDRA-14529)
-
Let nodetool import take a list of directories (CASSANDRA-14442)
-
Nodetool import cleanup and improvements (CASSANDRA-14417)
-
Add ability to load new SSTables from a separate directory (CASSANDRA-6719)
-
Add a few options to nodetool verify (CASSANDRA-14201)
-
Make all DDL statements idempotent and not dependent on global state (CASSANDRA-13426)
-
BloomFilter serialization format should not change byte ordering (CASSANDRA-9067)
-
Remove unused on-heap BloomFilter implementation (CASSANDRA-14152)
-
Add a virtual table to expose settings (CASSANDRA-14573)
-
Add a virtual table to expose caches (CASSANDRA-14538)
-
Expose buffer cache metrics in caches virtual table. (CASSANDRA-14626)
-
Fix up chunk cache handling of metrics (CASSANDRA-14628)
-
Add a virtual table to expose active client connections (CASSANDRA-14458)
-
Clean up and refactor client metrics (CASSANDRA-14524)
-
NodeTool clientstats should show SSL Cipher (CASSANDRA-14322)
-
Add ability to specify driver name and version (CASSANDRA-14275)
-
Add nodetool clientlist (CASSANDRA-13665)
-
Bind to correct local address in 4.0 streaming (CASSANDRA-14362)
-
Set broadcast address in internode messaging handshake (CASSANDRA-14579)
-
Internode messaging handshake sends wrong messaging version number (CASSANDRA-14540)
-
Use Netty for streaming (CASSANDRA-12229)
-
Use Netty for internode messaging (CASSANDRA-8457)
-
Correct and clarify SSLFactory.getSslContext method and call sites (CASSANDRA-14314)
-
Remove read_repair_chance/dc_local_read_repair_chance (CASSANDRA-13910)
-
Properly close StreamCompressionInputStream to release any ByteBuf (CASSANDRA-13906)
-
Correctly close Netty channels when a stream session ends (CASSANDRA-13905)
-
Fix buffer length comparison when decompressing in Netty-based streaming (CASSANDRA-13899)
-
Race condition when closing stream sessions (CASSANDRA-13852)
-
dtest failure: snapshot_test.py:TestSnapshot.test_basic_snapshot_and_restore (CASSANDRA-13836)
-
Make monotonic read / read repair configurable (CASSANDRA-14635)
-
Improve read repair blocking behavior (CASSANDRA-10726)
-
Add coordinator write metric per CF (CASSANDRA-14232)
-
Make PartitionUpdate and Mutation immutable (CASSANDRA-13867)
-
Disable old native protocol versions on demand (CASSANDRA-14659)
-
Refactor CompactionStrategyManager (CASSANDRA-14621)
-
Extend IAuthenticator to accept peer SSL certificates (CASSANDRA-14652)
-
For LCS, single SSTable up-level is handled inefficiently (CASSANDRA-12526)
-
Fix setting min/max compaction threshold with LCS (CASSANDRA-14388)
General upgrade advice for DSE 6.8.0
DataStax Enterprise 6.8.0 is compatible with Apache Cassandra® 3.11. All upgrade advice from previous versions applies. Carefully reviewing the DataStax Enterprise upgrade planning and upgrade instructions can ensure a smooth upgrade and avoid pitfalls and frustrations.
DataStax Enterprise 6.8.0 is compatible with Apache Cassandra 3.11 and adds Cassandra enhancements for DSE 6.8.0.
Additional advice for upgrading between versions of Apache Cassandra includes:
Cassandra 4.0 changes
-
Catch non-IOException in FileUtils.close to make sure that all resources are closed (CASSANDRA-15225)
-
Nodetool import row cache invalidation races with adding SSTables to tracker (CASSANDRA-14529)
-
Let nodetool import take a list of directories (CASSANDRA-14442)
-
nodetool import cleanup and improvements (CASSANDRA-14417)
-
Add ability to load new SSTables from a separate directory (CASSANDRA-6719)
-
Add a few options to nodetool verify (CASSANDRA-14201)
-
Make all DDL statements idempotent and not dependent on global state (CASSANDRA-13426)
-
Always close RT markers returned by ReadCommand#executeLocally() (CASSANDRA-14515)
-
BloomFilter serialization format should not change byte ordering (CASSANDRA-9067)
-
Remove unused on-heap BloomFilter implementation (CASSANDRA-14152)
-
Add a virtual table to expose settings (CASSANDRA-14573)
-
Add a virtual table to expose caches (CASSANDRA-14538)
-
Fix up chunk cache handling of metrics (CASSANDRA-14628)
-
Add a virtual table to expose active client connections (CASSANDRA-14458)
-
Clean up and refactor client metrics (CASSANDRA-14524)
-
NodeTool clientstats should show SSL Cipher (CASSANDRA-14322)
-
Add ability to specify driver name and version (CASSANDRA-14275)
-
Add nodetool clientlist (CASSANDRA-13665)
-
Pad uncompressed chunks when they would be interpreted as compressed (CASSANDRA-14892)
-
Bind to correct local address in 4.0 streaming (CASSANDRA-14362)
-
Set broadcast address in internode messaging handshake (CASSANDRA-14579)
-
Internode messaging handshake sends wrong messaging version number (CASSANDRA-14540)
-
Use Netty for streaming (CASSANDRA-12229)
-
Use Netty for internode messaging (CASSANDRA-8457)
-
Correct and clarify SSLFactory.getSslContext method and call sites (CASSANDRA-14314)
-
Properly close StreamCompressionInputStream to release any ByteBuf (CASSANDRA-13906)
-
Correctly close Netty channels when a stream session ends (CASSANDRA-13905)
-
Fix buffer length comparison when decompressing in Netty-based streaming (CASSANDRA-13899)
-
Race condition when closing stream sessions (CASSANDRA-13852)
-
Make monotonic read / read repair configurable (CASSANDRA-14635)
-
Improve read repair blocking behavior (CASSANDRA-10726)
-
Add coordinator write metric per CF (CASSANDRA-14232)
-
Make PartitionUpdate and Mutation immutable (CASSANDRA-13867)
-
Disable old native protocol versions on demand (CASSANDRA-14659)
-
Refactor CompactionStrategyManager (CASSANDRA-14621)
-
Extend IAuthenticator to accept peer SSL certificates (CASSANDRA-14652)
-
For LCS, single SSTable up-level is handled inefficiently (CASSANDRA-12526)
-
Fix setting min/max compaction threshold with LCS (CASSANDRA-14388)
-
Support light-weight transactions in cassandra-stress (CASSANDRA-13529)
-
Add a virtual table to expose all running sstable tasks (CASSANDRA-14457)
-
Implement virtual keyspace interface (CASSANDRA-7622)
-
cassandra-stress throws NPE if insert section isn’t specified in user profile (CASSANDRA-14426)
-
nodetool listsnapshots is missing local system keyspace snapshots (CASSANDRA-14381)
-
CVE-2017-5929 Security vulnerability and redefine default log rotation policy (CASSANDRA-14183)
-
Fix sstablemetadata date string for minLocalDeletionTime (CASSANDRA-14132)
-
Make sub-range selection for non-frozen collections return null instead of empty (CASSANDRA-14182)
-
Fix cassandra-stress startup failure (CASSANDRA-14106)
-
Fix trivial log format error (CASSANDRA-14015)
-
Allow sstabledump to do a json object per partition (CASSANDRA-13848)
-
Remove unused and deprecated methods from AbstractCompactionStrategy (CASSANDRA-14081)
-
Fix Distribution.average in cassandra-stress (CASSANDRA-14090)
-
Presize collections (CASSANDRA-13760)
-
Add GroupCommitLogService (CASSANDRA-13530)
-
Parallelize initial materialized view build (CASSANDRA-12245)
-
Fix flaky SecondaryIndexManagerTest.assert[Not]MarkedAsBuilt (CASSANDRA-13965)
-
Make LWTs send resultset metadata on every request (CASSANDRA-13992)
-
Fix flaky indexWithFailedInitializationIsNotQueryableAfterPartialRebuild (CASSANDRA-13963)
-
Introduce leaf-only iterator (CASSANDRA-9988)
-
Allow only one concurrent call to StatusLogger (CASSANDRA-12182)
-
Refactoring to specialised functional interfaces (CASSANDRA-13982)
-
Speculative retry should allow more friendly params (CASSANDRA-13876)
-
Throw exception if we send/receive repair messages to incompatible nodes (CASSANDRA-13944)
-
Replace usages of MessageDigest with Guava’s Hasher (CASSANDRA-13291)
-
Add nodetool cmd to print hinted handoff window (CASSANDRA-13728)
-
Fix some alerts raised by static analysis (CASSANDRA-13799)
-
Checksum sstable metadata (CASSANDRA-13321, CASSANDRA-13593)
-
Add result set metadata to prepared statement MD5 hash calculation (CASSANDRA-10786)
-
Add incremental repair support for --hosts, --force, and subrange repair (CASSANDRA-13818)
-
Refactor GcCompactionTest to avoid boxing (CASSANDRA-13941)
-
Expose recent histograms in JmxHistograms (CASSANDRA-13642)
-
Add SERIAL and LOCAL_SERIAL support for cassandra-stress (CASSANDRA-13925)
-
LCS needlessly checks for L0 STCS candidates multiple times (CASSANDRA-12961)
-
Update lz4 to 1.4.0 (CASSANDRA-13741)
-
Throttle base partitions during MV repair streaming to prevent OOM (CASSANDRA-13299)
-
Improve short read protection performance (CASSANDRA-13794)
-
Fix AssertionError in short read protection (CASSANDRA-13747)
-
Use compaction threshold for STCS in L0 (CASSANDRA-13861)
-
Fix problem with min_compress_ratio: 1 and disallow ratio < 1 (CASSANDRA-13703)
-
Add extra information to SASI timeout exception (CASSANDRA-13677)
-
Rework CompactionStrategyManager.getScanners synchronization (CASSANDRA-13786)
-
Add additional unit tests for batch behavior, TTLs, Timestamps (CASSANDRA-13846)
-
Add keyspace and table name in schema validation exception (CASSANDRA-13845)
-
Emit metrics whenever we hit tombstone failures and warn thresholds (CASSANDRA-13771)
-
Allow changing log levels via nodetool for related classes (CASSANDRA-12696)
-
Add stress profile yaml with LWT (CASSANDRA-7960)
-
Reduce memory copies and object creations when acting on ByteBufs (CASSANDRA-13789)
-
Simplify mx4j configuration (CASSANDRA-13578)
-
Fix trigger example on 4.0 (CASSANDRA-13796)
-
Force minimum timeout value (CASSANDRA-9375)
-
Add bytes repaired/unrepaired to nodetool tablestats (CASSANDRA-13774)
-
Don’t delete incremental repair sessions if they still have sstables (CASSANDRA-13758)
-
Fix pending repair manager index out of bounds check (CASSANDRA-13769)
-
Don’t use RangeFetchMapCalculator when RF=1 (CASSANDRA-13576)
-
Don’t optimise trivial ranges in RangeFetchMapCalculator (CASSANDRA-13664)
-
Use an ExecutorService for repair commands instead of new Thread(..).start() (CASSANDRA-13594)
-
Fix race / ref leak in anticompaction (CASSANDRA-13688)
-
Fix race / ref leak in PendingRepairManager (CASSANDRA-13751)
-
Enable ppc64le runtime as unsupported architecture (CASSANDRA-13615)
-
Improve sstablemetadata output (CASSANDRA-11483)
-
Support for migrating legacy users to roles has been dropped (CASSANDRA-13371)
-
Introduce error metrics for repair (CASSANDRA-13387)
-
Refactoring to primitive functional interfaces in AuthCache (CASSANDRA-13732)
-
Update metrics to 3.1.5 (CASSANDRA-13648)
-
batch_size_warn_threshold_in_kb can now be set at runtime (CASSANDRA-13699)
-
Avoid always rebuilding secondary indexes at startup (CASSANDRA-13725)
-
Upgrade JMH from 1.13 to 1.19 (CASSANDRA-13727)
-
Upgrade SLF4J from 1.7.7 to 1.7.25 (CASSANDRA-12996)
-
Default for start_native_transport now true if not set in config (CASSANDRA-13656)
-
Don’t add localhost to the graph when calculating where to stream from (CASSANDRA-13583)
-
Allow skipping equality-restricted clustering columns in ORDER BY clause (CASSANDRA-10271)
-
Use common nowInSec for validation compactions (CASSANDRA-13671)
-
Improve handling of IR prepare failures (CASSANDRA-13672)
-
Send IR coordinator messages synchronously (CASSANDRA-13673)
-
Flush system.repair table before IR finalize promise (CASSANDRA-13660)
-
Fix column filter creation for wildcard queries (CASSANDRA-13650)
-
Add 'nodetool getbatchlogreplaythrottle' and 'nodetool setbatchlogreplaythrottle' (CASSANDRA-13614)
-
Fix race condition in PendingRepairManager (CASSANDRA-13659)
-
Allow noop incremental repair state transitions (CASSANDRA-13658)
-
Run repair with down replicas (CASSANDRA-10446)
-
Added started & completed repair metrics (CASSANDRA-13598)
-
Improve secondary index (re)build failure and concurrency handling (CASSANDRA-10130)
-
Improve calculation of available disk space for compaction (CASSANDRA-13068)
-
Change the accessibility of RowCacheSerializer for third party row cache plugins (CASSANDRA-13579)
-
Allow sub-range repairs for a preview of repaired data (CASSANDRA-13570)
-
NPE in IR cleanup when columnfamily has no sstables (CASSANDRA-13585)
-
Fix Randomness of stress values (CASSANDRA-12744)
-
Allow selecting Map values and Set elements (CASSANDRA-7396)
-
Fast and garbage-free Streaming Histogram (CASSANDRA-13444)
-
Update repairTime for keyspaces on completion (CASSANDRA-13539)
-
Add configurable upper bound for validation executor threads (CASSANDRA-13521)
-
Bring back maxHintTTL property (CASSANDRA-12982)
-
Add testing guidelines (CASSANDRA-13497)
-
Add more repair metrics (CASSANDRA-13531)
-
RangeStreamer should be smarter when picking endpoints for streaming (CASSANDRA-4650)
-
Avoid rewrapping an exception thrown for cache load functions (CASSANDRA-13367)
-
Log time elapsed for each incremental repair phase (CASSANDRA-13498)
-
Add multiple table operation support to cassandra-stress (CASSANDRA-8780)
Cassandra 3.11.2 changes
-
Cassandra now relies on JVM options to shut down properly on OutOfMemoryError. By default it relies on the OnOutOfMemoryError option, because the ExitOnOutOfMemoryError and CrashOnOutOfMemoryError options are not supported by the older 1.7 and 1.8 JVMs. A warning is logged at startup if none of those JVM options are used. See CASSANDRA-13006 for more details.
Cassandra 3.11.2 upgrade considerations
-
Creating a Materialized View with filtering on a non-primary-key base column (added in CASSANDRA-10368) is disabled, because the liveness of a view row depends on multiple filtered base non-key columns and on the base non-key column used in the view primary key. These semantics cannot be supported without a storage format change; see CASSANDRA-13826. For append-only use cases, you may still use this feature with a startup flag: "-Dcassandra.mv.allow_filtering_nonkey_columns_unsafe=true"
-
The NativeAccessMBean isAvailable method will only return true if the native library has been successfully linked. Previously it was returning true if JNA could be found but was not taking into account link failures.
-
Primary ranges in the system.size_estimates table are now based on the keyspace replication settings and adjacent ranges are no longer merged (CASSANDRA-9639).
-
In 2.1, the default for otc_coalescing_strategy was 'DISABLED'. In 2.2 and 3.0, it was changed to 'TIMEHORIZON', but that value was shown to be a performance regression. The default for 3.11.0 and newer has been reverted to 'DISABLED'. Users upgrading from Cassandra 2.2 or 3.0 should be aware that the default has changed.
-
The StorageHook interface has been modified to allow retrieving read information from SSTableReader (CASSANDRA-13120).
-
Materialized Views for upgrades from DSE 5.1.1 or 5.1.2 or any version DSE 5.0.10 or later:
-
Cassandra will no longer allow dropping columns on tables with Materialized Views.
-
A change was made in the way the Materialized View timestamp is computed, which may cause an old deletion of a base column that is a view primary key (PK) column to not be reflected in the view when repairing the base table post-upgrade. This condition is only possible when a column deletion on an MV primary key (PK) column not present in the base table PK (via UPDATE base SET view_pk_col = null or DELETE view_pk_col FROM base) is missed before the upgrade and received by repair after the upgrade. If such column deletions are done on a view PK column that is not a base PK column, it’s advisable to run repair on the base table on all nodes prior to the upgrade. Alternatively, it’s possible to fix potential inconsistencies by running repair on the views after the upgrade, or by dropping and re-creating the views. See CASSANDRA-11500 for more details.
-
Removal of columns not selected in the Materialized View (via UPDATE base SET unselected_column = null or DELETE unselected_column FROM base) may not be properly reflected in the view in some situations, so we advise against doing deletions on base columns not selected in views until this is fixed in CASSANDRA-13826.
Cassandra 3.10 changes
-
Runtime modification of concurrent_compactors is now available via nodetool concurrent_compactors.
-
Support for the assignment operators +=/-= has been added for update queries.
-
An Index implementation may now provide a task which runs prior to joining the ring. See CASSANDRA-12039
-
Filtering on partition key columns is now also supported for queries without secondary indexes.
-
A slow query log has been added: slow queries will be logged at DEBUG level. For more details refer to CASSANDRA-12403 and slow_query_log_timeout_in_ms in cassandra.yaml.
-
Support for GROUP BY queries has been added.
-
A new compaction-stress tool has been added to test the throughput of compaction for any cassandra-stress user schema. See compaction-stress help for how to use it.
-
Prepared statements are now persisted in the table prepared_statements in the system keyspace. Upon startup, this table is used to preload all previously prepared statements - i.e. in many cases clients do not need to re-prepare statements against restarted nodes.
-
cqlsh can now connect to older Cassandra versions by downgrading the native protocol version. Please note that this is currently not part of our release testing and, as a consequence, it is not guaranteed to work in all cases. See CASSANDRA-12150 for more details.
-
Snapshots that are automatically taken before a table is dropped or truncated will have a "dropped" or "truncated" prefix on their snapshot tag name.
-
Metrics are exposed for successful and failed authentication attempts. These can be located using the object names org.apache.cassandra.metrics:type=Client,name=AuthSuccess and org.apache.cassandra.metrics:type=Client,name=AuthFailure respectively.
-
Added support to "unset" JSON fields in prepared statements by specifying DEFAULT UNSET. See CASSANDRA-11424 for details.
-
Allow TTL with null value on insert and update. It will be treated as equivalent to inserting a 0.
-
Removed outboundBindAny configuration property. See CASSANDRA-12673 for details.
Cassandra 3.10 upgrade considerations
-
Support for altering the types of columns in already defined tables and of UDT fields has been disabled. If it is necessary to return a different type, please use casting instead. See CASSANDRA-12443 for more details.
-
Specifying the default_time_to_live option when creating or altering a materialized view was erroneously accepted (and ignored). It is now properly rejected.
-
Only Java and JavaScript are now supported UDF languages. The sandbox in 3.0 already prevented the use of script languages except Java and JavaScript.
-
Compaction now correctly drops sstables out of CompactionTask when there isn’t enough disk space to perform the full compaction. This should reduce pending compaction tasks on systems with little remaining disk space.
-
Request timeouts in cassandra.yaml (read_request_timeout_in_ms, etc) now apply to the "full" request time on the coordinator. Previously, they only covered the time from when the coordinator sent a message to a replica until the time that the replica responded. Additionally, the previous behavior was to reset the timeout when performing a read repair, making a second read to fix a short read, and when subranges were read as part of a range scan or secondary index query. In 3.10 and higher, the timeout is no longer reset for these "subqueries". The entire request must complete within the specified timeout. As a consequence, your timeouts may need to be adjusted to account for this. See CASSANDRA-12256 for more details.
-
Logs written to stdout are now consistent with logs written to files. Time is now local (it was UTC on the console and local in files). Date, thread, file, and line info were added to stdout (see CASSANDRA-12004).
-
The 'clientutil' jar, which has been somewhat broken on the 3.x branch, is no longer provided. The features provided by that jar are provided by any good Java driver, and we advise relying on drivers rather than on that jar. If you need that jar for backward compatibility in the meantime, use the version provided on a previous Cassandra branch, such as the 3.0 branch (by design, the functionality provided by that jar is stable across versions, so using the 3.0 jar for a client connecting to 3.x should work without issues).
-
(Tools development) DatabaseDescriptor no longer implicitly starts up components or services such as commit log replay. This may break existing third-party tools and clients. To start up a standalone tool or client application, use the DatabaseDescriptor.toolInitialization() or DatabaseDescriptor.clientInitialization() methods. Tool initialization sets up the partitioner, snitch, and encryption context. Client initialization just applies the configuration but does not set up anything. Instead of using Config.setClientMode() or Config.isClientMode(), which are now deprecated, use one of the appropriate new methods in DatabaseDescriptor.
-
Application layer keep-alives were added to the streaming protocol to prevent idle incoming connections from timing out and failing the stream session (CASSANDRA-11839). This effectively deprecates the streaming_socket_timeout_in_ms property in favor of streaming_keep_alive_period_in_secs. See cassandra.yaml for more details about this property.
-
Duration literals support the ISO 8601 format. As a consequence, identifiers matching that format (e.g., P2Y or P1MT6H) are no longer supported (CASSANDRA-11873).
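To see why identifiers such as P2Y or P1MT6H now collide with duration literals, the following Python sketch recognizes the designator form of ISO 8601 durations. It is an illustration only: the regular expression and the parse_iso_duration name are assumptions made for this example, not Cassandra's CQL grammar, and the sketch covers only a subset of ISO 8601 (no week form and no date-style alternative format).

```python
import re

# Designator form only: P[n]Y[n]M[n]DT[n]H[n]M[n]S (a subset of ISO 8601).
# This pattern is an assumption for illustration, not Cassandra's parser.
_ISO_DURATION = re.compile(
    r"P(?:(?P<years>\d+)Y)?(?:(?P<months>\d+)M)?(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?"
)


def parse_iso_duration(text):
    """Return the duration components as a dict, or None if text is not a duration."""
    match = _ISO_DURATION.fullmatch(text)
    if match is None:
        return None
    parts = {
        name: int(value)
        for name, value in match.groupdict().items()
        if value is not None
    }
    # Reject "P", "PT", and a dangling "T" with no time component after it.
    if not parts or text.endswith("T"):
        return None
    return parts


for candidate in ("P2Y", "P1MT6H", "PT", "price"):
    print(candidate, parse_iso_duration(candidate))
```

Any name for which a parser like this returns components would read as a duration literal rather than as an identifier, which is the ambiguity the item above describes.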
Cassandra 3.8 changes
-
Shared pool threads are now named according to the stage they are executing tasks for. Thread names mentioned in traced queries change accordingly.
-
A new option has been added to cassandra-stress, "-rate fixed={number}/s", that forces a scheduled rate of operations/sec over time. Using this, stress can accurately account for coordinated omission from the stress process.
-
The cassandra-stress "-rate limit=" option has been renamed to "-rate throttle="
-
HDR histograms have been added to stress runs; their output can be saved to disk using the "-log hdrfile=" option. The histogram includes response/service/wait times when used with the fixed or throttle rate options. The histogram file can be plotted on http://hdrhistogram.github.io/HdrHistogram/plotFiles.html
-
TimeWindowCompactionStrategy has been added. This has proven to be a better approach to time series compaction and new tables should use this instead of DTCS. See CASSANDRA-9666 for details.
-
DateTieredCompactionStrategy has been deprecated - new tables should use TimeWindowCompactionStrategy. Note that migrating an existing DTCS-table to TWCS might cause increased compaction load for a while after the migration so make sure you run tests before migrating. Read CASSANDRA-9666 for background on this.
-
Change-Data-Capture (CDC) is now available. See cassandra.yaml for CDC-specific flags and a brief explanation of on-disk locations for archived data in CommitLog form. CDC can be enabled via ALTER TABLE … WITH cdc=true. Upon flush, CommitLogSegments containing data for CDC-enabled tables are moved to the data/cdc_raw directory until removed by the user. Writes to CDC-enabled tables are rejected with a WriteTimeoutException once cdc_total_space_in_mb is reached between unflushed CommitLogSegments and cdc_raw.
CDC is disabled by default in the .yaml file. Do not enable CDC on a mixed-version cluster as it will lead to exceptions which can interrupt traffic. Once all nodes have been upgraded to 3.8 it is safe to enable this feature and restart the cluster.
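The cassandra-stress items above mention coordinated omission and separate response, service, and wait times. The Python sketch below shows how those three numbers can diverge when requests follow a fixed schedule. It is a simplified, single-worker simulation: the latency_breakdown function is hypothetical, and the definitions used (wait = intended start to actual start, service = actual start to completion, response = intended start to completion) are the usual ones in coordinated-omission discussions, assumed here to match what cassandra-stress records.

```python
def latency_breakdown(ops_per_second, service_times_ms):
    """Simulate one worker issuing requests on a fixed schedule.

    Returns one dict per operation with wait, service, and response times in ms.
    """
    interval_ms = 1000 // ops_per_second  # spacing between intended start times
    worker_free_at = 0                    # when the worker can start its next request
    results = []
    for index, service_ms in enumerate(service_times_ms):
        intended_start = index * interval_ms
        # A request cannot start before the previous one has finished.
        actual_start = max(intended_start, worker_free_at)
        finished = actual_start + service_ms
        worker_free_at = finished
        results.append({
            "wait": actual_start - intended_start,
            "service": service_ms,
            "response": finished - intended_start,
        })
    return results


# One slow request (500 ms) on a schedule of 10 operations per second.
for row in latency_breakdown(10, [50, 500, 50]):
    print(row)
```

In this run the third request needs only 50 ms of service time, but it was scheduled while the slow request was still running, so its response time measured from the intended start is far higher. Reporting service time alone would hide that delay.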
Cassandra 3.8 upgrade considerations
-
The ReversedType behaviour has been corrected for clustering columns of BYTES type containing an empty value. Scrub should be run on the existing SSTables containing a descending clustering column of BYTES type to correct their ordering. See CASSANDRA-12127 for more details.
-
Ec2MultiRegionSnitch will no longer automatically set broadcast_rpc_address to the public instance IP if this property is defined in cassandra.yaml.
-
The names "json" and "distinct" are no longer valid as user-defined function names (they are still valid as column names, however). In the unlikely case that you have defined functions with such names, you must, before upgrading, recreate them under a different name, change your code to use the new names, and drop the old versions (see CASSANDRA-10783 for more details).
Cassandra 3.7 upgrade considerations
-
A maximum size for SSTable values has been introduced to prevent out-of-memory exceptions when reading corrupt SSTables. This maximum size can be set via max_value_size_in_mb in cassandra.yaml. The default is 256MB, which matches the default value of native_transport_max_frame_size_in_mb. SSTables will be considered corrupt if they contain values whose size exceeds this limit. See CASSANDRA-9530 for more details.
Cassandra 3.6 changes
-
JMX connections can now use the same auth mechanisms as CQL clients. New options in cassandra-env.(sh|ps1) enable JMX authentication and authorization to be delegated to the IAuthenticator and IAuthorizer configured in cassandra.yaml. The default settings still only expose JMX locally, and use the JVM’s own security mechanisms when remote connections are permitted. For more details on how to enable the new options, see the comments in cassandra-env.sh. A new class of IResource, JMXResource, is provided for the purposes of GRANT/REVOKE via CQL. See CASSANDRA-10091 for more details. Also, directly setting JMX remote port via the com.sun.management.jmxremote.port system property at startup is deprecated. See CASSANDRA-11725 for more details.
-
JSON timestamps are now in UTC and contain the timezone information, see CASSANDRA-11137 for more details.
-
Collision checks are performed when joining the token ring, regardless of whether the node should bootstrap. Additionally, replace_address can legitimately be used without bootstrapping to help with recovery of nodes with partially failed disks. See CASSANDRA-10134 for more details.
-
Key cache will only hold indexed entries up to the size configured by column_index_cache_size_in_kb in cassandra.yaml in memory. Larger indexed entries will never go into memory. See CASSANDRA-11206 for more details.
-
For tables having a default_time_to_live, specifying a TTL of 0 will remove the TTL from the inserted or updated values.
-
Startup is now aborted if corrupted transaction log files are found. The details of the affected log files are now logged, allowing the operator to decide how to resolve the situation.
-
Filtering expressions are made more pluggable and can be added programmatically via a QueryHandler implementation. See CASSANDRA-11295 for more details.
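The TTL item in the list above (a TTL of 0 on a table that has a default_time_to_live) can be summarized as a small decision rule. The Python sketch below is only a model of that rule as described in these notes: the effective_ttl function and the _UNSET marker are hypothetical names, and None is used here to mean "the value does not expire".

```python
_UNSET = object()  # marker meaning "the statement specified no TTL at all"


def effective_ttl(default_time_to_live, specified_ttl=_UNSET):
    """Return the TTL in seconds applied to a write, or None for no expiry."""
    if specified_ttl is _UNSET:
        # No TTL given: fall back to the table default (0 means no default).
        return default_time_to_live or None
    if specified_ttl == 0:
        # An explicit TTL of 0 removes the TTL, even if the table has a default.
        return None
    return specified_ttl


print(effective_ttl(3600))       # table default applies
print(effective_ttl(3600, 0))    # explicit 0 removes the TTL
print(effective_ttl(3600, 60))   # explicit TTL overrides the default
```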
Cassandra 3.4 changes
-
Internal authentication now supports caching of encrypted credentials. Reference cassandra.yaml:credentials_validity_in_ms
-
Remote configuration of auth caches via JMX can be disabled using the system property cassandra.disable_auth_caches_remote_configuration
-
The sstabledump tool has been added as the 3.0 version of the former sstable2json. The tool only supports v3.0+ SSTables. See the tool’s help for more detail.
-
The mbean interfaces org.apache.cassandra.auth.PermissionsCacheMBean and org.apache.cassandra.auth.RolesCacheMBean are deprecated in favor of org.apache.cassandra.auth.AuthCacheMBean. This generalized interface is common across all caches in the auth subsystem. The specific mbean interfaces for each individual cache will be removed in a subsequent major version.
Cassandra 3.2 changes
-
We now make sure that a token does not exist in several data directories. This means that we run one compaction strategy per data_file_directory and we use one thread per directory to flush. Use nodetool relocatesstables to make sure your tokens are in the correct place, or just wait and compaction will handle it. See CASSANDRA-6696 for more details.
-
The maximum in-flight commit log replay mutation bytes are now bounded to 64 megabytes, tunable via cassandra.commitlog_max_outstanding_replay_bytes.
-
Support for type casting has been added to the selection clause.
-
Hinted handoff now supports compression. Reference cassandra.yaml:hints_compression.
Hints compression is currently disabled by default.
-
The Thrift API is deprecated and will be removed in Cassandra 4.0.
Cassandra 3.2 upgrade considerations
-
The compression ratio metrics computation has been modified to be more accurate.
-
Running Cassandra as root is prevented by default.
-
JVM options are moved from cassandra-env.(sh|ps1) to jvm.options.
Cassandra 3.1 upgrade considerations
-
The return value of SelectStatement::getLimit has been changed from DataLimits to int.
-
Custom index implementations should be aware that the method Indexer::indexes() has been removed: its contract was misleading, and all custom implementations should almost surely have returned true unconditionally for that method.
-
GC logging is now enabled by default (you can disable it in the jvm.options file if you prefer).
TinkerPop changes for DSE 6.8.0
DataStax Enterprise (DSE) 6.8.0 includes all changes from previous DSE releases plus these production-certified changes that are in addition to Apache TinkerPop™ 3.4.5:
-
Added a toString() serializer for GraphBinary.
-
Configured the Gremlin Console to use GraphBinary by default.
-
Fixed transaction management for empty iterators in Gremlin Server.
-
Deprecated MessageSerializer implementations for Gryo in Gremlin Server.
-
Deprecated Serializers enum values of GRYO_V1D0 and GRYO_V3D0.
-
Deprecated SerTokens values of MIME_GRYO_V1D0 and MIME_GRYO_V3D0.
-
Added a Docker command to start Gremlin Server with the standard GLV test configurations.
-
Added aggregate(Scope,String) and deprecated store() in favor of aggregate(local).
-
Modified NumberHelper to better ignore Double.NaN in min() and max() comparisons.
-
Bump to Netty 4.1.36.
-
Added userAgent to RequestOptions.
-
Gremlin Console sends Gremlin Console/version as the userAgent.
-
Fixed DriverRemoteConnection ignoring with Token options when multiple were set.
-
Added :set warnings true|false to Gremlin Console.
-
Provided support for withComputer() in gremlin-javascript.
-
Deprecated remote traversal side-effect retrieval and related infrastructure.
-
Bump to Jackson Databind 2.9.9.1.
-
Fixed bug with Python in g:Date of GraphSON where local time zone was being used during serialization/deserialization.
-
Deprecated multi/meta-property support in Neo4jGraph.
-
Improved exception and messaging for gt/gte/lt/lte when one of the objects isn’t a Comparable.
-
Added test infrastructure to check for storage iterator leak.
-
Fixed multiple iterator leaks in query processor.
-
Fixed optional() so that the child traversal is treated as local.
-
Changed default keep-alive time for driver to 3 minutes.
-
Fixed bug where server-side keep-alive was not always disabled when its setting was zero.
-
Added support for hasNext() in JavaScript and .NET.
-
Improved error messaging for invalid inputs to the TinkerGraph IdManager instances.
-
Forced replacement of connections in Java driver for certain exception types that seem to ultimately kill the connection.
-
Changed the reverse() of desc and asc on Order to not use the deprecated decr and incr.
-
Fixed bug in MatchStep where the correct was not properly determined.
-
Fixed bug where client and server exceptions were mismatched when the server threw a StackOverflowError.
-
Added underscore suffixed steps and tokens in Gremlin-Python that conflict with global function names.
-
Prevent exception when closing a session that doesn’t exist.
-
Allow predicates and traversals to be used as options in BranchStep.
-
Ensure only a single final response is sent to the client with Gremlin Server.
-
Deprecated ResponseHandlerContext with related infrastructure and folded its functionality into Context in Gremlin Server.
-
Improved performance of aggregate() by avoiding excessive calls to hasNext() when the barrier is empty.