General upgrade advice for DSE 6.8.1

DataStax Enterprise 6.8.1 is compatible with Apache Cassandra® 3.11. All upgrade advice from previous versions applies. Carefully reviewing the DataStax Enterprise upgrade planning and upgrade instructions can ensure a smooth upgrade and avoid pitfalls and frustrations.

In addition to its compatibility with Apache Cassandra 3.11, DataStax Enterprise 6.8.1 adds Cassandra changes specific to DSE 6.8.1.

Additional advice for upgrading between versions of Apache Cassandra includes:

Cassandra 4.0 changes

Cassandra 3.11.2 changes

  • Cassandra now relies on JVM options to shut down properly on OutOfMemoryError. By default it relies on the OnOutOfMemoryError option, because the ExitOnOutOfMemoryError and CrashOnOutOfMemoryError options are not supported by the older 1.7 and 1.8 JVMs. A warning is logged at startup if none of those JVM options is used. See CASSANDRA-13006 for more details.
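The startup warning described above can be anticipated by inspecting the JVM arguments before a node is started. The following is a minimal sketch, not Cassandra's own check: the helper names `oom_flags` and `needs_oom_warning` are hypothetical, and the argument list is assumed to have been collected from your own startup scripts.

```python
# HotSpot options that control behavior on OutOfMemoryError.
# OnOutOfMemoryError takes a command, so only its prefix is matched.
OOM_FLAG_PREFIXES = (
    "-XX:OnOutOfMemoryError=",
    "-XX:+ExitOnOutOfMemoryError",
    "-XX:+CrashOnOutOfMemoryError",
)


def oom_flags(jvm_args):
    """Return the JVM arguments that configure OutOfMemoryError handling."""
    return [arg for arg in jvm_args if arg.startswith(OOM_FLAG_PREFIXES)]


def needs_oom_warning(jvm_args):
    """True when none of the OutOfMemoryError options is present (hypothetical helper)."""
    return not oom_flags(jvm_args)


if __name__ == "__main__":
    args = ["-Xms4G", "-Xmx4G", "-XX:OnOutOfMemoryError=kill -9 %p"]
    print(oom_flags(args))
    print(needs_oom_warning(["-Xms4G", "-Xmx4G"]))
```

A result of True from `needs_oom_warning` suggests that the node would log the warning at startup.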

Cassandra 3.11.2 upgrade considerations

  • Creating a materialized view with filtering on a non-primary-key base column (added in CASSANDRA-10368) is disabled, because the liveness of a view row depends on multiple filtered base non-key columns and on the base non-key column used in the view primary key. These semantics cannot be supported without a storage format change; see CASSANDRA-13826. For append-only use cases, you may still use this feature with the startup flag "-Dcassandra.mv.allow_filtering_nonkey_columns_unsafe=true".

  • The NativeAccessMBean isAvailable method now returns true only if the native library has been successfully linked. Previously it returned true if JNA could be found, without taking link failures into account.

  • Primary ranges in the system.size_estimates table are now based on the keyspace replication settings and adjacent ranges are no longer merged (CASSANDRA-9639).

  • In 2.1, the default for otc_coalescing_strategy was 'DISABLED'. In 2.2 and 3.0, it was changed to 'TIMEHORIZON', but that value was shown to be a performance regression. The default for 3.11.0 and newer has been reverted to 'DISABLED'. Users upgrading from Cassandra 2.2 or 3.0 should be aware that the default has changed.

  • The StorageHook interface has been modified to allow retrieving read information from SSTableReader (CASSANDRA-13120).

  • Materialized Views for upgrades from DSE 5.1.1 or 5.1.2 or any version DSE 5.0.10 or later:

    • Cassandra will no longer allow dropping columns on tables with Materialized Views.

    • A change was made in the way the materialized view timestamp is computed. As a result, an old deletion of a base column that is a view primary key (PK) column may not be reflected in the view when repairing the base table after the upgrade. This is only possible when a deletion of an MV primary key (PK) column that is not part of the base table PK (via UPDATE base SET view_pk_col = null or DELETE view_pk_col FROM base) is missed before the upgrade and received by repair after the upgrade. If such column deletions are performed on a view PK column that is not a base PK column, it is advisable to run repair on the base table on all nodes prior to the upgrade. Alternatively, potential inconsistencies can be fixed by running repair on the views after the upgrade, or by dropping and re-creating the views. See CASSANDRA-11500 for more details.

    • Removal of columns not selected in the materialized view (via UPDATE base SET unselected_column = null or DELETE unselected_column FROM base) may not be properly reflected in the view in some situations. We advise against deleting base columns not selected in views until this is fixed in CASSANDRA-13826.
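Because the default for otc_coalescing_strategy differs between versions, it can help to know whether a node sets the option explicitly or relies on the default. The sketch below scans the text of a cassandra.yaml file for an uncommented top-level entry. It is a deliberately simple line scan rather than a YAML parser, and the function name `explicit_yaml_value` is hypothetical.

```python
def explicit_yaml_value(yaml_text, key):
    """Return the value of an uncommented top-level `key: value` line, or None."""
    for line in yaml_text.splitlines():
        # Skip comment lines and indented (nested) entries.
        if line.startswith("#") or line[:1].isspace():
            continue
        name, sep, value = line.partition(":")
        if sep and name.strip() == key:
            # Drop a trailing comment and surrounding quotes, if any.
            value = value.split(" #", 1)[0].strip()
            return value.strip("'\"") or None
    return None


if __name__ == "__main__":
    sample = "cluster_name: 'Test'\n# otc_coalescing_strategy: DISABLED\n"
    setting = explicit_yaml_value(sample, "otc_coalescing_strategy")
    if setting is None:
        print("not set explicitly; the version default applies")
    else:
        print("explicitly set to", setting)
```

If the function returns None, the node uses the default of its Cassandra version, so the effective value changes from 'TIMEHORIZON' to 'DISABLED' when upgrading from 2.2 or 3.0.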

Cassandra 3.10 changes

  • Runtime modification of concurrent_compactors is now available via nodetool concurrent_compactors.

  • Support for the assignment operators +=/-= has been added for update queries.

  • An Index implementation may now provide a task which runs prior to joining the ring. See CASSANDRA-12039.

  • Filtering on partition key columns is now also supported for queries without secondary indexes.

  • A slow query log has been added: slow queries will be logged at DEBUG level. For more details refer to CASSANDRA-12403 and slow_query_log_timeout_in_ms in cassandra.yaml.

  • Support for GROUP BY queries has been added.

  • A new compaction-stress tool has been added to test the throughput of compaction for any cassandra-stress user schema. See compaction-stress help for usage.

  • Prepared statements are now persisted in the table prepared_statements in the system keyspace. Upon startup, this table is used to preload all previously prepared statements - i.e. in many cases clients do not need to re-prepare statements against restarted nodes.

  • cqlsh can now connect to older Cassandra versions by downgrading the native protocol version. Please note that this is currently not part of our release testing and, as a consequence, it is not guaranteed to work in all cases. See CASSANDRA-12150 for more details.

  • Snapshots that are automatically taken before a table is dropped or truncated will have a "dropped" or "truncated" prefix on their snapshot tag name.

  • Metrics are exposed for successful and failed authentication attempts. These can be located using the object names org.apache.cassandra.metrics:type=Client,name=AuthSuccess and org.apache.cassandra.metrics:type=Client,name=AuthFailure respectively.

  • Support has been added to "unset" JSON fields in prepared statements by specifying DEFAULT UNSET. See CASSANDRA-11424 for details.

  • A TTL with a null value is now allowed on insert and update. It is treated as equivalent to inserting a 0.

  • Removed outboundBindAny configuration property. See CASSANDRA-12673 for details.

Cassandra 3.10 upgrade considerations

  • Support for altering the types of columns in already defined tables and of UDT fields has been disabled. If it is necessary to return a different type, please use casting instead. See CASSANDRA-12443 for more details.

  • Specifying the default_time_to_live option when creating or altering a materialized view was erroneously accepted (and ignored). It is now properly rejected.

  • Only Java and JavaScript are now supported UDF languages. The sandbox in 3.0 already prevented the use of script languages except Java and JavaScript.

  • Compaction now correctly drops sstables out of CompactionTask when there isn’t enough disk space to perform the full compaction. This should reduce pending compaction tasks on systems with little remaining disk space.

  • Request timeouts in cassandra.yaml (read_request_timeout_in_ms, etc) now apply to the "full" request time on the coordinator. Previously, they only covered the time from when the coordinator sent a message to a replica until the time that the replica responded. Additionally, the previous behavior was to reset the timeout when performing a read repair, making a second read to fix a short read, and when subranges were read as part of a range scan or secondary index query. In 3.10 and higher, the timeout is no longer reset for these "subqueries". The entire request must complete within the specified timeout. As a consequence, your timeouts may need to be adjusted to account for this. See CASSANDRA-12256 for more details.

  • Logs written to stdout are now consistent with logs written to files. Time is now local (it was UTC on the console and local in files). Date, thread, file, and line information were added to stdout (see CASSANDRA-12004).

  • The 'clientutil' jar, which has been somewhat broken on the 3.x branch, is no longer provided. The features provided by that jar are provided by any good Java driver, and we advise relying on drivers rather than on that jar. If you need that jar for backward compatibility in the meantime, use the version provided on a previous Cassandra branch, such as the 3.0 branch. By design, the functionality provided by that jar is stable across versions, so using the 3.0 jar for a client connecting to 3.x should work without issues.

  • (Tools development) DatabaseDescriptor no longer implicitly starts up components or services such as commit log replay. This may break existing third-party tools and clients. To start up a standalone tool or client application, use the DatabaseDescriptor.toolInitialization() or DatabaseDescriptor.clientInitialization() methods. Tool initialization sets up the partitioner, snitch, and encryption context. Client initialization just applies the configuration but does not set up anything. Instead of using Config.setClientMode() or Config.isClientMode(), which are now deprecated, use one of the appropriate new methods in DatabaseDescriptor.

  • Application layer keep-alives were added to the streaming protocol to prevent idle incoming connections from timing out and failing the stream session (CASSANDRA-11839). This effectively deprecates the streaming_socket_timeout_in_ms property in favor of streaming_keep_alive_period_in_secs. See cassandra.yaml for more details about this property.

  • Duration literals support the ISO 8601 format. As a consequence, identifiers matching that format (e.g. P2Y or P1MT6H) are no longer supported (CASSANDRA-11873).
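To find identifiers affected by the duration-literal change above, existing table, column, and keyspace names can be screened before upgrading. The sketch below approximates the ISO 8601 designator format with a regular expression. It is an assumption-laden approximation, not the CQL grammar itself, and the function name `looks_like_duration` is hypothetical.

```python
import re

# Approximation of ISO 8601 durations with designators: PnW, or
# PnYnMnD optionally followed by TnHnMnS. Matched case-insensitively
# as a conservative assumption about unquoted identifiers.
ISO_8601_DURATION = re.compile(
    r"P(?:\d+W|(?:\d+Y)?(?:\d+M)?(?:\d+D)?(?:T(?:\d+H)?(?:\d+M)?(?:\d+S)?)?)",
    re.IGNORECASE,
)


def looks_like_duration(identifier):
    """True if the identifier has the shape of an ISO 8601 duration."""
    match = ISO_8601_DURATION.fullmatch(identifier)
    # Require at least one digit so a bare "P" or "PT" is not flagged.
    return bool(match) and any(ch.isdigit() for ch in identifier)


if __name__ == "__main__":
    names = ["P2Y", "P1MT6H", "price", "pt30m", "P"]
    print([name for name in names if looks_like_duration(name)])
```

Any name the check flags is worth verifying against the target version before the upgrade, since the pattern may be broader or narrower than what the parser actually rejects.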

Cassandra 3.8 changes

  • Shared pool threads are now named according to the stage they are executing tasks for. Thread names mentioned in traced queries change accordingly.

  • A new option has been added to cassandra-stress, "-rate fixed={number}/s", that forces a scheduled rate of operations/sec over time. Using this, stress can accurately account for coordinated omission from the stress process.

  • The cassandra-stress "-rate limit=" option has been renamed to "-rate throttle=".

  • HDR histograms have been added to stress runs. Their output can be saved to disk using the "-log hdrfile=" option. The histogram includes response/service/wait times when used with the fixed or throttle rate options. The histogram file can be plotted on http://hdrhistogram.github.io/HdrHistogram/plotFiles.html

  • TimeWindowCompactionStrategy has been added. This has proven to be a better approach to time series compaction and new tables should use this instead of DTCS. See CASSANDRA-9666 for details.

  • DateTieredCompactionStrategy has been deprecated - new tables should use TimeWindowCompactionStrategy. Note that migrating an existing DTCS-table to TWCS might cause increased compaction load for a while after the migration so make sure you run tests before migrating. Read CASSANDRA-9666 for background on this.

  • Change-Data-Capture (CDC) is now available. See cassandra.yaml for CDC-specific flags and a brief explanation of on-disk locations for archived data in CommitLog form. CDC can be enabled via ALTER TABLE … WITH cdc=true. Upon flush, CommitLogSegments containing data for CDC-enabled tables are moved to the data/cdc_raw directory until removed by the user. Writes to CDC-enabled tables are rejected with a WriteTimeoutException once cdc_total_space_in_mb is reached between unflushed CommitLogSegments and cdc_raw.

CDC is disabled by default in the .yaml file. Do not enable CDC on a mixed-version cluster as it will lead to exceptions which can interrupt traffic. Once all nodes have been upgraded to 3.8 it is safe to enable this feature and restart the cluster.
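The response, service, and wait times reported for the cassandra-stress fixed and throttle rate options earlier in this section relate to coordinated omission. The following is a simplified single-worker model of that accounting, not the cassandra-stress implementation; the function name `fixed_rate_timings` and the sample numbers are illustrative assumptions. Each operation is intended to start on a fixed schedule, and the wait time is measured from that intended start rather than from the moment the worker actually became free.

```python
def fixed_rate_timings(interval_ms, service_times_ms):
    """Return (wait, service, response) in ms for each scheduled operation.

    Operations are intended to start every `interval_ms`. A single worker
    runs them one after another, so one slow operation delays later ones.
    """
    timings = []
    worker_free_at = 0
    for index, service in enumerate(service_times_ms):
        intended_start = index * interval_ms
        actual_start = max(intended_start, worker_free_at)
        worker_free_at = actual_start + service
        wait = actual_start - intended_start
        # Response time counts from the intended start, so it includes
        # the delay caused by earlier slow operations.
        timings.append((wait, service, wait + service))
    return timings


if __name__ == "__main__":
    # 10 operations per second (one every 100 ms), with one 350 ms stall.
    for row in fixed_rate_timings(100, [50, 350, 50, 50]):
        print(row)
```

In this model the two operations after the stall each take 50 ms to service but have response times of 300 ms and 250 ms. Measuring service time alone would hide that delay.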

Cassandra 3.8 upgrade considerations

  • The ReversedType behaviour has been corrected for clustering columns of BYTES type containing empty values. Scrub should be run on the existing SSTables containing a descending clustering column of BYTES type to correct their ordering. See CASSANDRA-12127 for more details.

  • Ec2MultiRegionSnitch will no longer automatically set broadcast_rpc_address to the public instance IP if this property is defined in cassandra.yaml.

  • The names "json" and "distinct" are no longer valid as user-defined function names (they are still valid as column names, however). In the unlikely case that you have defined functions with such names, before upgrading you will need to recreate them under a different name, change your code to use the new names, and drop the old versions (see CASSANDRA-10783 for more details).
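The function-name restriction above can be checked against a list of existing user-defined function names before the upgrade. This is a minimal sketch: the helper name `functions_to_rename` is hypothetical, the list of names is assumed to come from your own schema export, and the comparison is case-insensitive as a conservative assumption.

```python
# Names that are no longer valid for user-defined functions.
RESERVED_FUNCTION_NAMES = {"json", "distinct"}


def functions_to_rename(function_names):
    """Return the function names that collide with the reserved names, sorted."""
    return sorted(
        name for name in function_names if name.lower() in RESERVED_FUNCTION_NAMES
    )


if __name__ == "__main__":
    print(functions_to_rename(["avg_price", "json", "Distinct", "to_upper"]))
```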

Cassandra 3.7 upgrade considerations

  • A maximum size for SSTable values has been introduced to prevent out-of-memory exceptions when reading corrupt SSTables. This maximum size can be set via max_value_size_in_mb in cassandra.yaml. The default is 256MB, which matches the default value of native_transport_max_frame_size_in_mb. SSTables will be considered corrupt if they contain values whose size exceeds this limit. See CASSANDRA-9530 for more details.

Cassandra 3.6 changes

  • JMX connections can now use the same auth mechanisms as CQL clients. New options in cassandra-env.(sh|ps1) enable JMX authentication and authorization to be delegated to the IAuthenticator and IAuthorizer configured in cassandra.yaml. The default settings still only expose JMX locally, and use the JVM’s own security mechanisms when remote connections are permitted. For more details on how to enable the new options, see the comments in cassandra-env.sh. A new class of IResource, JMXResource, is provided for the purposes of GRANT/REVOKE via CQL. See CASSANDRA-10091 for more details. Also, directly setting JMX remote port via the com.sun.management.jmxremote.port system property at startup is deprecated. See CASSANDRA-11725 for more details.

  • JSON timestamps are now in UTC and contain the timezone information, see CASSANDRA-11137 for more details.

  • Collision checks are performed when joining the token ring, regardless of whether the node should bootstrap. Additionally, replace_address can legitimately be used without bootstrapping to help with recovery of nodes with partially failed disks. See CASSANDRA-10134 for more details.

  • Key cache will only hold indexed entries up to the size configured by column_index_cache_size_in_kb in cassandra.yaml in memory. Larger indexed entries will never go into memory. See CASSANDRA-11206 for more details.

  • For tables having a default_time_to_live, specifying a TTL of 0 will remove the TTL from the inserted or updated values.

  • Startup is now aborted if corrupted transaction log files are found. The details of the affected log files are now logged, allowing the operator to decide how to resolve the situation.

  • Filtering expressions are made more pluggable and can be added programmatically via a QueryHandler implementation. See CASSANDRA-11295 for more details.

Cassandra 3.4 changes

  • Internal authentication now supports caching of encrypted credentials. See credentials_validity_in_ms in cassandra.yaml.

  • Remote configuration of auth caches via JMX can be disabled using the system property cassandra.disable_auth_caches_remote_configuration.

  • The sstabledump tool has been added as the 3.0 version of the former sstable2json. The tool only supports v3.0+ SSTables. See the tool’s help for more detail.

  • The mbean interfaces org.apache.cassandra.auth.PermissionsCacheMBean and org.apache.cassandra.auth.RolesCacheMBean are deprecated in favor of org.apache.cassandra.auth.AuthCacheMBean. This generalized interface is common across all caches in the auth subsystem. The specific mbean interfaces for each individual cache will be removed in a subsequent major version.

Cassandra 3.2 changes

  • We now make sure that a token does not exist in several data directories. This means that we run one compaction strategy per data_file_directory and we use one thread per directory to flush. Use nodetool relocatesstables to make sure your tokens are in the correct place, or just wait and compaction will handle it. See CASSANDRA-6696 for more details.

  • The maximum in-flight commit log replay mutation bytes are now bounded to 64 megabytes, tunable via cassandra.commitlog_max_outstanding_replay_bytes.

  • Support for type casting has been added to the selection clause.

  • Hinted handoff now supports compression. Reference cassandra.yaml:hints_compression.

    Hints compression is currently disabled by default.

  • The Thrift API is deprecated and will be removed in Cassandra 4.0.

Cassandra 3.2 upgrade considerations

  • The compression ratio metrics computation has been modified to be more accurate.

  • Running Cassandra as root is prevented by default.

  • JVM options are moved from cassandra-env.(sh|ps1) to jvm.options.

Cassandra 3.1 upgrade considerations

  • The return value of SelectStatement::getLimit has been changed from DataLimits to int.

  • Custom index implementations should be aware that the method Indexer::indexes() has been removed, as its contract was misleading and all custom implementations should almost surely have returned true unconditionally for that method.

  • GC logging is now enabled by default (you can disable it in the jvm.options file if you prefer).
