Important configuration changes in Cassandra 4.x

This page provides a reference of various changes made in the Apache Cassandra® configuration files in version 4.x.

Configuration files changed in version 4.x:
  • Configuration: cassandra.yaml

  • Environment: cassandra-env.sh

  • Java Options: jvm.options, jvm-server.options, jvm8-server.options, jvm11-server.options

  • Replica Assignments: cassandra-rackdc.properties

Where to find Cassandra configuration files

The location of Cassandra’s configuration files varies, depending on the type of installation:

Installation type                        Location
Package installations (APT, YUM, etc.)   /etc/cassandra
Tarball installations                    <installation_location>/conf
Docker installations                     /etc/cassandra

Before starting the newly-installed version of Cassandra on an upgraded cluster node, you first need to update the default configuration files that came with the new version to ensure your desired configuration is persisted.

For instructions on how to update Cassandra’s configuration files as part of the upgrade process, see Update Cassandra configuration files.

Cassandra configuration file (cassandra.yaml)

The following additions, removals, and changes have been made to cassandra.yaml in version 4.x.

allocate_tokens_for_local_replication_factor

This setting offers a simpler and safer approach than allocate_tokens_for_keyspace. It is important to set when using num_tokens < 256. See Replica-aware token allocation.

snapshot_links_per_second

The act of creating or clearing snapshots involves creating or removing potentially tens of thousands of links, which can cause significant performance impact, especially on consumer-grade SSDs. A non-zero value here can be used to throttle these links to avoid the negative performance impact of taking and clearing snapshots.
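For example, a hedged cassandra.yaml sketch (the value is illustrative rather than a tuned recommendation; leaving the setting at 0 leaves link creation unthrottled):

snapshot_links_per_second: 800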

For more information, refer to CASSANDRA-13019.

native_transport_max_concurrent_requests_in_bytes & native_transport_max_concurrent_requests_in_bytes_per_ip

Provide the ability to change the global and per-endpoint limits on in-flight request bytes without restarting Cassandra. These are hidden settings, not listed in cassandra.yaml. See Throttling client throughput.

network_authorizer

See Authorization by datacenter.

commitlog_sync: group

Adds a group mode in which individual writes don’t trigger commitlog flushes, but still wait to be acknowledged by the commitlog flush, which happens at least every commitlog_sync_group_window.
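An illustrative cassandra.yaml sketch of group mode as it appears in the 4.0 file, where the window option carries an _in_ms suffix (the value shown is an example, not a recommendation):

commitlog_sync: group
commitlog_sync_group_window_in_ms: 1000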

periodic_commitlog_sync_lag_block

In periodic commitlog mode, how long writes may be blocked while waiting for a slow commitlog flush to complete.

flush_compression

The compression algorithm to apply to flushed SSTables. Flushes need to happen quickly so they don’t block the flush writer and memtables, and they produce smaller SSTables than compactions do, so choosing a fast algorithm has value. Choices are none (no compression), fast, and table (as configured by the table’s schema).
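For example, an illustrative cassandra.yaml line selecting the fast algorithm:

flush_compression: fast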

repair_session_space

Replaces repair_session_max_tree_depth, providing a more intuitive value to configure (an amount of memory rather than a tree depth).

native_transport_allow_older_protocols

Controls whether clients may connect using older native protocol versions. Ensure this is set to true when upgrading a cluster so that clients still using older protocol versions can connect.

native_transport_idle_timeout

Controls when idle client connections are closed.

concurrent_validations

Number of simultaneous repair validations to allow. Useful to ensure concurrent_compactors are reserved for non-repair compactions.

concurrent_materialized_view_builders

Number of simultaneous materialized view builder tasks to allow.

stream_entire_sstables

See Zero Copy Streams.

internode_tcp_* & internode_application_*

Defensive settings for protecting Cassandra from network partitions. Part of the new messaging system.

streaming_connections_per_host

See Zero Copy Streams.

legacy_ssl_storage_port_enabled

If enabled, will open up an encrypted listening socket on ssl_storage_port.

If your cluster utilizes server-to-server internode encryption, you must set legacy_ssl_storage_port_enabled to true when upgrading to Cassandra 4.x. Otherwise, Cassandra 4.x will use the default port 7000 for both encrypted and unencrypted messages, which can cause communication issues with un-upgraded Cassandra 3.x nodes.
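A hedged sketch of the relevant cassandra.yaml fragment during the upgrade, assuming the option sits under server_encryption_options alongside your existing internode encryption settings (other keys omitted):

server_encryption_options:
    internode_encryption: all
    legacy_ssl_storage_port_enabled: true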

This parameter can be removed after all cluster nodes have been upgraded to Cassandra 4.x.

ideal_consistency_level

Defines the ideal consistency level that writes are expected to achieve within the write timeout window. This can include the asynchronous writes sent above the consistency level specified in the request. Metrics report how many writes meet this ideal_consistency_level.

automatic_sstable_upgrade & max_concurrent_automatic_sstable_upgrades

Enable the automatic upgrade of SSTables, and limit the number of concurrent SSTable upgrades. See Automatic SSTable upgrading.

audit_logging_options

See Auditing.

full_query_logging_options

See Full Query Logging (FQL).

corrupted_tombstone_strategy

Validates tombstones on read and compaction. Can be set to disabled, warn, or exception.

diagnostic_events_enabled

Use to enable diagnostic events. Diagnostic events are for observation purposes, but can introduce performance penalties if not used carefully.

native_transport_flush_in_batches_legacy

Use native transport TCP message coalescing. Worth testing on older kernels with fewer client connections available.

repaired_data_tracking_for_* & report_unconfirmed_repaired_data_mismatches

Enable tracking of repaired state of data during reads.

internode_timeouts

Now defaults to true. This assumes operators keep their servers’ clocks in sync with NTP, which cross-node timeouts rely on.

materialized_views_enabled

Now defaults to false. Materialized views are an experimental feature that requires some developer understanding to use safely.

sasi_indexes_enabled

Now defaults to false. SASI is an experimental feature that requires some developer understanding to safely use.

transient_replication_enabled

See Transient replication.

local_system_data_file_directory

Directory where Cassandra should store the data of the local system keyspaces. By default, Cassandra stores the data of the local system keyspaces in the first of the data directories specified by data_file_directories. This approach ensures that if one of the other disks is lost, Cassandra can continue to operate. For extra security, this setting allows that data to be stored in a separate directory that provides redundancy.

networking_cache_size

Maximum memory to use for inter-node and client-server networking buffers.

Defaults to the smaller of 1/16 of heap or 128MB. This pool is allocated off-heap, so is in addition to the memory allocated for heap. The cache also has on-heap overhead which is roughly 128 bytes per chunk (i.e. 0.2% of the reserved size if the default 64k chunk size is used). Memory is only allocated when needed.

file_cache_enabled

Enable the sstable chunk cache. The chunk cache will store recently accessed sections of the sstable in-memory as uncompressed buffers.

tables_warn_threshold & keyspaces_warn_threshold

Having many tables and/or keyspaces negatively affects the performance of many operations in the cluster. When the number of tables or keyspaces in the cluster exceeds these thresholds, a client warning is sent back to the user when creating a table or keyspace.

Auditing

Auditing is designed to meet regulatory and security requirements, and is able to audit more types of requests and interactions with the database than Full Query Logging (FQL) or Change Data Capture (CDC) can.

Auditing and FQL share a common implementation that uses the OpenHFT libraries. By default, OpenHFT’s Chronicle Queue library is used as a binary log (binlog), making the default behavior file-based. Different OpenHFT plug-in libraries can be used, and the choice influences operability and performance.

There are performance implications when enabling Auditing, and they depend on the auditing setup and configuration. Because the use of Auditing may be a hard requirement for many deployments, it’s recommended to perform thorough testing and benchmarking prior to deployment in production.

Auditing will still have a performance impact on production clusters, and file-based logging needs to be configured to be operationally safe and sustainable. Ensure appropriate metrics monitoring and alerts are in place around the log files. To learn more about this feature, read this blog post.

Authorization by datacenter

Prior to Cassandra 4.0, authorization via user or role was all or nothing with no concept of datacenters. Starting in version 4.0, Cassandra operators can permit and restrict connections from client applications to specific datacenters.

Authorization by datacenter utilizes Cassandra’s existing Authorization feature and its concept of Roles. It only prevents connections to coordinators in disallowed datacenters; data from remote datacenters can still be accessed through a local coordinator. Server-side control of connections to coordinators is important to prevent saturation of the network and of the available connections, which can leave client applications unable to connect to the cluster. This is a common problem with analytics datacenters, where clients like Apache Spark can spawn hundreds or thousands of connections.

The use of datacenter authorization is recommended when there is a need to control and limit connections to coordinators in specific datacenters. This is considered a safe feature to use immediately where there is benefit, as it touches very few components of the Cassandra codebase. However, it’s not recommended to rely on this feature to restrict access to data that is only replicated to certain datacenters. To learn more about this feature, read this blog post.

Automatic SSTable upgrading

Starting in version 4.0, the cassandra.yaml configuration file contains a new setting to automatically upgrade SSTables to the latest format after upgrading Cassandra: automatic_sstable_upgrade. It provides an alternative to manually running the upgrade command via nodetool. This feature is optional and disabled by default.

SSTable data must be upgraded as part of any upgrade from Cassandra 3.x to 4.x, as the table format has changed. When automatic_sstable_upgrade is set to true, a Cassandra node automatically starts converting SSTables from the 3.x format to the 4.x format. This process begins after Cassandra 4.x is started for the first time on an upgraded node.
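A hedged cassandra.yaml sketch for enabling this behavior (the concurrency value is illustrative):

automatic_sstable_upgrade: true
max_concurrent_automatic_sstable_upgrades: 1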

While this feature provides a convenience when upgrading nodes in the cluster, care needs to be taken when using it. Operators of a Cassandra cluster trade off control over the execution of the process for the convenience of never having to manually execute it.

SSTables are stored in sorted order, which should keep the CPU and disk IO usage relatively low. However, too many concurrent executions of the SSTable upgrade process may overwhelm the cluster. If this feature is enabled, upgrading a node may be delayed until the SSTable upgrade on other nodes completes.

Leaving this feature disabled allows the SSTable format upgrade to be performed in a controlled manner. Specifically, after upgrading all nodes in the cluster to version 4.x, the SSTable upgrade can be started manually on individual nodes. This allows the SSTable upgrade to be launched based on the health of the cluster: if the cluster is under no load, it can be run on multiple nodes at once; if the cluster is under load, it can be run on each node in turn (one node at a time).

It’s generally recommended to leave this feature disabled and to upgrade the SSTables of a node manually.

Full Query Logging (FQL)

Full Query Logging (FQL) logs all requests to the CQL interface. FQL can be used for debugging, performance benchmarking, testing, and also for auditing CQL queries. In comparison to audit logging, FQL is dedicated to CQL requests only and comes with features such as the FQL Replay and FQL Compare that are unavailable in audit logging.

Full Query Logging’s primary use is for correctness testing. Use Auditing for security, business accounting, and monetary transaction compliance. Use Diagnostics Events for debugging. Use Change Data Capture (CDC) for runtime duplication/exporting/streaming of the data.

FQL only logs the queries that successfully complete.

FQL appears to have little or no overhead in WRITE only workloads, and a minor overhead in MIXED workloads.

The implementation is file (binlog) based. Different OpenHFT plug-in libraries can be used, which will influence operability and performance.

Minor performance changes are to be expected and will depend on the setup and configuration. You should perform testing and benchmarking to evaluate production implications. File-based logging needs to be configured to be operationally safe and sustainable. Ensure appropriate metrics monitoring and alerts are in place around the log files.

FQL is intended for correctness testing; its use adds quality assurance to existing clusters.

Replica-aware token allocation

Starting in version 4.0, the cassandra.yaml configuration file contains a new setting to configure the replica-aware token allocation algorithm: allocate_tokens_for_local_replication_factor. It’s an alternative to the existing allocate_tokens_for_keyspace setting.

The following is the configuration setting and value that was used in Cassandra 3.11.x to configure the replica-aware token allocation algorithm:

allocate_tokens_for_keyspace: dummy_rf

The replication factor of the keyspace defined in the allocate_tokens_for_keyspace setting is used by the replica-aware token allocator when generating tokens for a new node. No other keyspace information is used. Providing a keyspace for this feature introduced additional overhead and complexities when using the feature in a new cluster.

Starting in version 4.0, the newly-added allocate_tokens_for_local_replication_factor setting can be used as an alternative means of specifying the desired replication factor to the replica-aware token allocator. It has a default value of 3 and reduces the complexities and overhead associated with using replica-aware token allocation.
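For example, on a cluster whose keyspaces use a replication factor of 3, the cassandra.yaml entry is simply:

allocate_tokens_for_local_replication_factor: 3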

Instead of keeping the allocate_tokens_for_keyspace setting and value, it should be removed or commented out in the configuration file when upgrading to Cassandra 4.x:

# allocate_tokens_for_keyspace: dummy_rf

This change should be done as part of Update Cassandra configuration files.

Thrift protocol removed

The Thrift protocol has been removed starting in Cassandra 4.0. With the removal of the Thrift protocol, the following settings have also been removed from cassandra.yaml:

  • start_rpc

  • rpc_port

  • rpc_server_type

  • thrift_framed_transport_size_in_mb

The above settings no longer appear in the default configuration as of version 4.0 and must not be added back to cassandra.yaml; if they are present, Cassandra will fail to start.

It’s recommended that clients currently using the Thrift protocol change to using the Native Binary protocol.

Throttling client throughput

In Cassandra versions 3.0.19, 3.11.5, and 4.0, two new settings are configurable via cassandra.yaml:

  • native_transport_max_concurrent_requests_in_bytes

  • native_transport_max_concurrent_requests_in_bytes_per_ip

These are hidden settings, not listed in the default cassandra.yaml. They have dynamic default values:

native_transport_max_concurrent_requests_in_bytes = heap / 10
native_transport_max_concurrent_requests_in_bytes_per_ip = heap / 40
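For example, on a node with a 32 GB heap, these defaults work out to roughly 3.2 GB of in-flight request bytes globally and about 800 MB per client IP.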

Clients that send requests over these thresholds will receive OverloadedException error messages.

It’s recommended that you leave the default settings in place, and only tune them when needed.

These settings are exposed as the following attributes on the StorageService MBean, which allow them to be modified via JMX:

  • NativeTransportMaxConcurrentRequestsInBytes

  • NativeTransportMaxConcurrentRequestsInBytesPerIp

For more information, refer to CASSANDRA-15519.

Transient replication

Transient replication takes advantage of incremental repairs, allowing consistency levels and storage requirements to be reduced when data is marked as repaired.

Transient replication introduces new terminology for each replica. Replicas are now either full replicas or transient replicas. Full replicas, like replicas previously, store all data. Transient replicas store only unrepaired data.

When Cassandra 4.0 was released, transient replication was explicitly announced as an experimental feature. At the time of its release, transient replication couldn’t be used for:

  • Monotonic Reads

  • Lightweight Transactions (LWTs)

  • Logged Batches

  • Counters

  • Keyspaces using materialized views

  • Secondary indexes (2i)

  • Tables with CDC enabled

Transient replication introduces operational overhead when it comes to changing the replication factor (RF).

  • RF cannot be altered while some endpoints are not in a normal state (no range movements).

  • You can’t add full replicas if there are any transient replicas. You must first remove all transient replicas, then change the number of full replicas, then add back the transient replicas.

  • You can only safely increase the number of transient replicas one at a time, running incremental repair in between each change.

Transient Replication offers the potential to reduce network traffic and storage requirements, e.g. volume sizes and/or the number of nodes. However, due to its experimental status, any reliance on this feature should be accompanied by committed involvement, resources, and testing.

Zero Copy Streams

Cassandra 4.0 introduces hardware-bound Zero Copy Streaming (ZCS). Streaming is a fundamental operation to many of Cassandra’s internal operations: bootstrapping, decommissioning, host replacement, repairs, rebuilds, etc. Prior to this feature, all data was streamed through the JVMs of the source and destination nodes, making it a slow and garbage collection-intensive process. With hardware-bound streaming, data can be transferred directly from disk to network avoiding the JVM.

ZCS is enabled by setting stream_entire_sstables: true in cassandra.yaml.

Throttling still occurs according to the stream_throughput_outbound setting.

ZCS will only be used when an SSTable contains only partitions that have been requested to be streamed. In practice this means it only works when vnodes are disabled (i.e. num_tokens=1). That is, if an SSTable candidate contains partitions that are additional to those of the requested streaming operation, it will be streamed via the traditional JVM streaming implementation.

ZCS is also unable to be used for any SSTables that contain legacy counters (from Cassandra versions before 2.1).

To summarize, for ZCS to work on a particular SSTable, the following checks must pass:

  1. The stream_entire_sstables setting is true.

  2. No counter legacy shards in SSTable.

  3. The SSTable must contain data belonging to only the token ranges specified in the streaming request.

Activation and frequency of ZCS can be observed in the debug.log file, if debug logging is enabled. The output will be from the CassandraEntireSSTableStreamReader logger.

ZCS provides improved availability and elasticity by reducing the time it takes to stream token range replicas between nodes during bootstrapping, decommissioning, and host replacement operations. ZCS provides improved performance by reducing the load and latency impact caused by streaming large amounts of data through each JVM.

But ZCS only works for clusters not using vnodes, i.e. where num_tokens=1.

There are a number of other important streaming improvements introduced in version 4.0, such as Netty non-blocking NIO messaging, parallel keyspace streams, and better stream balancing during bootstrap. These improvements will improve bootstrapping times and overall elasticity of a Cassandra cluster.

To enable parallel streams, increase the value of streaming_connections_per_host in cassandra.yaml. This will help when streams are receiver CPU bound.
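An illustrative cassandra.yaml sketch combining both streaming settings (the connection count is an example, not a recommendation):

stream_entire_sstables: true
streaming_connections_per_host: 2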

ZCS provides no benefits on vnode clusters. Since there is no negative impact, it’s generally recommended to leave ZCS enabled.

JVM options changes

The jvm.options file is replaced in Cassandra 4.x by three files: jvm-server.options, jvm8-server.options, and jvm11-server.options. These files control the behavior of the Cassandra JVM.

Another new set of files (jvm-client.options, jvm8-client.options, and jvm11-client.options) has been added to configure the JVM of auxiliary tools, such as nodetool and the SSTable tools.

jvm-server.options & jvm-client.options

These files contain static flags that are used by any version of the JVM. All flags defined in these files will be used by Cassandra when the JVM is started.

jvm8-server.options & jvm8-client.options

These files contain flags specific to Java 8.

jvm11-server.options & jvm11-client.options

These files contain flags specific to Java 11.

The version of Java used to run Cassandra determines which of the version-specific .options files are used. For example, if Java 11 is used to run Cassandra, then jvm11-server.options and jvm11-client.options are used.

The cassandra-env.sh file is still relevant and is sourced alongside the .options files. However, it does not reflect the server-client split. For example, placing G1-specific flags in cassandra-env.sh while enabling G1 in jvm-server.options but keeping the default CMS in jvm-client.options will prevent the client JVMs from working correctly.

It’s recommended that you rewrite your current jvm.options file into jvm-server.options, jvm-client.options, jvm8-server.options, and jvm8-client.options when updating the Cassandra configuration files during upgrade. Validate carefully that new nodes start up with the intended JVM options.
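For example, heap flags that previously lived in jvm.options would now be placed in jvm-server.options (the sizes are illustrative):

-Xms8G
-Xmx8G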

cassandra-rackdc.properties file

Ec2Snitch supports standard AWS naming conventions

Starting in Cassandra 4.0, the cassandra-rackdc.properties file contains the new ec2_naming_scheme setting that allows the Ec2Snitch and Ec2MultiRegionSnitch to switch between legacy and AWS-style naming conventions.

The possible values are legacy and standard. In Cassandra 4.x, standard is the default naming convention. To keep the pre-4.x naming convention, this setting must be set to legacy; you must use the legacy value if you are upgrading a pre-4.0 cluster.

Prior to Cassandra 4.0, the Ec2Snitch and Ec2MultiRegionSnitch used datacenter and rack naming conventions inconsistent with those presented in the Amazon EC2 APIs. Full AWS-style naming was introduced in Cassandra 4.0 (CASSANDRA-7839) and made the default naming convention for Ec2Snitch and Ec2MultiRegionSnitch.

As part of this change, the ability to control the naming convention was added. This was done using the ec2_naming_scheme setting in the cassandra-rackdc.properties file. The possible values for this setting are legacy and standard.

If the endpoint_snitch setting in cassandra.yaml is set to either Ec2Snitch or Ec2MultiRegionSnitch, then the ec2_naming_scheme setting in the cassandra-rackdc.properties file must be set to legacy when upgrading from Cassandra 3.x to 4.x.
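For example, when upgrading a pre-4.0 cluster that uses one of these snitches, add the following line to cassandra-rackdc.properties on each node:

ec2_naming_scheme=legacy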

legacy

Sets the Ec2Snitch and Ec2MultiRegionSnitch to use the pre-4.0 naming convention, which is inconsistent with the names presented in the Amazon EC2 APIs. Specifically, if the Region name ends in "-1", the Datacenter name is the Region name without the trailing "-1"; otherwise, the Datacenter name is the full Region name. The Rack name is the number and letter portion of the Availability Zone name following the last "-".

For example:

Region      Availability Zone   Datacenter   Rack
us-west-1   us-west-1a          us-west      1a
us-west-2   us-west-2b          us-west-2    2b

standard

Sets the Ec2Snitch and Ec2MultiRegionSnitch to use the new naming convention introduced in Cassandra 4.0. This follows the full AWS-style naming and is the default value in Cassandra 4.x.

Specifically, the Datacenter name is the standard AWS Region name, including the number. The Rack name is the full Availability Zone name.

For example:

Region      Availability Zone   Datacenter   Rack
us-west-1   us-west-1a          us-west-1    us-west-1a
us-west-2   us-west-2b          us-west-2    us-west-2b

OldNetworkTopologyStrategy removed

The deprecated OldNetworkTopologyStrategy has been removed starting in Cassandra 4.0.

Before upgrading, you need to update all keyspaces using OldNetworkTopologyStrategy by changing them to use NetworkTopologyStrategy.
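For example, such a keyspace can be switched with an ALTER KEYSPACE statement similar to the following sketch (the keyspace name, datacenter name, and replication factor are illustrative; keep your existing replication settings):

ALTER KEYSPACE my_keyspace
  WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};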

New CQL data types

Positive and negative NAN and INFINITY

The existing NAN and INFINITY data types are replaced with both positive and negative variants.

New CQL table options

additional_write_policy

Defines the threshold at which writes are additionally sent to transient replicas when transient replication is in use.

read_repair (replica filtering protection)

Introduces a configurable tradeoff between monotonic quorum reads and partition-level write atomicity.
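For example, a hedged CQL sketch that disables read repair on a table to favor write atomicity (the table name is illustrative):

ALTER TABLE my_keyspace.my_table WITH read_repair = 'NONE';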

compression_level

Allows a compression level to be defined for the compressor:

  • 0: fastest

  • 9: smallest

Hybrid speculative_retry

Allows specifying MIN() and MAX() around two types of speculative_retry values, providing bounds that ensure useful values even with one down or misbehaving node.
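For example, a hedged CQL sketch that speculates at the 99th percentile latency but never waits longer than 50 ms (the table name and values are illustrative):

ALTER TABLE my_keyspace.my_table WITH speculative_retry = 'MIN(99PERCENTILE,50ms)';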

Windows support removed

Windows support is removed from Cassandra 4.0 onward.

Developers who use Windows can continue to run Apache Cassandra locally using Windows Subsystem for Linux version 2 (WSL2), Docker for Windows, and virtualization platforms like Hyper-V and VirtualBox.
