Important configuration changes in Cassandra 4.x
This page provides a reference of various changes made in the Apache Cassandra® configuration files in version 4.x.
Java Options: jvm.options, jvm-server.options, jvm8-server.options, jvm11-server.options
Replica Assignments: cassandra-rackdc.properties
Where to find Cassandra configuration files
The location of Cassandra’s configuration files varies, depending on the type of installation:
Package installations (APT, YUM, etc.)
Before starting the newly-installed version of Cassandra on an upgraded cluster node, you first need to update the default configuration files that came with the new version to ensure your desired configuration is persisted.
For instructions on how to update Cassandra’s configuration files as part of the upgrade process, see Update Cassandra configuration files.
The following additions, removals, and changes have been made to cassandra.yaml in version 4.x.
This setting offers a simpler and safer approach than allocate_tokens_for_keyspace. It is important to set when using num_tokens < 256. See Replica-aware token allocation.
The act of creating or clearing snapshots involves creating or removing potentially tens of thousands of links, which can cause significant performance impact, especially on consumer-grade SSDs. A non-zero value here can be used to throttle these links to avoid the negative performance impact of taking and clearing snapshots.
For more information, refer to CASSANDRA-13019.
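As a sketch, assuming this refers to the 4.0 setting snapshot_links_per_second, the throttle can be configured in cassandra.yaml like this (the value shown is illustrative):

```yaml
# Throttle the number of snapshot hard links created or removed per second.
# 0 (the assumed default) means unthrottled.
snapshot_links_per_second: 100
```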
Provides the ability to change the global and per-endpoint limits on the number of in-flight requests without restarting Cassandra. These are hidden settings, not listed in cassandra.yaml. See Throttling client throughput.
Adds a group mode where writes don’t trigger commitlog flushes but still wait for acknowledgement from the commitlog flush, which happens at minimum every
How often the commitlog in periodic mode flushes to disk.
The compression algorithm to apply to flushed SSTables. Flushes need to happen fast so as not to block the flush writer and memtables, and they write smaller SSTables than compactions, so choosing a fast algorithm has value. Choices are
table (as configured by the table’s schema).
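A minimal cassandra.yaml sketch, assuming the setting is named flush_compression and that its other accepted values are none and fast (only the table value appears in the text above; the rest is an assumption):

```yaml
# Compression applied to SSTables as they are flushed from memtables.
# "fast" (assumed default) favors flush speed; "table" uses the table's
# configured compressor; "none" disables compression on flush.
flush_compression: fast
```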
Replaces repair_session_max_tree_depth, providing a more intuitive option to configure.
Whether older protocol versions are allowed. It defaults to false but must be set to true when upgrading a cluster.
Controls when idle client connections are closed.
Number of simultaneous repair validations to allow. Useful to ensure concurrent_compactors are reserved for non-repair compactions.
Number of simultaneous materialized view builder tasks to allow.
See Zero Copy Streams.
Defensive settings for protecting Cassandra from network partitions. Part of the new messaging system.
See Zero Copy Streams.
If enabled, will open up an encrypted listening socket on
If your cluster utilizes server-to-server internode encryption, you must set this to true when upgrading to Cassandra 4.x. Otherwise, Cassandra 4.x will use the default port 7000 for both encrypted and unencrypted messages, which can cause communication issues with un-upgraded Cassandra 3.x nodes.
This parameter can be removed after all cluster nodes have been upgraded to Cassandra 4.x.
Define the ideal consistency_level that writes are expected to achieve within the timeout window. This can include asynchronous writes above the request statement’s specified consistency level. Metrics are provided on how many writes meet this ideal consistency level.
Enable the automatic upgrade of SSTables, and limit the number of concurrent SSTable upgrades. See Automatic SSTable upgrading.
Validate tombstones on reads and compaction; can be either
Use to enable diagnostic_events. Diagnostic events are for observation purposes, but can introduce performance penalties if not used carefully.
Use native transport TCP message coalescing. Worth testing on older kernels with fewer client connections available.
Enable tracking of repaired state of data during reads.
Now defaults to true. It’s expected that everyone today understands the importance of keeping servers in sync with NTP.
Now defaults to false. Materialized views are an experimental feature that requires some developer understanding to safely use.
Now defaults to false. SASI is an experimental feature that requires some developer understanding to safely use.
Directory where Cassandra should store the data of the local system keyspaces. By default, Cassandra stores this data in the first of the data directories specified by data_file_directories. This approach ensures that if one of the other disks is lost, Cassandra can continue to operate. For extra security, this setting allows you to store that data in a different directory that provides redundancy.
Maximum memory to use for inter-node and client-server networking buffers.
Defaults to the smaller of 1/16 of heap or 128MB. This pool is allocated off-heap, so is in addition to the memory allocated for heap. The cache also has on-heap overhead which is roughly 128 bytes per chunk (i.e. 0.2% of the reserved size if the default 64k chunk size is used). Memory is only allocated when needed.
Enable the sstable chunk cache. The chunk cache will store recently accessed sections of the sstable in-memory as uncompressed buffers.
Having many tables and/or keyspaces negatively affects performance of many operations in the cluster. When the number of tables/keyspaces in the cluster exceeds the following thresholds a client warning will be sent back to the user when creating a table or keyspace.
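As an illustration, assuming the 4.0 threshold settings are named table_count_warn_threshold and keyspace_count_warn_threshold (names and values are assumptions):

```yaml
# Send a client warning when creating a table or keyspace pushes the
# cluster past these counts (values shown are assumed defaults).
table_count_warn_threshold: 150
keyspace_count_warn_threshold: 40
```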
Auditing is designed to meet regulatory and security requirements, and is able to audit more types of requests and interactions with the database than Full Query Logging (FQL) or Change Data Capture (CDC) can.
Auditing and FQL have been implemented on a shared, file-based (binlog) design that uses the OpenHFT libraries. By default, OpenHFT’s Chronicle Queue library is used as the binary log, making the default behavior file-based. Different OpenHFT plug-in libraries can be used, and the choice will influence operability and performance.
There are performance implications involved when enabling Auditing. This depends on the auditing setup and configuration. You should perform testing and benchmarking to evaluate production implications.
Because the use of Auditing may be a hard requirement for many, it’s recommended to perform thorough testing and benchmarking prior to deployment in production.
Auditing will still have a performance impact on production clusters, and file-based logging needs to be configured to be operationally safe and sustainable. Ensure appropriate metrics monitoring and alerts are in place around the log files. To learn more about this feature, read this blog post.
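A minimal cassandra.yaml sketch for enabling file-based (binlog) auditing, assuming the 4.0 audit_logging_options block and the BinAuditLogger class name; the log directory is a hypothetical path:

```yaml
audit_logging_options:
    enabled: true
    logger:
      - class_name: BinAuditLogger          # Chronicle Queue-backed binlog
    # audit_logs_dir: /var/log/cassandra/audit   # hypothetical path
```

As noted above, ensure disk-space monitoring and alerting are in place around the resulting log files before enabling this in production.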
Prior to Cassandra 4.0, authorization via user or role was all or nothing with no concept of datacenters. Starting in version 4.0, Cassandra operators can permit and restrict connections from client applications to specific datacenters.
Authorization by datacenter utilizes the existing Authorization feature in Cassandra and its concept of Roles. It only prevents connections to coordinators in disallowed datacenters; data from remote datacenters can still be accessed through a local coordinator. Server-side control of connections to coordinators is important to prevent saturation of the network and of available connections, which can leave client applications unable to connect to the cluster. This is a common problem with analytics datacenters, where clients like Apache Spark can spawn hundreds or thousands of connections.
The use of datacenter authorization is recommended when there is a need to control and limit connections to coordinators in specific data centers. This is considered to be a safe feature to use immediately where there is benefit, as it touches very few components of the Cassandra codebase. Though, it’s not recommended to rely on this feature to restrict access to data only replicated to certain datacenters. To learn more about this feature, read this blog post.
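A sketch of how datacenter authorization might be enabled, assuming the 4.0 network_authorizer setting and the CassandraNetworkAuthorizer implementation; role-based authentication and authorization must already be managed by Cassandra for this to take effect:

```yaml
# Roles and permissions must be managed by Cassandra for
# datacenter authorization to apply.
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
network_authorizer: CassandraNetworkAuthorizer
```

A role can then be restricted with CQL along the lines of `ALTER ROLE analytics WITH ACCESS TO DATACENTERS {'dc_analytics'}` (the role and datacenter names are hypothetical).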
Starting in version 4.0, the cassandra.yaml configuration file contains a new setting to automatically upgrade SSTables to the latest format after upgrading Cassandra:
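A hedged sketch of how this might appear in cassandra.yaml; the companion concurrency setting name, max_concurrent_automatic_sstable_upgrades, is an assumption:

```yaml
# Automatically rewrite SSTables into the current format after upgrade.
automatic_sstable_upgrade: false
# Limit how many SSTable upgrades run concurrently (assumed default: 1).
max_concurrent_automatic_sstable_upgrades: 1
```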
It provides an alternative to manually running the upgrade command via
This feature is optional and disabled by default.
SSTable data must be upgraded as part of any upgrade from Cassandra 3.x to 4.x, as the table format has changed.
When the automatic_sstable_upgrade setting is set to true, a Cassandra node will automatically start the process to convert SSTables from the 3.x format to the 4.x format.
This process begins after starting Cassandra 4.x for the first time on an upgraded node.
While this feature provides a convenience when upgrading nodes in the cluster, care needs to be taken when using it. Operators of a Cassandra cluster trade off control over the execution of the process for the convenience of never having to manually execute it.
SSTables are stored in sorted order, which should keep the CPU and disk IO usage relatively low. However, too many concurrent executions of the SSTable upgrade process may overwhelm the cluster. If this feature is enabled, upgrading a node may be delayed until the SSTable upgrade on other nodes completes.
Leaving this feature disabled allows the SSTable format upgrade to be performed in a controlled manner. Specifically, after upgrading all nodes in the cluster to version 4.x, the SSTable upgrade can be started manually on individual nodes. This allows for a scenario where the SSTable upgrade is launched based on the health of the cluster. If the cluster is under no load, multiple executions can be started; if the cluster is under load, it can be run on each node in turn (one node at a time).
It’s generally recommended to leave this feature disabled and to upgrade the SSTables of a node manually.
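If taking the manual route, the upgrade is typically driven per node with nodetool; a sketch of the sequence on one live 4.x node:

```
# On each upgraded node, one node at a time, once the whole
# cluster is running 4.x:
nodetool upgradesstables
```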
Full Query Logging (FQL) logs all requests to the CQL interface. FQL can be used for debugging, performance benchmarking, testing, and also for auditing CQL queries. In comparison to audit logging, FQL is dedicated to CQL requests only and comes with features such as the FQL Replay and FQL Compare that are unavailable in audit logging.
Full Query Logging’s primary use is for correctness testing. Use Auditing for security, business accounting, and monetary transaction compliance. Use Diagnostics Events for debugging. Use Change Data Capture (CDC) for runtime duplication/exporting/streaming of the data.
FQL only logs the queries that successfully complete.
FQL appears to have little or no overhead in WRITE only workloads, and a minor overhead in MIXED workloads.
The implementation is file (binlog) based. Different OpenHFT plug-in libraries can be used, which will influence operability and performance.
Minor performance changes are to be expected and will depend on the setup and configuration. You should perform testing and benchmarking to evaluate production implications. File-based logging needs to be configured to be operationally safe and sustainable. Ensure appropriate metrics monitoring and alerts are in place around the log files.
FQL is intended for correctness testing; its use will add quality assurance to existing clusters.
Starting in version 4.0, the cassandra.yaml configuration file contains a new setting to configure the replica-aware token allocation algorithm:
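The new setting, as it might appear in cassandra.yaml (the value of 3 matches the default described below):

```yaml
# Replication factor the replica-aware token allocator should target
# when generating tokens for a new node.
allocate_tokens_for_local_replication_factor: 3
```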
It’s an alternative to the existing
The following is the configuration setting and value that was used in Cassandra 3.11.x to configure the replica-aware token allocation algorithm:
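A sketch of that 3.11.x-era configuration, using the dummy_rf keyspace name that appears later on this page:

```yaml
# 3.11.x style: the allocator reads the replication factor from
# this (placeholder) keyspace.
allocate_tokens_for_keyspace: dummy_rf
```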
The replication factor of the keyspace defined in the allocate_tokens_for_keyspace setting is used by the replica-aware token allocator when generating tokens for a new node. No other keyspace information is used.
Providing a keyspace for this feature introduced additional overhead and complexities when using the feature in a new cluster.
Starting in version 4.0, the newly-added allocate_tokens_for_local_replication_factor setting can be used as an alternative means of specifying the desired replication factor to the replica-aware token allocator. It has a default value of 3 and reduces the complexities and overhead associated with using replica-aware token allocation.
Instead of keeping the allocate_tokens_for_keyspace setting and value, it should be removed or commented out in the configuration file when upgrading to Cassandra 4.x:
# allocate_tokens_for_keyspace: dummy_rf
This change should be done as part of Update Cassandra configuration files.
The Thrift protocol has been removed starting in Cassandra 4.0. With the removal of the Thrift protocol, the following settings have also been removed from cassandra.yaml:
The above settings no longer appear in the default configuration as of version 4.0, and you must not add them back to cassandra.yaml; otherwise, Cassandra will fail to start.
It’s recommended that clients currently using the Thrift protocol change to using the Native Binary protocol.
In Cassandra versions 3.0.19, 3.11.5, and 4.0, two new settings are configurable via cassandra.yaml:
These are hidden settings and not listed in cassandra.yaml. They have dynamic default values:
native_transport_max_concurrent_requests_in_bytes = `heap / 10`
native_transport_max_concurrent_requests_in_bytes_per_ip = `heap / 40`
Clients that send requests over these thresholds will receive OverloadedException error messages.
It’s recommended that you leave the default settings in place, and only tune them when needed.
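If tuning does become necessary, the hidden settings can be added to cassandra.yaml explicitly; the byte values below are hypothetical overrides, not defaults:

```yaml
# Hidden settings: absent from the default cassandra.yaml but honored
# if added. Values are illustrative only.
native_transport_max_concurrent_requests_in_bytes: 1073741824        # 1 GiB
native_transport_max_concurrent_requests_in_bytes_per_ip: 268435456  # 256 MiB
```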
These settings are exposed via the following Storage Service MBeans which allow them to be modified via JMX:
For more information, refer to CASSANDRA-15519.
Transient replication takes advantage of incremental repairs, allowing consistency levels and storage requirements to be reduced when data is marked as repaired.
Transient replication introduces new terminology for each replica. Replicas are now either full replicas or transient replicas. Full replicas, like replicas previously, store all data. Transient replicas store only unrepaired data.
When Cassandra 4.0 was released, transient replication was explicitly announced as an experimental feature. At the time of its release, transient replication couldn’t be used for:
Lightweight Transactions (LWTs)
Keyspaces using materialized views
Secondary indexes (2i)
Tables with CDC enabled
Transient replication introduces operational overhead when it comes to changing the replication factor (RF).
RF cannot be altered while some endpoints are not in a normal state (no range movements).
You can’t add full replicas if there are any transient replicas. You must first remove all transient replicas, then change the number of full replicas, then add back the transient replicas.
You can only safely increase the number of transient replicas one at a time, with an incremental repair run in between each change.
Transient Replication offers the potential of reducing network traffic and storage requirements, e.g. volume sizes and/or number of nodes. However, due to its experimental status, any reliance on this feature should be accompanied by committed involvement, resources, and testing.
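As a sketch: transient replication is gated behind an experimental flag in cassandra.yaml (the flag name enable_transient_replication is an assumption), and the transient count is then expressed in the keyspace’s replication settings:

```yaml
# Experimental: must be enabled before any keyspace can declare
# transient replicas.
enable_transient_replication: true
```

A keyspace can then declare a replication value such as '3/1' for a datacenter under NetworkTopologyStrategy, meaning three replicas of which one is transient (this keyspace syntax is an assumption).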
Cassandra 4.0 introduces hardware-bound Zero Copy Streaming (ZCS). Streaming is a fundamental operation to many of Cassandra’s internal operations: bootstrapping, decommissioning, host replacement, repairs, rebuilds, etc. Prior to this feature, all data was streamed through the JVMs of the source and destination nodes, making it a slow and garbage collection-intensive process. With hardware-bound streaming, data can be transferred directly from disk to network avoiding the JVM.
ZCS is enabled by setting stream_entire_sstables: true in cassandra.yaml.
Throttling still occurs according to the
ZCS will only be used when an SSTable contains only partitions that have been requested to be streamed.
In practice this means it only works when vnodes are disabled (i.e.
That is, if an SSTable candidate contains partitions that are additional to those of the requested streaming operation, it will be streamed via the traditional JVM streaming implementation.
ZCS is also unable to be used for any SSTables that contain legacy counters (from Cassandra versions before 2.1).
To summarize, for ZCS to work on a particular SSTable, the following checks must pass:
No counter legacy shards in SSTable.
The SSTable must contain data belonging to only the token ranges specified in the streaming request.
Activation and frequency of ZCS can be observed in the debug.log file, if it is enabled.
The output will be from the
ZCS provides improved availability and elasticity by reducing the time it takes to stream token range replicas between nodes during bootstrapping, decommissioning, and host replacement operations. ZCS provides improved performance by reducing the load and latency impact caused by streaming large amounts of data through each JVM.
But ZCS only works for clusters not using vnodes, i.e. where
There are a number of other important streaming improvements introduced in version 4.0, such as Netty non-blocking NIO messaging, parallel keyspace streams, and better stream balancing during bootstrap. These improvements will improve bootstrapping times and overall elasticity of a Cassandra cluster.
To enable parallel streams, increase the value of streaming_connections_per_host in cassandra.yaml. This will help when streams are receiver CPU bound.
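For illustration, assuming a default of one connection per host (the value 4 is purely illustrative):

```yaml
# Open multiple parallel stream connections to each peer
# (assumed default: 1). Higher values help when streams are
# receiver CPU bound.
streaming_connections_per_host: 4
```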
ZCS provides no benefits on vnode clusters. Since there is no negative impact, it’s generally recommended to leave ZCS enabled.
The jvm.options file is replaced in Cassandra 4.x by three files: jvm-server.options, jvm8-server.options, and jvm11-server.options. These files control the behavior of the Cassandra JVM.
Another new set of files has been added to configure the JVM of auxiliary tools, such as nodetool and the SSTable tools: jvm-client.options, jvm8-client.options, and jvm11-client.options.
- jvm-server.options & jvm-client.options
These files contain static flags that are used by any version of the JVM. All flags defined in these files will be used by Cassandra when the JVM is started.
- jvm8-server.options & jvm8-client.options
These files contain flags specific to Java 8.
- jvm11-server.options & jvm11-client.options
These files contain flags specific to Java 11.
The version of Java used to run Cassandra determines which of the version-specific .options files are used.
For example, if Java 11 is used to run Cassandra, then jvm11-server.options and jvm11-client.options are used.
The cassandra-env.sh file is still relevant and sourced across the
However, it does not reflect the server-client split.
For example, placing G1-specific flags in cassandra-env.sh and enabling G1 in jvm-server.options, while keeping the default CMS in jvm-client.options, will prevent the client JVMs from working correctly.
It’s recommended that you rewrite your current jvm.options file into jvm-server.options, jvm-client.options, jvm8-server.options, and jvm8-client.options when updating the Cassandra configuration files during upgrade. Validate carefully that new nodes start up with the intended JVM options.
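As a sketch of the split, a fixed heap size belongs in jvm-server.options so it applies only to the server JVM and not to the lightweight client tools (the 4 GiB value is purely illustrative):

```
# jvm-server.options: flags for the Cassandra server JVM,
# applied on any Java version.
-Xms4G
-Xmx4G
```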
Starting in Cassandra 4.0, the cassandra-rackdc.properties file contains the new ec2_naming_scheme setting that allows the Ec2MultiRegionSnitch to switch between legacy and AWS-style naming conventions.
The possible values are
In Cassandra 4.x, standard is the default naming convention used.
To use the pre-4.x naming convention, this setting must be set to legacy, i.e. you must use the legacy value if you are upgrading a pre-4.0 cluster.
Prior to Cassandra 4.0, the Ec2MultiRegionSnitch used datacenter and rack naming conventions inconsistent with those presented in Amazon EC2 APIs.
Full AWS-style naming was introduced in Cassandra 4.0 (CASSANDRA-7839) and made the default naming convention for
As part of this change, the ability to control the naming convention was added.
This was done using the ec2_naming_scheme setting in the cassandra-rackdc.properties file.
The possible values for this setting are
If the endpoint_snitch setting in cassandra.yaml is set to Ec2MultiRegionSnitch, then the ec2_naming_scheme setting in the cassandra-rackdc.properties file must be set to legacy when upgrading from Cassandra 3.x to 4.x.
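In cassandra-rackdc.properties this would look like the following (key=value syntax assumed from the file’s properties format):

```properties
# Keep pre-4.0 datacenter/rack names during a rolling upgrade from 3.x
ec2_naming_scheme=legacy
```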
legacy: sets the Ec2MultiRegionSnitch to use the pre-4.0 naming convention, which is inconsistent with the names presented in Amazon EC2 APIs. Specifically, the Datacenter name is set to the part of the Region name preceding the last "-". If the Region ends in "-1", that suffix is omitted from the Datacenter name; if the Region ends in a value other than "-1", it’s included in the Datacenter name. The Rack name is the number and letter portion of the Availability Zone name following the last "-".
Region Availability Zone Datacenter Rack
standard: sets the Ec2MultiRegionSnitch to use the new naming convention introduced in Cassandra 4.0. This follows the full AWS-style naming and is the default value in Cassandra 4.x.
Specifically, the Datacenter name is the standard AWS Region name, including the number. The Rack name is the full Availability Zone name.
Region Availability Zone Datacenter Rack
OldNetworkTopologyStrategy has been removed starting in Cassandra 4.0.
Before upgrading, you need to update all keyspaces using
OldNetworkTopologyStrategy by changing them to use
- Positive and negative NAN and INFINITY
The existing NAN and INFINITY literals are replaced with both positive and negative variants.
How transient replication writes handle the transient replica.
read_repair (replica filtering protection)
Introduces a configurable tradeoff between monotonic quorum reads and partition-level write atomicity.
Allows defining the compression level for the compressor.
MAX() around two types of speculative_retry values, providing bounds that ensure useful values even with one down or misbehaving node.
Windows support is removed from Cassandra 4.0 onward.
Developers who use Windows can continue to run Apache Cassandra locally using Windows Subsystem for Linux version 2 (WSL2), Docker for Windows, and virtualization platforms like Hyper-V and VirtualBox.