Important configuration changes in Cassandra 4.x

This page provides a reference of various changes made in the Apache Cassandra® configuration files in version 4.x.

Configuration files changed in version 4.x:
  • Configuration: cassandra.yaml

  • Environment: cassandra-env.sh

  • Java Options: jvm.options, jvm-server.options, jvm8-server.options, jvm11-server.options

  • Replica Assignments: cassandra-rackdc.properties

Where to find Cassandra configuration files

The location of Cassandra’s configuration files varies, depending on the type of installation:

Installation type                         Location
Package installations (APT, YUM, etc.)    /etc/cassandra
Tarball installations                     <tarball_root>/conf
Docker installations                      /etc/cassandra (inside the container)

Before starting the newly-installed version of Cassandra on an upgraded cluster node, you first need to update the default configuration files that came with the new version to ensure your desired configuration is persisted.

For instructions on how to update Cassandra’s configuration files as part of the upgrade process, see Update Cassandra configuration files.

Cassandra configuration file (cassandra.yaml)

The following additions, removals, and changes have been made to cassandra.yaml version 4.x.


allocate_tokens_for_local_replication_factor

This setting offers a simpler and safer approach than allocate_tokens_for_keyspace. It's important to set this when using num_tokens < 256. See Replica-aware token allocation.


snapshot_links_per_second

The act of creating or clearing snapshots involves creating or removing potentially tens of thousands of links, which can cause significant performance impact, especially on consumer-grade SSDs. A non-zero value here can be used to throttle these links and avoid the negative performance impact of taking and clearing snapshots.

For more information, refer to CASSANDRA-13019.

native_transport_max_concurrent_requests_in_bytes & native_transport_max_concurrent_requests_in_bytes_per_ip

Provide the ability to change the global and per-endpoint limits on the number of in-flight requests without restarting Cassandra. These are hidden settings, not listed in cassandra.yaml. See Throttling client throughput.


network_authorizer

See Authorization by datacenter.

commitlog_sync: group

Adds a group mode, in which writes don't trigger an immediate commitlog flush but instead wait for acknowledgement by the commitlog flush that happens at least every commitlog_sync_group_window.
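A minimal cassandra.yaml sketch of group mode (in 4.0 the window setting carries the _in_ms suffix; 15 ms is an illustrative value, not a recommendation):

```yaml
# Wait for the commitlog fsync before acknowledging writes,
# batching acknowledgements into a flush window.
commitlog_sync: group
commitlog_sync_group_window_in_ms: 15
```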


commitlog_sync_period_in_ms

How often the commitlog flushes to disk in periodic mode.


flush_compression

The compression algorithm to apply to flushed SSTables. Flushes need to happen quickly so they don't block the flush writer and memtables, and they write smaller SSTables than compactions do — so choosing a fast algorithm has value. Choices are none (no compression), fast, and table (as configured by the table's schema).
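For example, to always use a fast compressor for flushed SSTables regardless of each table's schema:

```yaml
# Choices: none, fast, or table (the compressor from the table's schema)
flush_compression: fast
```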


repair_session_space_in_mb

Replaces repair_session_max_tree_depth, providing a more intuitive setting to configure.


native_transport_allow_older_protocols

Whether older protocol versions are allowed. It defaults to false but must be set to true when upgrading a cluster.


native_transport_idle_timeout_in_ms

Controls when idle client connections are closed.


concurrent_validations

Number of simultaneous repair validations to allow. Useful for ensuring concurrent_compactors are reserved for non-repair compactions.


concurrent_materialized_view_builders

Number of simultaneous materialized view builder tasks to allow.


stream_entire_sstables

See Zero Copy Streams.

internode_tcp_* & internode_application_*

Defensive settings for protecting Cassandra from network partitions. Part of the new messaging system.


streaming_connections_per_host

See Zero Copy Streams.


legacy_ssl_storage_port_enabled

If enabled, opens an additional encrypted listening socket on ssl_storage_port.

If your cluster utilizes server-to-server internode encryption, you must set legacy_ssl_storage_port_enabled to true when upgrading to Cassandra 4.x. Otherwise, Cassandra 4.x will use the default port 7000 for both encrypted and unencrypted messages, which can cause communication issues with un-upgraded Cassandra 3.x nodes.

This parameter can be removed after all cluster nodes have been upgraded to Cassandra 4.x.
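As an illustrative sketch only — assuming the setting is nested under server_encryption_options as in the default 4.x file — the upgrade-time fragment might look like:

```yaml
server_encryption_options:
  internode_encryption: all
  # Keep the encrypted socket open on ssl_storage_port while
  # 3.x nodes remain; remove once every node runs 4.x.
  legacy_ssl_storage_port_enabled: true
```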


ideal_consistency_level

Defines the ideal consistency level that writes are expected to achieve within the timeout window. This can include asynchronous writes above the consistency level specified by the request. Metrics report how many writes meet this ideal_consistency_level.

automatic_sstable_upgrade & max_concurrent_automatic_sstable_upgrades

Enable the automatic upgrade of SSTables, and limit the number of concurrent SSTable upgrades. See Automatic SSTable upgrading.


audit_logging_options

See Auditing.


full_query_logging_options

See Full Query Logging (FQL).


corrupted_tombstone_strategy

Validates tombstones on reads and compactions. Can be set to disabled, warn, or exception.


diagnostic_events_enabled

Use to enable diagnostic events. Diagnostic events are for observation purposes, but can introduce performance penalties if not used carefully.


native_transport_flush_in_batches_legacy

Use native transport TCP message coalescing. Worth testing on older kernels where fewer client connections are available.

repaired_data_tracking_for_* & report_unconfirmed_repaired_data_mismatches

Enable tracking of repaired state of data during reads.


cross_node_timeout

Now defaults to true. The current expectation is that everyone knows the importance of keeping servers in sync with NTP.


enable_materialized_views

Now defaults to false. Materialized views are an experimental feature that requires some developer understanding to use safely.


enable_sasi_indexes

Now defaults to false. SASI is an experimental feature that requires some developer understanding to use safely.


enable_transient_replication

See Transient replication.


local_system_data_file_directory

Directory where Cassandra should store the data of the local system keyspaces. By default, Cassandra stores this data in the first of the directories specified by data_file_directories, which ensures that if one of the other disks is lost, Cassandra can continue to operate. For extra resilience, this setting lets you store the data in a separate directory that provides redundancy.


networking_cache_size_in_mb

Maximum memory to use for inter-node and client-server networking buffers.

Defaults to the smaller of 1/16 of heap or 128MB. This pool is allocated off-heap, so is in addition to the memory allocated for heap. The cache also has on-heap overhead which is roughly 128 bytes per chunk (i.e. 0.2% of the reserved size if the default 64k chunk size is used). Memory is only allocated when needed.


file_cache_enabled

Enable the sstable chunk cache. The chunk cache will store recently accessed sections of the sstable in-memory as uncompressed buffers.

tables_warn_threshold & keyspaces_warn_threshold

Having many tables and/or keyspaces negatively affects the performance of many operations in the cluster. When the number of tables or keyspaces in the cluster exceeds these thresholds, a client warning is sent back to the user when creating a table or keyspace.
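A sketch using the names above (the values are illustrative; check the comments in your cassandra.yaml for the exact names and defaults shipped with your version):

```yaml
# Warn clients once the cluster holds more tables/keyspaces than this
tables_warn_threshold: 150
keyspaces_warn_threshold: 40
```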


Auditing

Auditing is designed to meet regulatory and security requirements, and can audit more types of requests and interactions with the database than Full Query Logging (FQL) or Change Data Capture (CDC) can.

Auditing and FQL have been implemented on a shared design that uses the OpenHFT libraries. By default, OpenHFT's Chronicle Queue library is used as a binary log (binlog), making the default behavior file-based. Different OpenHFT plug-in libraries can be used, and the choice will influence operability and performance.

Enabling Auditing has performance implications that depend on the auditing setup and configuration. Because the use of Auditing may be a hard requirement for many, it's recommended to perform thorough testing and benchmarking prior to deployment in production.

Auditing will still have a performance impact on production clusters, and file-based logging needs to be configured to be operationally safe and sustainable. Ensure appropriate metrics monitoring and alerts are in place around the log files. To learn more about this feature, read this blog post.

Authorization by datacenter

Prior to Cassandra 4.0, authorization via user or role was all or nothing with no concept of datacenters. Starting in version 4.0, Cassandra operators can permit and restrict connections from client applications to specific datacenters.

Authorization by datacenter utilizes the existing Authorization feature in Cassandra and its concept of Roles. It only prevents connections to coordinators in disallowed datacenters; data from remote datacenters can still be accessed through a local coordinator. Server-side control of connections to coordinators is important to prevent saturation of the network and of available connections, which can leave client applications unable to connect to the cluster. This is a common problem with analytics datacenters, where clients like Apache Spark can spawn hundreds or thousands of connections.

The use of datacenter authorization is recommended when there is a need to control and limit connections to coordinators in specific datacenters. This is considered a safe feature to use immediately where there is benefit, as it touches very few components of the Cassandra codebase. However, it's not recommended to rely on this feature to restrict access to data that is only replicated to certain datacenters. To learn more about this feature, read this blog post.

Automatic SSTable upgrading

Starting in version 4.0, the cassandra.yaml configuration file contains a new setting to automatically upgrade SSTables to the latest format after upgrading Cassandra: automatic_sstable_upgrade. It provides an alternative to manually running the upgrade command via nodetool. This feature is optional and disabled by default.

SSTable data must be upgraded as part of any upgrade from Cassandra 3.x to 4.x, as the table format has changed. When automatic_sstable_upgrade is set to true, a Cassandra node automatically starts converting SSTables from the 3.x format to the 4.x format. This process begins after starting Cassandra 4.x for the first time on an upgraded node.
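If you do opt in, the cassandra.yaml fragment might look like this (the concurrency value is illustrative):

```yaml
automatic_sstable_upgrade: true
# Throttle how many SSTables are rewritten at once
max_concurrent_automatic_sstable_upgrades: 1
```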

While this feature provides a convenience when upgrading nodes in the cluster, care needs to be taken when using it. Operators of a Cassandra cluster trade off control over the execution of the process for the convenience of never having to manually execute it.

SSTables are stored in sorted order, which should keep the CPU and disk IO usage relatively low. However, too many concurrent executions of the SSTable upgrade process may overwhelm the cluster. If this feature is enabled, upgrading a node may be delayed until the SSTable upgrade on other nodes completes.

Leaving this feature disabled allows the SSTable format upgrade to be performed in a controlled manner. Specifically, after upgrading all nodes in the cluster to version 4.x, the SSTable upgrade can be manually started on individual nodes. This allows the SSTable upgrade to be launched based on the health of the cluster: if the cluster is under no load, multiple executions can be started; if the cluster is under load, it can be run on each node in turn (one node at a time).

It’s generally recommended to leave this feature disabled and to upgrade the SSTables of a node manually.

Full Query Logging (FQL)

Full Query Logging (FQL) logs all requests to the CQL interface. FQL can be used for debugging, performance benchmarking, testing, and also for auditing CQL queries. In comparison to audit logging, FQL is dedicated to CQL requests only and comes with features such as the FQL Replay and FQL Compare that are unavailable in audit logging.

Full Query Logging’s primary use is for correctness testing. Use Auditing for security, business accounting, and monetary transaction compliance. Use Diagnostics Events for debugging. Use Change Data Capture (CDC) for runtime duplication/exporting/streaming of the data.

FQL only logs the queries that successfully complete.

FQL appears to have little or no overhead in WRITE only workloads, and a minor overhead in MIXED workloads.

The implementation is file (binlog) based. Different OpenHFT plug-in libraries can be used, which will influence operability and performance.

Minor performance changes are to be expected, and will depend on the setup and configuration. You should perform testing and benchmarking to evaluate production implications. File-based logging needs to be configured to be operationally safe and sustainable. Ensure appropriate metrics monitoring and alerts are in place around the log files.

FQL is intended for correctness testing; its use adds quality assurance to existing clusters.

Replica-aware token allocation

Starting in version 4.0, the cassandra.yaml configuration file contains a new setting to configure the replica-aware token allocation algorithm: allocate_tokens_for_local_replication_factor. It’s an alternative to the existing allocate_tokens_for_keyspace setting.

The following is the configuration setting and value that was used in Cassandra 3.11.x to configure the replica-aware token allocation algorithm:

allocate_tokens_for_keyspace: dummy_rf

The replication factor of the keyspace defined in the allocate_tokens_for_keyspace setting is used by the replica-aware token allocator when generating tokens for a new node. No other keyspace information is used. Providing a keyspace for this feature introduced additional overhead and complexities when using the feature in a new cluster.

Starting in version 4.0, the newly-added allocate_tokens_for_local_replication_factor setting can be used as an alternative means of specifying the desired replication factor to the replica-aware token allocator. It has a default value of 3 and reduces the complexities and overhead associated with using replica-aware token allocation.
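For example, the 4.x equivalent of the 3.11.x configuration above might be (the num_tokens value is shown only to illustrate a typical pairing):

```yaml
num_tokens: 16
# Allocate tokens optimally for a local replication factor of 3
allocate_tokens_for_local_replication_factor: 3
```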

Instead of keeping the allocate_tokens_for_keyspace setting and value, it should be removed or commented out in the configuration file when upgrading to Cassandra 4.x:

# allocate_tokens_for_keyspace: dummy_rf

This change should be done as part of Update Cassandra configuration files.

Thrift protocol removed

The Thrift protocol has been removed starting in Cassandra 4.0. With the removal of the Thrift protocol, the following settings have also been removed from cassandra.yaml:

  • start_rpc

  • rpc_port

  • rpc_server_type

  • thrift_framed_transport_size_in_mb

The above settings no longer appear in the default configuration as of version 4.0 and must not be re-added to cassandra.yaml; if any of them are present, Cassandra will fail to start.

It’s recommended that clients currently using the Thrift protocol change to using the Native Binary protocol.

Throttling client throughput

In Cassandra versions 3.0.19, 3.11.5, and 4.0, two new settings are configurable via cassandra.yaml:

  • native_transport_max_concurrent_requests_in_bytes

  • native_transport_max_concurrent_requests_in_bytes_per_ip

These are hidden settings and not listed in cassandra.yaml. They have dynamic default values:

native_transport_max_concurrent_requests_in_bytes = `heap / 10`
native_transport_max_concurrent_requests_in_bytes_per_ip = `heap / 40`

Clients that send requests over these thresholds will receive OverloadedException error messages.

It’s recommended that you leave the default settings in place, and only tune them when needed.

These settings are exposed via the following Storage Service MBeans which allow them to be modified via JMX:

  • NativeTransportMaxConcurrentRequestsInBytes

  • NativeTransportMaxConcurrentRequestsInBytesPerIp

For more information, refer to CASSANDRA-15519.

Transient replication

Transient replication takes advantage of incremental repairs, allowing consistency levels and storage requirements to be reduced when data is marked as repaired.

Transient replication introduces new terminology for each replica. Replicas are now either full replicas or transient replicas. Full replicas, like replicas previously, store all data. Transient replicas store only unrepaired data.

When Cassandra 4.0 was released, transient replication was explicitly announced as an experimental feature. At the time of its release, transient replication couldn’t be used for:

  • Monotonic Reads

  • Lightweight Transactions (LWTs)

  • Logged Batches

  • Counters

  • Keyspaces using materialized views

  • Secondary indexes (2i)

  • Tables with CDC enabled

Transient replication introduces operational overhead when it comes to changing replica factor (RF).

  • RF cannot be altered while some endpoints are not in a normal state (no range movements).

  • You can’t add full replicas if there are any transient replicas. You must first remove all transient replicas, then change the number of full replicas, then add back the transient replicas.

  • You can only safely increase the number of transient replicas one at a time, running incremental repair between each change.

Transient Replication offers the potential of reducing network traffic and storage requirements, e.g. volume sizes and/or the number of nodes. However, due to its experimental status, any reliance on this feature should be accompanied by committed involvement, resources, and testing.

Zero Copy Streams

Cassandra 4.0 introduces hardware-bound Zero Copy Streaming (ZCS). Streaming is a fundamental operation to many of Cassandra’s internal operations: bootstrapping, decommissioning, host replacement, repairs, rebuilds, etc. Prior to this feature, all data was streamed through the JVMs of the source and destination nodes, making it a slow and garbage collection-intensive process. With hardware-bound streaming, data can be transferred directly from disk to network avoiding the JVM.

ZCS is enabled by setting stream_entire_sstables: true in cassandra.yaml.

Throttling still occurs according to the stream_throughput_outbound setting.
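Combined, a sketch of the relevant cassandra.yaml fragment (the throughput value is illustrative):

```yaml
# Stream whole SSTable files, bypassing the JVM where possible
stream_entire_sstables: true
# Outbound streaming remains throttled
stream_throughput_outbound_megabits_per_sec: 200
```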

ZCS will only be used when an SSTable contains only partitions that have been requested to be streamed. In practice this means it only works when vnodes are disabled (i.e. num_tokens=1). That is, if an SSTable candidate contains partitions that are additional to those of the requested streaming operation, it will be streamed via the traditional JVM streaming implementation.

ZCS is also unable to be used for any SSTables that contain legacy counters (from Cassandra versions before 2.1).

To summarize, for ZCS to work on a particular SSTable, the following checks must pass:

  1. The stream_entire_sstables setting is true.

  2. No counter legacy shards in SSTable.

  3. The SSTable must contain data belonging to only the token ranges specified in the streaming request.

Activation and frequency of ZCS can be observed in the debug.log file, if it is enabled. The output will be from the CassandraEntireSSTableStreamReader logger.

ZCS provides improved availability and elasticity by reducing the time it takes to stream token range replicas between nodes during bootstrapping, decommissioning, and host replacement operations. ZCS provides improved performance by reducing the load and latency impact streaming large amounts of data through each JVM causes.

But ZCS only works for clusters not using vnodes, i.e. where num_tokens=1.

There are a number of other important streaming improvements introduced in version 4.0, such as Netty non-blocking NIO messaging, parallel keyspace streams, and better stream balancing during bootstrap. These improvements will improve bootstrapping times and overall elasticity of a Cassandra cluster.

To enable parallel streams, increase the value of streaming_connections_per_host in cassandra.yaml. This will help when streams are receiver CPU bound.
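For example, to allow up to four parallel streaming connections per host (an illustrative value):

```yaml
streaming_connections_per_host: 4
```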

ZCS provides no benefits on vnode clusters. Since there is no negative impact, it’s generally recommended to leave ZCS enabled.

JVM options changes

The jvm.options file is replaced in Cassandra 4.x by three files: jvm-server.options, jvm8-server.options, and jvm11-server.options. These files control the behavior of the Cassandra JVM.

Another new set of files — jvm-client.options, jvm8-client.options, and jvm11-client.options — has been added to configure the JVM of auxiliary tools, such as nodetool and the SSTable tools.

jvm-server.options & jvm-client.options

These files contain static flags that are used by any version of the JVM. All flags defined in these files will be used by Cassandra when the JVM is started.

jvm8-server.options & jvm8-client.options

These files contain flags specific to Java 8.

jvm11-server.options & jvm11-client.options

These files contain flags specific to Java 11.

The version of Java used to run Cassandra determines which of the version-specific .options files are used. For example, if Java 11 is used to run Cassandra, then jvm11-server.options and jvm11-client.options are used.
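As an illustration of the split, version-independent flags such as fixed heap sizing could be placed in jvm-server.options so they apply under both Java 8 and Java 11 (the values are illustrative, not recommendations):

```
# jvm-server.options: static flags used by any JVM version
-Xms4G
-Xmx4G
```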

The cassandra-env.sh file is still relevant and is sourced alongside the .options files. However, it does not reflect the server-client split. For example, placing G1-specific flags so that G1 is enabled in jvm-server.options while the default CMS configuration is kept in jvm-client.options will prevent the client JVMs from working correctly.

It’s recommended that you rewrite your current jvm.options file into jvm-server.options, jvm-client.options, and the version-specific jvm8-*.options or jvm11-*.options files when updating the Cassandra configuration files during the upgrade. Validate carefully that new nodes start up with the intended JVM options.

Ec2Snitch supports standard AWS naming conventions

Starting in Cassandra 4.0, the cassandra-rackdc.properties file contains the new ec2_naming_scheme setting that allows the Ec2Snitch and Ec2MultiRegionSnitch to switch between legacy and AWS-standard naming conventions.

The possible values are legacy and standard, with standard being the default in Cassandra 4.x. To use the pre-4.x naming convention, this setting must be set to legacy; you must use the legacy value if you are upgrading a pre-4.0 cluster.

Prior to Cassandra 4.0, the Ec2Snitch and Ec2MultiRegionSnitch used datacenter and rack naming conventions inconsistent with those presented in the Amazon EC2 APIs. Full AWS-style naming was introduced in Cassandra 4.0 (CASSANDRA-7839) and made the default naming convention for Ec2Snitch and Ec2MultiRegionSnitch.

As part of this change, the ability to control the naming convention was added via the ec2_naming_scheme setting in the cassandra-rackdc.properties file.

If the endpoint_snitch setting in cassandra.yaml is set to either Ec2Snitch or Ec2MultiRegionSnitch, then the ec2_naming_scheme setting must be set to legacy when upgrading from Cassandra 3.x to 4.x.
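For a cluster upgraded from 3.x, the fragment would be (assuming the setting lives in cassandra-rackdc.properties):

```
# cassandra-rackdc.properties
ec2_naming_scheme=legacy
```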


legacy

Sets the Ec2Snitch and Ec2MultiRegionSnitch to use the pre-4.0 naming convention, which is inconsistent with the names presented in the Amazon EC2 APIs. Specifically, the Datacenter name is set to the part of the Region name preceding the last "-". If the Region ends in "-1", that suffix is omitted from the Datacenter name; if the Region ends in a value other than "-1", it’s included. The Rack name is the number-and-letter portion of the Availability Zone name following the last "-".

For example:

Region       Availability Zone   Datacenter   Rack
us-west-1    us-west-1a          us-west      1a
us-west-2    us-west-2b          us-west-2    2b

standard

Sets the Ec2Snitch and Ec2MultiRegionSnitch to use the new naming convention introduced in Cassandra 4.0. This follows the full AWS-style naming and is the default value in Cassandra 4.x.

Specifically, the Datacenter name is the standard AWS Region name, including the number. The Rack name is the full Availability Zone name.

For example:

Region       Availability Zone   Datacenter   Rack
us-west-1    us-west-1a          us-west-1    us-west-1a
us-west-2    us-west-2b          us-west-2    us-west-2b

OldNetworkTopologyStrategy removed

The deprecated OldNetworkTopologyStrategy has been removed starting in Cassandra 4.0.

Before upgrading, you need to update all keyspaces using OldNetworkTopologyStrategy by changing them to use NetworkTopologyStrategy.

New CQL data types

Positive and negative NAN and INFINITY

The existing NAN and INFINITY data types are replaced with both positive and negative variants.

New CQL table options


additional_write_policy

How writes under transient replication handle the transient replica.

read_repair (replica filtering protection)

Introduces a configurable tradeoff between monotonic quorum reads and partition-level write atomicity.


Allows defining the compression level for the compressor.

  • 0: fastest

  • 9: smallest

Hybrid speculative_retry

Allows wrapping two types of speculative_retry values in MIN() and MAX(), providing bounds that ensure useful values even with one down or misbehaving node.
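For example, a sketch of a hybrid value (the keyspace and table names are hypothetical; syntax per the 4.0 speculative_retry format):

```cql
ALTER TABLE ks.tbl
  WITH speculative_retry = 'MIN(99PERCENTILE,50ms)';
```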

Windows support removed

Windows support is removed from Cassandra 4.0 onward.

Developers who use Windows can continue to run Apache Cassandra locally using Windows Subsystem for Linux version 2 (WSL2), Docker for Windows, and virtualization platforms like Hyper-V and VirtualBox.


© 2024 DataStax
