cassandra.yaml configuration file
The cassandra.yaml file is the main configuration file for Hyper-Converged Database (HCD).
After changing properties in the cassandra.yaml file, restart the node for the changes to take effect.
Syntax
For the properties in each section, the parent setting has zero spaces. Each child entry requires at least two spaces. Adhere to the YAML syntax and retain the spacing.
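For example, here is a minimal sketch of the expected indentation, using a property that appears later in this file:
# parent setting at column zero
commitlog_compression:
  # child entries indented by at least two spaces
  - class_name: LZ4Compressor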
- Default values that are not defined are shown as Default: none.
- Internally defined default values are described.
Default values can be defined internally, commented out, or have implementation dependencies on other properties in the cassandra.yaml file. Additionally, some commented-out values may not match the actual default values; the commented-out values are recommended alternatives to the default values.
Organization
The configuration properties are grouped into the following sections:
- Quick start properties: The minimal properties needed for configuring a cluster.
- Default directories: If you have changed any of the default directories during installation, set these properties to the new locations. Make sure you have root access.
- Data directory configuration: Properties for configuring the location of a single or multiple (JBOD) data directories.
- Commonly used properties: Properties most frequently used when configuring HCD.
- Performance tuning: Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes.
- Advanced properties: Properties for advanced users or properties that are less commonly used.
- Configure authentication, authorization, and role management.
- User-defined functions (UDF) properties: Configure how UDF code is executed inside Cassandra daemons.
- Configure memory, threads, and duration when pushing pages continuously to the client.
- Memory leak detection settings: Configure memory leak detection.
- Enable emulation mode for testing applications meant to run on Astra DB.
- Configure Cassandra system limits which ensure high availability and optimal performance of the database.
Quick start properties
The minimal properties needed for configuring a cluster.
cluster_name
-
The name of the cluster. This setting prevents nodes in one logical cluster from joining another so it is important to set it to a unique name other than the default. All nodes in a cluster must have the same value.
Default:
'Test Cluster'
rpc_address
-
The address that client applications connect to. This is typically set to a node’s public IP that is routable from the clients.
If not changed from the default localhost, only applications deployed on the server will be able to connect to the node.
Default: localhost
listen_address
-
The IP address or hostname that the database binds to, exclusively for private communication between nodes in the cluster. This is typically set to a node’s private IP that is routable from other nodes.
If not changed from the default localhost, the node will not be able to communicate with other nodes in the cluster.
Default: localhost
listen_interface
-
The interface that the database binds to for connecting to other nodes. Interfaces must correspond to a single address. IP aliasing is not supported.
Never set listen_address to 0.0.0.0. Set listen_address or listen_interface, not both.
listen_interface_prefer_ipv6
-
Use IPv4 or IPv6 when the interface is specified by name.
- false - Use the first IPv4 address.
- true - Use the first IPv6 address.
When only a single address is used, that address is selected without regard to this setting.
Default: false
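Putting the quick start properties together, a minimal cassandra.yaml fragment might look like the following; the cluster name and addresses are placeholders, not recommendations:
cluster_name: 'ProdCluster01'
listen_address: 10.0.0.1      # private IP, routable from the other nodes
rpc_address: 203.0.113.10     # public IP, routable from client applications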
Default directories
If you have changed any of the default directories during installation, set these properties to the new locations. Make sure you have root access.
data_file_directories
-
The directory where table data is stored on disk. The database distributes data evenly across the location, subject to the granularity of the configured compaction strategy.
For production, DataStax recommends RAID 0 and SSDs.
Default: - /var/lib/cassandra/data
commitlog_directory
-
The directory where the commit log is stored.
For optimal write performance, place the commit log on a separate disk partition, or ideally on a separate physical device, from the data directories. Because the commit log is append only, a hard disk drive (HDD) is acceptable as long as it is fast enough to keep up with the writes.
Default:
$CASSANDRA_HOME/data/commitlog
The commitlog_directory and the cdc_raw_directory must reside on the same partition. Keep these directories in separate sub-folders that are not nested.
cdc_raw_directory
-
The directory where the change data capture (CDC) commit log segments are stored on flush. DataStax recommends a physical device that is separate from the data directories. See Change Data Capture (CDC) logging.
Default:
$CASSANDRA_HOME/data/cdc_raw
The cdc_raw_directory and the commitlog_directory must reside on the same partition. Keep these directories in separate sub-folders that are not nested.
hints_directory
-
The directory where hints (missed writes) are stored.
Default:
$CASSANDRA_HOME/data/hints
metadata_directory
-
The directory that holds cluster metadata including information about the local node and its peers.
Default:
$CASSANDRA_HOME/data/metadata
saved_caches_directory
-
The directory location where table key and row caches are stored.
Default:
$CASSANDRA_HOME/data/saved_caches
Data directory configuration
Distributing data across multiple disks, also known as "JBOD configuration" (just a bunch of disks), maximizes throughput and ensures efficient disk I/O. Cassandra allows you to specify multiple directories for storing your data to achieve this.
To configure a single data directory in the cassandra.yaml file:
data_file_directories:
- /var/lib/cassandra/data
For multiple data directories:
data_file_directories:
- /disk1/datadir
- /disk2/datadir
- /disk3/datadir
Commonly used properties
Properties most frequently used when configuring HCD.
Before starting a node for the first time, DataStax recommends that you carefully evaluate your requirements.
Common initialization properties
Be sure to set the properties in the Quick start section as well.
commit_failure_policy
-
Determines how Cassandra handles commit log disk failures.
- die - Shut down the node and kill the JVM, so the node can be replaced.
- stop - Shut down the node, leaving it effectively dead but available for inspection using JMX.
- stop_commit - Shut down the commit log, letting writes collect but continuing to service reads.
- ignore - Ignore fatal errors and let the batches fail.
Default: stop (recommended)
disk_optimization_strategy
-
The strategy for optimizing disk reads. Reading from spinning disks is slow, so reads are buffered with an extra 4 KB page just in case. This buffering is unnecessary for SSDs, so only what is required is buffered.
- ssd - Data directory backed by solid state disks.
- spinning - Data directory backed by spinning disks.
Default: ssd
disk_failure_policy
-
Determines how Cassandra handles disk failures.
- die - Shut down gossip and client transports, and kill the JVM for any file system errors or single SSTable errors, so the node can be replaced.
- stop_paranoid - Shut down the node, even for single SSTable errors.
- stop - Shut down the node, leaving it effectively dead, but the JVM is still available for inspection using JMX.
- best_effort - Stop using the failed disk and respond to requests based on the remaining available SSTables. This setting allows obsolete data to be returned at consistency level ONE.
- ignore - Ignore fatal errors and let the requests fail; all file system errors are logged but otherwise ignored.
Recommended policies are stop and best_effort.
Default: stop
endpoint_snitch
-
Configure this property to set the snitch. The most common snitches are:
- SimpleSnitch (default) - Uses replication strategy order for proximity. This snitch does not recognize racks or data centers and considers all nodes as belonging to one ring (single DC), making it incompatible with multi-DC deployments and unsuitable for production environments. This snitch is appropriate for development environments only.
- GossipingPropertyFileSnitch (GPFS) - Uses rack and data center information for the local node defined in the cassandra-rackdc.properties file and propagates this information to other nodes via gossip. This snitch is recommended for production environments and is almost always the correct choice.
- PropertyFileSnitch (PFS) - Determines node proximity using the rack and data center location defined in the cassandra-topology.properties file. This snitch has been superseded by GPFS and is provided only for backwards compatibility.
For other snitches such as Ec2Snitch and GoogleCloudSnitch, see About snitches.
All nodes in a cluster must use the same snitch.
Replica placement (which defines where copies of data are stored) is determined using the information provided by the snitch. Changing the snitch has implications for where data is located, so it requires additional steps and should only be performed by experienced operators.
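For a production deployment using GPFS, set the snitch in cassandra.yaml and define the node's location in cassandra-rackdc.properties; the data center and rack names below are examples only:
endpoint_snitch: GossipingPropertyFileSnitch
# cassandra-rackdc.properties (set per node)
# dc=dc1
# rack=rack1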
seed_provider
-
The gossip seed provider and corresponding addresses of nodes that are designated as contact points in the cluster. A joining node contacts the nodes in the seeds list and establishes a connection to the first available node to discover the members of the cluster and topology.
- class_name - The class that handles the seed logic. The default is used in almost all clusters, although a custom seed provider can be substituted in limited edge cases.
Default: org.apache.cassandra.locator.SimpleSeedProvider
- seeds - A comma-delimited list of addresses and their corresponding storage_port. A new node joining the cluster uses the list to bootstrap the gossip process. If the cluster has multiple nodes, the default value must be changed to the IP address (and gossip port) of one of the nodes.
Default: "127.0.0.1:7000"
Making every node a seed node is not recommended because of increased maintenance and reduced gossip performance. Gossip optimization is not critical, but it is recommended to use a small seed list (approximately three nodes per data center).
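For example, a cluster with three designated seed nodes (placeholder addresses) would configure the provider as follows:
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      - seeds: "10.0.0.1:7000,10.0.0.2:7000,10.0.0.3:7000"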
Advanced initialization properties
allocate_tokens_for_keyspace
-
Triggers the algorithm that allocates num_tokens tokens so that token ranges are spread evenly across nodes, meaning data is distributed more evenly than with the legacy random allocation. Only supported on clusters using Murmur3Partitioner.
The replication strategy of the specified keyspace is used by the algorithm to optimize token allocation when new nodes join a cluster.
The property allocate_tokens_for_local_replication_factor is preferred over allocate_tokens_for_keyspace, particularly when adding nodes in a new data center where a keyspace is not yet replicated. If neither property is set, the node defaults to the legacy behaviour where tokens are allocated randomly.
allocate_tokens_for_local_replication_factor
-
Triggers the algorithm that allocates num_tokens tokens so that token ranges are spread evenly across nodes, meaning data is distributed more evenly than with the legacy random allocation. Only supported on clusters using Murmur3Partitioner.
Specify the replication factor in the local data center (3, for example) that the algorithm will use to optimize token allocation when new nodes join the cluster.
This property is preferred over allocate_tokens_for_keyspace because it does not require the replication of a keyspace to be defined, particularly when adding nodes in a new data center. If neither property is set, the node defaults to the legacy behaviour where tokens are allocated randomly.
auto_bootstrap
-
When joining a cluster for the first time, this property determines whether the node requests replicas to stream data to it (the default behaviour). If the node is defined as a seed, it immediately joins the cluster without data.
Non-seed nodes bootstrap automatically by default. Set to false when adding nodes in a new data center, where bootstrap is manually triggered by an operator with the nodetool rebuild command.
Default: true
broadcast_address
-
Set to the node’s public IP address in environments where nodes can communicate across networks only by using their public IP addresses, such as multi-region Amazon EC2 deployments. Otherwise, the node broadcasts on the same address as listen_address.
Set a separate listen_address and broadcast_address on a node with multiple network interfaces or where nodes are not able to communicate over private IP addresses. Not required in environments that support automatic switching between private and public communication.
Default: the value of listen_address
initial_token
-
Manually assigns the token(s) for the range(s) to be owned by the node.
Specify one token value for legacy single-token clusters. For clusters with virtual nodes enabled, specify multiple tokens as a comma-separated list.
When setting initial_token, the corresponding num_tokens must also be set.
Default: not set, in preference for num_tokens
listen_on_broadcast_address
-
Set to true on nodes with multiple interfaces to enable communication on both listen_address and broadcast_address.
Default: false
num_tokens
-
Defines the number of tokens to assign to the node.
Early versions of Cassandra used a default value of 256 tokens for clusters with virtual nodes enabled, so data is shared with more peers and there is less variance in data size among nodes in the same data center, but this leads to decreased availability in the event of node outages.
Lower token counts such as 4 or 8 provide higher availability but also higher variance in data size. A value of 16 tokens achieves a good distribution of data without compromising too much on availability.
Default: 1 token when not set
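For instance, a node in a new data center could combine a low token count with the allocation algorithm described above; the replication factor of 3 is an assumption about the local keyspaces:
num_tokens: 16
allocate_tokens_for_local_replication_factor: 3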
partitioner
-
The partitioner determines how data is distributed across the nodes in the cluster.
The default Murmur3Partitioner is the correct and only choice for new clusters. The legacy partitioners are provided for backward compatibility with existing clusters upgraded from older versions of Cassandra, because the partitioner can never be changed on a running cluster.
Default: org.apache.cassandra.dht.Murmur3Partitioner
Common compaction settings
compaction_throughput_mb_per_sec
-
The rate (in megabytes per second) at which SSTable candidates are compacted. The faster the database inserts data, the faster the system must compact in order to keep the number of SSTables down.
Set to 16 to 32 times the write throughput in MB/second, or set to 0 to disable compaction throttling. A high setting means that more of the disk I/O is used for compaction, leaving less I/O bandwidth for reads.
Default: 64
See Configure compaction.
Memtable settings
When a node receives a write request, the data is stored in a memory structure called a memtable and is also appended to the commit log on disk for durability (see How data is written). The memtable segments can be allocated either on- or off-heap.
memtable_allocation_type
-
Determines how HCD allocates memory to the memtable.
- heap_buffers - Memtables are allocated on the JVM heap. Suitable for general workloads where heap memory is sufficient.
- offheap_buffers - Uses Java NIO direct buffers to store cell names and values off-heap. This allocation type reduces heap utilization significantly, leading to reduced GC pressure.
- offheap_objects - Allocates memtables completely off-heap, directly in native memory. This allocation type is recommended, particularly for clusters that handle large datasets. Writes are around 5% faster, mostly because memtables flush less often.
- unslabbed_heap_buffers - Allocates memtables on the JVM heap without using a slab allocator. This can lead to increased heap fragmentation, so it is not recommended for any environment.
Default: offheap_objects
memtable_heap_space_in_mb
-
The maximum amount of memory to allocate for memtables on the JVM heap. When the threshold is reached, writes are blocked until a flush completes. A flush of the largest memtable is triggered based on memtable_cleanup_threshold.
Default: ¼ of heap
memtable_offheap_space_in_mb
-
The maximum amount of memory to allocate for memtables from native memory. When the threshold is reached, writes are blocked until a flush completes. A flush of the largest memtable is triggered based on memtable_cleanup_threshold.
Default: ¼ of heap
memtable_cleanup_threshold
-
The threshold that triggers a flush, based on the ratio of memtable size to the maximum memory size permitted for memtables.
Setting a value is deprecated because the default calculation is the only reasonable choice.
Default: 1 / (memtable_flush_writers + 1)
memtable_flush_writers
-
The total number of memtables that can be flushed concurrently, as well as the number of flush writer threads per disk.
A single thread is generally capable of keeping up with write ingest on a node with a single fast disk, unless it becomes IO-bound temporarily, so two flush writers are usually sufficient. If flushing is falling behind (the MemtablePool.BlockedOnAllocation metric is greater than 0), increase the number of flush writers.
Note that more writers can lead to more frequent flushes and smaller SSTables, which puts pressure on compaction.
Default: 2 for nodes with a single data directory, otherwise 1 per data directory
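As a sketch, a node handling large datasets might pin memtables off-heap and cap the native memory they may use; the 2048 MB figure is illustrative, not a recommendation:
memtable_allocation_type: offheap_objects
memtable_offheap_space_in_mb: 2048
memtable_flush_writers: 2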
Common automatic backup settings
Backups and snapshots are not automatically cleared, so disk usage can grow unbounded. By default, HCD shuts down when the disk gets full and it can no longer write files. DataStax recommends setting up a process to clear incremental backups each time a new snapshot is created.
auto_snapshot
-
When enabled (set to true), a snapshot is taken before DROP KEYSPACE, DROP TABLE, or TRUNCATE TABLE is executed.
DataStax strongly recommends keeping this enabled as a precaution in case the DROP or TRUNCATE is executed accidentally against the wrong keyspace or table.
Default: true
incremental_backups
-
When enabled (set to true), hard links are created to each SSTable that has been flushed or streamed, in the backups/ subdirectory of the keyspace data.
Default: false
snapshot_before_compaction
-
When enabled (set to true), a snapshot is taken before each compaction task. The snapshot can be used as a rollback position in an upgrade. Usage is limited because the general recommendation is to take backups before performing an upgrade.
Use with extreme caution, as disk usage can grow exponentially.
Default: false
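For example, to keep the recommended snapshot protection while enabling incremental backups for an external backup process:
auto_snapshot: true
incremental_backups: true
snapshot_before_compaction: false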
Performance tuning
Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes.
Performance tuning properties include:
Commit log settings
commitlog_sync
-
Defines the mode by which the commit log is synchronized to disk; in other words, when the data is considered fully persisted to storage (will survive a system crash or power outage). The sync mode also determines when a successful write acknowledgement is sent to the coordinator.
- batch - Each write request triggers a call to sync immediately. The acknowledgement is blocked until after the commit log has been flushed to disk. Prioritizes durability over performance.
- group - Similar to batch mode, but waits up to commitlog_sync_group_window_in_ms between flushes so more writes are persisted together. The acknowledgement is also blocked until after the commit log has been flushed to disk. Recommended over batch mode.
- periodic - The commit log is synced every commitlog_sync_period_in_ms, but the write is acknowledged immediately. Prioritizes performance over durability.
Default: periodic
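For example, to trade a little write latency for stronger durability with the recommended group mode (the window shown mirrors the default described below):
commitlog_sync: group
commitlog_sync_group_window_in_ms: 1000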
commitlog_sync_period_in_ms
-
The time interval between commit log syncs to disk. Only set with periodic sync mode, otherwise an exception is logged.
Default: 10000 (10 seconds)
commitlog_sync_group_window_in_ms
-
The minimum interval between disk syncs. Only set with group sync mode, otherwise an exception is logged.
Default: 1000 (1 second)
commitlog_sync_batch_window_in_ms
-
Deprecated. The maximum delay between disk syncs. No longer used.
commitlog_segment_size_in_mb
-
The size of individual commit log file segments. A small size means more frequent flushes, leading to small SSTables that put pressure on compaction.
If using commit log archives for point-in-time recovery, reducing the size to 16 or 8 MB for finer granularity is reasonable, but be aware that the maximum mutation size depends on the segment size (see below).
Default: 32
max_mutation_size_in_kb
-
The maximum allowed size of a mutation (the payload size of a write request), which defaults to half of the commit log segment size (commitlog_segment_size_in_mb). If explicitly set, you must set commitlog_segment_size_in_mb to at least twice the value of max_mutation_size_in_kb.
Before increasing the commit log segment size, investigate why the mutations are larger than expected. Look for underlying issues with access patterns and the data model, because increasing the commit log segment size is a limited fix.
Default: ½ of commitlog_segment_size_in_mb
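As an illustration of the constraint described above, explicitly setting the mutation limit requires a segment at least twice as large:
commitlog_segment_size_in_mb: 64
max_mutation_size_in_kb: 32768   # no more than half the segment size (64 MB = 65536 KB)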
commitlog_total_space_in_mb
-
The maximum disk space for commit logs on disk.
If the limit is reached, the oldest commit log segments are flushed to reclaim disk space. A small size means more frequent flushes on less-active tables, leading to small SSTables that put pressure on compaction.
Default: the smaller of 8192 or ¼ of the commitlog_directory disk
commitlog_compression
-
By default, the commit log is not compressed. To enable compression, specify the compression library to use. The supported libraries are:
- DeflateCompressor - Legacy option that is the slowest compared to the newer algorithms, so it is not recommended.
- LZ4Compressor - The fastest algorithm, but offers a lower compression ratio. Choose when speed is preferred over space savings.
- SnappyCompressor - Not as fast as LZ4, but provides better compression.
- ZstdCompressor - Provides the best compression ratio, but is slower than the other algorithms.
For example:
commitlog_compression:
  - class_name: LZ4Compressor
Change Data Capture (CDC) settings
See also cdc_raw_directory.
cdc_enabled
-
Enables CDC functionality on a per-node basis when set to true.
Default: false
cdc_total_space_in_mb
-
The maximum disk space to use for CDC logs. If the limit is reached, HCD throws WriteTimeoutException on mutations that include CDC-enabled tables. A CDCCompactor (a consumer) is responsible for parsing the raw CDC logs and deleting them when parsing is completed.
Default: the smaller of 4096 MB or ⅛ of the cdc_raw_directory disk
cdc_free_space_check_interval_ms
-
The interval between disk space checks when the cdc_total_space_in_mb limit is reached.
Default: 250
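A node with CDC enabled might look like the following; the directory path assumes a package installation layout:
cdc_enabled: true
cdc_raw_directory: /var/lib/cassandra/cdc_raw
cdc_total_space_in_mb: 4096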
Compaction settings
See also Common compaction settings.
concurrent_compactors
-
The number of compaction threads allowed to run simultaneously. Simultaneous compactions help preserve read performance in a mixed read-write workload by limiting the number of small SSTables that accumulate during a single long-running compaction.
Generally, the calculated default value is appropriate and does not need adjusting. DataStax recommends contacting the DataStax Services team before changing this value. If your data directories are backed by SSDs, increase this value to the number of cores.
If compaction is running too slowly or too fast, adjust compaction_throughput_mb_per_sec first.
Increasing concurrent compactors leads to more use of available disk space for compaction, because concurrent compactions happen in parallel, especially for STCS. Ensure that adequate disk space is available before increasing this configuration.
Default: the smaller of the number of data disks or CPU cores, with a minimum of 2 and a maximum of 8
concurrent_validations
-
The number of repair validation threads allowed to run simultaneously.
Defaults to the value of concurrent_compactors if not configured or if set to <= 0. Setting validation threads to a value higher than concurrent compactors requires the system property -Dcassandra.allow_unlimited_concurrent_validations=true.
Default: the value of concurrent_compactors
concurrent_materialized_view_builders
-
The number of view builder tasks allowed to run simultaneously if materialized views are enabled (experimental).
When a view is created, the node ranges are split into [num_processors x 4] builder tasks. Set this property to 2 or higher to build views faster.
Default: 1
sstable_preemptive_open_interval_in_mb
-
The size of the SSTable candidates to trigger preemptive opening of compaction output.
The compaction process opens SSTables before they are completely written and uses them in place of the prior SSTables for any range previously written. Preemptive opening of SSTables helps to smoothly transfer reads between the SSTables by reducing cache churn and keeps hot rows hot.
A low value has a negative performance impact and will eventually cause heap pressure and GC activity. The optimal value depends on hardware and workload.
Default:
50
Cache and index settings
column_index_size_in_kb
-
Granularity of the index of rows within a partition. For huge rows, decrease this setting to improve seek time.
Default:
64
file_cache_size_in_mb
-
Maximum memory to use for caching SSTable chunks and buffer pools. Allocated from native memory in addition to heap.
Default: the smaller of 2048 or ¼ of heap
Streaming settings
These settings apply to operations that perform file streaming, including repairs, bootstraps, and decommissions. These operations are mostly sequential I/O, which can saturate a node’s network bandwidth and degrade client (application) performance, so it is important to throttle streaming throughput.
inter_dc_stream_throughput_outbound_megabits_per_sec
-
The maximum network bandwidth for file transfers (streaming) between data centers. Set to a value less than or equal to stream_throughput_outbound_megabits_per_sec.
Default: 200 Mbps (25 MB/s)
stream_entire_sstables
-
Enables the Zero Copy Streaming feature where eligible SSTables are streamed in their entirety between nodes instead of individual partitions, transferring data at a significantly faster rate.
This feature is bound to the streaming throughput limits and disabled when internode encryption is enabled.
Default:
true
stream_throughput_outbound_megabits_per_sec
-
Maximum network bandwidth permitted for all outbound file transfers (streaming) on a node.
Default: 200 Mbps (25 MB/s)
streaming_keep_alive_period_in_secs
-
The interval for sending keep-alive messages to prevent reset connections during streaming. The streaming session fails when a keep-alive message is not received for two keep-alive cycles, equivalent to 10 minutes by default (2 x 300 seconds).
Default: 300
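For example, to throttle cross-data-center streaming below the overall node limit, as advised above (values illustrative):
stream_throughput_outbound_megabits_per_sec: 200
inter_dc_stream_throughput_outbound_megabits_per_sec: 100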
Advanced properties
Less commonly-used settings normally reserved for experienced operators.
max_value_size_in_mb
-
The maximum size of any value in SSTables, up to a maximum of 2048 MB. If any value exceeds this threshold, the SSTables are marked as corrupted.
The default size is the same as the default native protocol frame limit, native_transport_max_frame_size_in_mb.
Default: 256
trickle_fsync
-
Enables flushing portions of SSTables written using sequential writers when trickle_fsync_interval_in_kb is reached. Minimizes sudden flushing of dirty buffers, which can impact read latencies.
Recommended for use with SSDs, which can handle more frequent calls to fsync(), but may be detrimental to slow HDDs.
Default: true
trickle_fsync_interval_in_kb
-
The threshold that triggers a flush when trickle_fsync is enabled.
Default: 10240 (10 MB)