cassandra.yaml configuration file
The cassandra.yaml file is the main configuration file for Hyper-Converged Database (HCD).
After changing properties in the |
Syntax
For the properties in each section, the parent setting has zero spaces. Each child entry requires at least two spaces. Adhere to the YAML syntax and retain the spacing.
- The system shows undefined default values as Default: none.
- The system describes internally defined default values.
HCD can define default values internally, comment them out, or create implementation dependencies on other properties in the cassandra.yaml file. Additionally, some commented-out values may not match the actual default values. DataStax recommends the commented-out values as alternatives to the default values.
Organization
The cassandra.yaml file groups configuration properties into the following sections:
-
Quick start properties
The minimal properties needed for configuring a cluster.
-
Default directories
If you have changed any of the default directories during installation, set these properties to the new locations. Make sure you have root access.
-
Data directory configuration
Properties for configuring the location of a single or multiple JBOD data directories.
-
Commonly used properties
Properties most frequently used when configuring HCD.
-
Performance tuning
Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes.
-
Advanced properties
Properties for advanced users or properties that are less commonly used.
-
Security properties
Configure authentication, authorization, and role management.
-
User-defined functions (UDF) properties
Configure how UDF code is executed inside Cassandra daemons.
-
Configure memory, threads, and duration when pushing pages continuously to the client.
-
Memory leak detection settings
Configure memory leak detection.
-
Enable emulation mode for testing applications meant to run on Astra DB.
-
Configure HCD system limits to ensure high availability and optimal database performance.
Quick start properties
The minimal properties needed for configuring a cluster.
cluster_name
-
The name of the cluster. This setting prevents nodes in one logical cluster from joining another, so you must set it to a unique name other than the default. All nodes in a cluster must have the same value.
Default:
'Test Cluster'
rpc_address
-
The address that client applications connect to. You typically set this to a node’s public IP address that is routable from the clients.
If not changed from the default localhost, only applications deployed on the server will be able to connect to the node.
Default: localhost
listen_address
-
The IP address or hostname that the database binds to, exclusively for private communication between nodes in the cluster. You typically set this to a node’s private IP address that is routable from other nodes.
If not changed from the default localhost, the node will not be able to communicate with other nodes in the cluster.
Default: localhost
listen_interface
-
The interface that the database binds to for connecting to other nodes. Interfaces must correspond to a single address. HCD does not support IP aliasing.
Never set listen_address to 0.0.0.0.
Set listen_address or listen_interface, not both.
listen_interface_prefer_ipv6
-
Use IPv4 or IPv6 when an interface is specified by name.
- false: Use the first IPv4 address.
- true: Use the first IPv6 address.
When you use only a single address, HCD selects that address without regard to this setting.
Default: false
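As an illustration, the quick start properties for one node might be combined as in the following sketch. The cluster name and IP addresses are hypothetical placeholders, not values from this reference; substitute the values for your own environment.

```yaml
# Hypothetical values for illustration only.
cluster_name: 'Example Cluster'   # must be identical on every node in the cluster
rpc_address: 203.0.113.10         # assumed address that client applications can route to
listen_address: 10.0.0.10         # assumed private address that other nodes can route to
# listen_interface: eth0          # alternative to listen_address; set one, not both
# listen_interface_prefer_ipv6: false
```

Because listen_address is set here, listen_interface stays commented out.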
Default directories
If you have changed any of the default directories during installation, set these properties to the new locations. Make sure you have root access.
data_file_directories
-
The directory where table data is stored on disk. The database distributes data evenly across the location, subject to the granularity of the configured compaction strategy.
For production, DataStax recommends RAID 0 and SSDs.
Default: /var/lib/cassandra/data
commitlog_directory
-
The directory where HCD stores the commit log.
For optimal write performance, place the commit log on a separate disk partition, or ideally on a separate physical device, from the data directories. Because the commit log only appends data, a hard disk drive (HDD) works as long as it keeps up with the writes.
Default:
$CASSANDRA_HOME/data/commitlog
The commitlog_directory and the cdc_raw_directory must reside on the same partition. Keep these directories in separate sub-folders that are not nested.
cdc_raw_directory
-
The directory where HCD stores change data capture (CDC) commit log segments on flush. DataStax recommends using a physical device that is separate from the data directories. See Change Data Capture (CDC) logging.
Default:
$CASSANDRA_HOME/data/cdc_raw
The cdc_raw_directory and the commitlog_directory must reside on the same partition. Keep these directories in separate sub-folders that are not nested.
hints_directory
-
The directory where HCD stores hints (missed writes).
Default:
$CASSANDRA_HOME/data/hints
metadata_directory
-
The directory that holds cluster metadata including information about the local node and its peers.
Default:
$CASSANDRA_HOME/data/metadata
saved_caches_directory
-
The directory location where HCD stores table key and row caches.
Default:
$CASSANDRA_HOME/data/saved_caches
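The following sketch shows one way the directory properties could be laid out when the defaults are changed. All mount points are hypothetical; the only constraints taken from this section are that the commit log sits on a different device from the data directories, and that commitlog_directory and cdc_raw_directory share a partition without being nested.

```yaml
# Hypothetical mount points for illustration only.
data_file_directories:
  - /mnt/data/cassandra/data
commitlog_directory: /mnt/commitlog/cassandra/commitlog   # separate device from the data directories
cdc_raw_directory: /mnt/commitlog/cassandra/cdc_raw       # same partition as the commit log, sibling folder
hints_directory: /mnt/data/cassandra/hints
metadata_directory: /mnt/data/cassandra/metadata
saved_caches_directory: /mnt/data/cassandra/saved_caches
```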
Data directory configuration
Distributing data across multiple disks, also known as "JBOD configuration" (just a bunch of disks), maximizes throughput and ensures efficient disk I/O. HCD lets you specify multiple directories for storing your data in this distributed manner. DataStax recommends using striped LVM instead.
To configure a single data directory in the cassandra.yaml file:
data_file_directories:
- /var/lib/cassandra/data
For multiple data directories:
data_file_directories:
- /disk1/datadir
- /disk2/datadir
- /disk3/datadir
Commonly used properties
Properties most frequently used when configuring HCD.
Before starting a node for the first time, DataStax recommends that you carefully evaluate your requirements.
Common initialization properties
Be sure to set the properties in the Quick start section as well.
commit_failure_policy
-
Determines how HCD handles commit log disk failures.
- die: Shut down the node and kill the JVM, so the node can be replaced.
- stop: Shut down the node, leaving the node effectively dead, available for inspection using JMX.
- stop_commit: Shut down the commit log, let writes collect, but continue to service reads.
- ignore: Ignore fatal errors and let the batches fail.
Default: stop (recommended)
disk_optimization_strategy
-
Reads from spinning disks are slow, so the system buffers an extra 4 KB page on each read just in case. This is unnecessary for SSDs, so the system buffers only what is required.
- ssd: Data directory backed by solid state disks
- spinning: Data directory backed by spinning disks
Default: ssd
disk_failure_policy
-
Determines how HCD handles disk failures.
- die: Shut down gossip and client transports, and kill the JVM for any file system errors or single SSTable errors. Enables you to replace the node.
- stop_paranoid: Shut down the node, even for single SSTable errors.
- stop: Shut down the node, leaving the node effectively dead, but the JVM is still available for inspection using JMX.
- best_effort: Stop using the failed disk and respond to requests based on the remaining available SSTables. This setting allows obsolete data at a consistency level of ONE.
- ignore: Ignore fatal errors and let the requests fail; all file system errors are logged but otherwise ignored.
Recommended policies are stop and best_effort.
Default: stop
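A minimal sketch of the failure-handling properties described above, using only values listed in this section. Whether best_effort or stop suits a given cluster is a judgment call; best_effort is chosen here purely as an example of a non-default recommended policy.

```yaml
commit_failure_policy: stop        # default: node stays inspectable through JMX
disk_failure_policy: best_effort   # recommended alternative to the default (stop)
disk_optimization_strategy: ssd    # default; use "spinning" for data directories on HDDs
```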
endpoint_snitch
-
Configure this property to set the snitch. The most common snitches are:
- SimpleSnitch (default): Uses replication strategy order for proximity. This snitch does not recognize racks or datacenters, and considers all nodes as belonging to one ring (single DC), making it incompatible with multi-DC deployments and unsuitable for production environments. This snitch is appropriate for development environments only.
- GossipingPropertyFileSnitch (GPFS): Uses rack and datacenter information for the local node defined in the cassandra-rackdc.properties file and propagates this information to other nodes via gossip. This snitch is recommended for production environments and is almost always the correct choice.
- PropertyFileSnitch (PFS): Determines node proximity using the rack and datacenter location defined in the cassandra-topology.properties file. GPFS supersedes this snitch. Only use PFS for backwards compatibility.
For other snitches such as Ec2Snitch and GoogleCloudSnitch, see About snitches.
All nodes in a cluster must use the same snitch.
HCD determines replica placement (which defines where copies of data are stored) using the information provided by the snitch. Changing the snitch affects where data is located, so it requires additional steps and should only be performed by experienced operators.
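For a production cluster, the snitch recommended above would be selected with a single line. This is only a sketch: the rack and datacenter names themselves live in the cassandra-rackdc.properties file, which is not shown here.

```yaml
# Same value on every node in the cluster.
endpoint_snitch: GossipingPropertyFileSnitch
```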
seed_provider
-
The gossip seed provider and corresponding addresses of nodes that are designated as contact points in the cluster. A joining node contacts the nodes in the seeds list and establishes a connection to the first available node to discover the members of the cluster and topology.
- class_name: The class that handles the seed logic. HCD uses the default in almost all clusters. However, you can substitute a custom seed provider in limited edge cases.
  Default: org.apache.cassandra.locator.SimpleSeedProvider
- seeds: A comma-delimited list of addresses and their corresponding storage_port. A new node joining the cluster uses the list to bootstrap the gossip process. If the cluster has multiple nodes, the default value must be changed to the IP address and gossip port of one of the nodes.
  Default: "127.0.0.1:7000"
Making every node a seed node is not recommended because of increased maintenance and reduced gossip performance. Gossip optimization is not critical, but it is recommended to use a small seed list (approximately three nodes per datacenter).
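The following sketch shows a seed list with three seeds for one datacenter. The nesting under parameters follows the layout of the stock Apache Cassandra cassandra.yaml, which is assumed to apply to HCD as well; the IP addresses are hypothetical.

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Hypothetical seed nodes; 7000 is the gossip port from the default value.
      - seeds: "10.0.0.10:7000,10.0.0.11:7000,10.0.0.12:7000"
```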
Advanced initialization properties
allocate_tokens_for_keyspace
-
Triggers the algorithm that allocates optimum num_tokens tokens such that token ranges are spread evenly across nodes, meaning data is distributed more evenly compared to the legacy random allocation. Only supported on clusters using Murmur3Partitioner.
The algorithm uses the replication strategy of the specified keyspace to optimize token allocation when new nodes join a cluster.
The property allocate_tokens_for_local_replication_factor is preferred over allocate_tokens_for_keyspace, particularly when adding nodes in a new datacenter where a keyspace is not yet replicated. If neither property is set, HCD defaults to the legacy behavior where tokens are allocated randomly.
allocate_tokens_for_local_replication_factor
-
Triggers the algorithm that allocates optimum num_tokens tokens such that token ranges are spread evenly across nodes, meaning data is distributed more evenly compared to the legacy random allocation. Only supported on clusters using Murmur3Partitioner.
Specify the replication factor in the local datacenter, 3 for example, that the algorithm uses to optimize token allocation when new nodes join a cluster.
allocate_tokens_for_local_replication_factor is preferred over allocate_tokens_for_keyspace because it does not require the replication of a keyspace to be defined. This is especially helpful when adding nodes in a new datacenter. If neither property is set, HCD defaults to the legacy behavior where tokens are allocated randomly.
auto_bootstrap
-
When joining a cluster for the first time, this property determines whether the node will request replicas to stream data. This is the default behavior. If the node is defined as a seed, it immediately joins the cluster without data.
Non-seed nodes bootstrap automatically by default. Set to false when adding nodes in a new datacenter where an operator triggers bootstrap manually with the nodetool rebuild command.
Default: true
broadcast_address
-
Set to the node’s public IP address in environments where nodes are only able to communicate across networks using their public IP addresses, such as multi-region Amazon EC2 deployments. Otherwise, the node will broadcast on the same address as listen_address.
Set a separate listen_address and broadcast_address on a node with multiple network interfaces or where nodes are not able to communicate over private IP addresses. Not required in environments that support automatic switching between private and public communication.
Default: uses value of listen_address
initial_token
-
The property for manually assigning tokens for ranges to be owned by the node.
Specify one token value for legacy single-token clusters. For clusters with virtual nodes enabled, specify multiple tokens as a comma-separated list.
When setting initial_token, the corresponding num_tokens must also be set.
Default: not set, in preference for num_tokens
listen_on_broadcast_address
-
Set to true on nodes with multiple interfaces to enable communication on both listen_address and broadcast_address.
Default: false
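As a sketch of how the address properties above fit together on a node with both a private and a public interface, consider the following fragment. Both IP addresses are hypothetical placeholders.

```yaml
listen_address: 10.0.0.10            # hypothetical private IP, used inside the local network
broadcast_address: 203.0.113.10      # hypothetical public IP, advertised to nodes in other networks
listen_on_broadcast_address: true    # accept internode traffic on both addresses
```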
num_tokens
-
Defines the number of tokens to assign to the node.
Early versions of Cassandra used a default value of 256 tokens for clusters with virtual nodes enabled. This setting shares data with more peers and offers the least variance in data size among nodes in the same datacenter, but might lead to decreased availability in the event of node outages.
Lower token counts such as 4 or 8 provide higher availability but also higher variance in data size. 16 tokens achieves a good distribution of data without compromising too much on availability.
Default: 1 token when not set
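Putting the token guidance above together, a new node might be configured as in this sketch. The values 16 and 3 are the examples mentioned in this section, not defaults; the replication factor in particular is an assumption about the local datacenter.

```yaml
num_tokens: 16                                    # balance between data distribution and availability
allocate_tokens_for_local_replication_factor: 3   # assumed replication factor of the local datacenter
# initial_token is left unset so that tokens are allocated by the algorithm
```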
partitioner
-
The partitioner determines how data is distributed across the nodes in the cluster.
The default Murmur3Partitioner is the correct and only choice for new clusters. The legacy partitioners provide backward compatibility with existing clusters upgraded from older versions of Cassandra, because the partitioner can never be changed on a running cluster.
Default: org.apache.cassandra.dht.Murmur3Partitioner
Common compaction settings
compaction_throughput_mb_per_sec
-
The rate in megabytes/second at which HCD compacts SSTable candidates. The faster the database inserts data, the faster HCD must compact in order to keep the number of SSTables down.
Set to 16 to 32 times the write throughput in MB/second. Otherwise, set to 0 to disable compaction throttling. A high setting means that HCD uses more disk I/O for compaction, leaving less I/O bandwidth for reads.
Default: 64
See Configure compaction.
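To make the sizing rule concrete, the sketch below works through it for an assumed sustained write rate of 2 MB/second. That rate is hypothetical; measure the actual write throughput of your nodes before applying the rule.

```yaml
# Assumed write throughput: 2 MB/s (hypothetical).
# Lower bound: 16 x 2 = 32 MB/s. Upper bound: 32 x 2 = 64 MB/s.
compaction_throughput_mb_per_sec: 64   # upper bound, which also matches the default
# compaction_throughput_mb_per_sec: 0  # alternative: disable compaction throttling
```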
Memtable settings
When a node receives a write request, HCD stores the data in a memory structure called a memtable and appends it to the commit log on disk for durability (see How data is written). HCD can allocate memtable segments either on- or off-heap.
memtable_allocation_type
-
Determines how HCD allocates memory to the memtable.
- heap_buffers: The system allocates memtables on JVM heap. Suitable for general workloads where heap memory is sufficient.
- offheap_buffers: Uses Java NIO direct buffers to store cell names and values off-heap. This allocation type reduces heap utilization significantly, leading to reduced GC pressure.
- offheap_objects: The system allocates memtables completely off-heap, directly in native memory. This allocation type is recommended particularly for clusters that handle large datasets. Writes are around 5% faster, mostly due to memtables flushing less often.
- unslabbed_heap_buffers: The system allocates memtables on a JVM heap without using a slab allocator. This can lead to increased heap fragmentation. DataStax does not recommend this option for any environments.
Default: offheap_objects
memtable_heap_space_in_mb
-
The maximum amount of memory to allocate for memtables on JVM heap. When the threshold is reached, the system blocks writes until a flush completes. The system triggers a flush of the largest memtable based on memtable_cleanup_threshold.
Default: ¼ of heap
memtable_offheap_space_in_mb
-
The maximum amount of memory to allocate for memtables from native memory. When the threshold is reached, HCD blocks writes until a flush completes. HCD triggers a flush of the largest memtable based on memtable_cleanup_threshold.
Default: ¼ of heap
memtable_cleanup_threshold
-
The threshold that triggers a flush based on the ratio of memtable size to the maximum memory size permitted for memtables.
Setting a value is deprecated because the default calculation is the only reasonable choice.
Default:
1 / (memtable_flush_writers + 1)
memtable_flush_writers
-
The total number of memtables that can be flushed concurrently as well as the number of flush writer threads per disk.
A single thread is generally capable of keeping up with ingesting writes on a node with a single fast disk unless it becomes temporarily I/O-bound, so two flush writers are usually sufficient. If flushing is falling behind (the MemtablePool.BlockedOnAllocation metric is greater than 0), increase the number of flush writers.
Note that more writers can lead to more frequent flushes and smaller SSTables, which puts pressure on compaction.
Default: 2 for nodes with a single data directory, otherwise 1 per memtable
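The memtable properties interact, so a worked sketch may help. It assumes a node with a single data directory and a hypothetical 8 GB (8192 MB) JVM heap; the space limits are shown commented out because ¼ of heap is already the default.

```yaml
memtable_allocation_type: offheap_objects   # default
memtable_flush_writers: 2                   # default for a single data directory
# Derived flush threshold: 1 / (memtable_flush_writers + 1) = 1 / 3, roughly 0.33.
# memtable_cleanup_threshold is left unset because setting it is deprecated.
# With the assumed 8192 MB heap, the default limits work out to 8192 / 4 = 2048 MB:
# memtable_heap_space_in_mb: 2048
# memtable_offheap_space_in_mb: 2048
```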
Common automatic backup settings
HCD does not automatically clear backups and snapshots, so disk usage can grow unbounded. When the disk is full and HCD can no longer write files to disk, it shuts down automatically by default. DataStax recommends setting up a process to clear incremental backups each time a new snapshot is created.
auto_snapshot
-
When enabled (set to true), a snapshot is taken before DROP KEYSPACE, DROP TABLE, or TRUNCATE TABLE is executed.
DataStax strongly recommends enabling auto snapshot as a precaution, in case someone executes the DROP or TRUNCATE commands accidentally against the wrong keyspace or table.
Default: true
incremental_backups
-
When set to true, HCD creates hard links to each SSTable that has been flushed or streamed in the backups/ subdirectory of the keyspace data.
Default: false
snapshot_before_compaction
-
When set to true, HCD takes a snapshot before each compaction task. You may use the snapshot as a rollback position in an upgrade. Usage is limited since the general recommendation is to take backups before performing an upgrade.
Use with extreme caution as disk usage can grow exponentially.
Default: false
Performance tuning
Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes.
Performance tuning properties include:
Commit log settings
commitlog_sync
-
Defines the mode by which the commit log is synchronized to disk. When the data is considered fully persisted to storage, the data will survive a system crash or power outage. The sync mode also determines when HCD sends a successful write acknowledgement to the coordinator.
- batch: Each write request triggers a call to sync immediately. The acknowledgement is blocked until after the commit log has been flushed to disk. Prioritizes durability over performance.
- group: Similar to batch mode but waits up to commitlog_sync_group_window_in_ms between flushes so more writes are persisted together. The system also blocks the acknowledgement until after the commit log has been flushed to disk. Recommended over batch mode.
- periodic: The system synchronizes the commit log every commitlog_sync_period_in_ms, but the write is acknowledged immediately. Prioritizes performance over durability.
Default: periodic
commitlog_sync_period_in_ms
-
Time interval between commit log syncs to disk. Only set with periodic sync mode, otherwise an exception will be logged.
Default: 10000 (10 seconds)
commitlog_sync_group_window_in_ms
-
The minimum interval between disk syncs. Only set with group sync mode, otherwise an exception will be logged.
Default: 1000 (1 second)
commitlog_sync_batch_window_in_ms
-
Deprecated. The maximum delay between disk syncs. No longer used.
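Because each sync mode accepts only its own timing property, a configuration that favors durability might look like the following sketch. It uses the group mode and its default window from the descriptions above.

```yaml
commitlog_sync: group
commitlog_sync_group_window_in_ms: 1000   # default window (1 second)
# commitlog_sync_period_in_ms stays unset: it applies to periodic mode only,
# and setting it with another mode causes an exception to be logged.
```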
commitlog_segment_size_in_mb
-
The size of individual commit log file segments. A small size means more frequent flushes leading to small SSTables which put pressure on compaction.
If you use the commit log archives for point-in-time recovery, it is reasonable to reduce the size to 16 or 8MB for finer granularity. However, be aware that the maximum mutation size is dependent on the segment size.
Default:
32
max_mutation_size_in_kb
-
The maximum allowed size of a mutation (the payload size of a write request), which defaults to half of the commit log segment (commitlog_segment_size_in_mb). If explicitly set, you must set commitlog_segment_size_in_mb to at least twice the value of max_mutation_size_in_kb.
Before increasing the commit log segment size, investigate why the mutations are larger than expected. Look for underlying issues with access patterns and data model, because increasing the commit log segment size is a limited fix.
Default: ½ of commitlog_segment_size_in_mb
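The relationship between the two properties is easiest to see with numbers. In this sketch, the 32 MB mutation limit is a hypothetical target chosen for the arithmetic; note the unit difference between the two properties (MB versus KB).

```yaml
# With the default 32 MB segment, the implicit limit is 32 / 2 = 16 MB (16384 KB).
# Hypothetical target: allow mutations up to 32 MB = 32 x 1024 = 32768 KB.
# The segment must then be at least 2 x 32 MB = 64 MB.
commitlog_segment_size_in_mb: 64
max_mutation_size_in_kb: 32768
```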
commitlog_total_space_in_mb
-
The maximum disk space for commit logs on disk.
If the limit is reached, HCD flushes the oldest commit log segments to reclaim disk space. A small size means more frequent flushes on less-active tables leading to small SSTables which put pressure on compaction.
Default: smaller of 8192 or ¼ of the commitlog/ disk
commitlog_compression
-
By default, the commit log is not compressed. To enable compression, specify the compression library to use.
The supported libraries are:
- DeflateCompressor: Legacy option that is the slowest compared to newer algorithms. This option is not recommended.
- LZ4Compressor: Fastest algorithm, but offers lower compression ratios. Choose when speed is preferred over space savings.
- SnappyCompressor: Not as fast as LZ4, but provides better compression.
- ZstdCompressor: Provides the best compression ratio, but is slower than other algorithms.
commitlog_compression:
  - class_name: LZ4Compressor
Change Data Capture (CDC) settings
See also cdc_raw_directory.
cdc_enabled
-
Enables CDC functionality on a per-node basis when set to true.
Default: false
cdc_total_space_in_mb
-
Maximum disk space to use for CDC logs. If the limit is reached, HCD throws WriteTimeoutException on mutations that include CDC-enabled tables. A CDCCompactor (a consumer) parses the raw CDC logs and deletes them when parsing is completed.
Default: smaller of 4096 MB or 1/8th of the cdc_raw_directory disk
cdc_free_space_check_interval_ms
-
Interval between disk space checks when the cdc_total_space_in_mb limit is reached.
Default: 250
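A node with CDC switched on might carry the following settings. This is a sketch: the directory path is hypothetical, and the two numeric values simply restate the defaults described above.

```yaml
cdc_enabled: true
cdc_raw_directory: /mnt/commitlog/cassandra/cdc_raw   # hypothetical path, same partition as the commit log
cdc_total_space_in_mb: 4096                           # writes to CDC-enabled tables fail once this fills up
cdc_free_space_check_interval_ms: 250
```

Since writes are rejected when the space limit is reached, a consumer that parses and deletes the raw CDC logs needs to keep pace with the write load.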
Compaction settings
See also |
concurrent_compactors
-
The number of compaction threads allowed to run simultaneously. Simultaneous compactions help preserve read performance in a mixed read-write workload by limiting the number of small SSTables that accumulate during a single long-running compaction.
Generally, the calculated default value is appropriate and does not need adjusting. DataStax recommends contacting DataStax Support before changing this value. If your data directories are backed by SSDs, increase this value to the number of cores.
If compaction runs too slowly or too fast, adjust the compaction_throughput_mb_per_sec option in the Common compaction settings section.
Increasing concurrent compactors leads to more use of available disk space for compaction, because concurrent compactions happen in parallel, especially for STCS. Ensure that adequate disk space is available before increasing this configuration.
Default: the smaller of the number of data disks or CPU cores, with a minimum of 2 and a maximum of 8
concurrent_validations
-
The number of repair validation threads allowed to run simultaneously.
Defaults to the value of concurrent_compactors if not configured or set to <= 0. Requires the system property -Dcassandra.allow_unlimited_concurrent_validations=true to set validation threads to a value higher than concurrent_compactors.
Default: the value of concurrent_compactors
concurrent_materialized_view_builders
-
The number of view builder tasks allowed to run simultaneously if materialized views are enabled. This is experimental.
When a view is created, the node ranges are split into [num_processors x 4] builder tasks. Set this property to 2 or higher to build views faster.
Default: 1
sstable_preemptive_open_interval_in_mb
-
The size of the SSTable candidates to trigger preemptive opening of compaction output.
The compaction process opens SSTables before the system completely writes them and uses them in place of the prior SSTables for any range previously written. Preemptive opening of SSTables helps to smoothly transfer reads between the SSTables by reducing cache churn and keeps hot rows hot.
A low value has a negative performance impact and will eventually cause heap pressure and GC activity. The optimal value depends on hardware and workload.
Default:
50
Cache and index settings
column_index_size_in_kb
-
Granularity of the index of rows within a partition. For huge rows, decrease this setting to improve seek time.
Default:
64
file_cache_size_in_mb
-
Maximum memory to use for caching SSTable chunks and buffer pools. Allocated from native memory in addition to heap.
Default: smaller of 2048 or ¼ of heap
Streaming settings
These settings apply to operations that perform file streaming, including repairs, bootstraps, and decommissions. These operations are mostly sequential I/O which can saturate a node’s network bandwidth and degrade client (application) performance. To fix this, you must throttle streaming throughput.
inter_dc_stream_throughput_outbound_megabits_per_sec
-
Maximum network bandwidth for streaming file transfers between datacenters. Set to a value less than or equal to stream_throughput_outbound_megabits_per_sec.
Default: 200 Mbps (25 MB/s)
stream_entire_sstables
-
Enables the Zero Copy Streaming feature where eligible SSTables are streamed in their entirety between nodes instead of individual partitions, transferring data at a significantly faster rate.
This feature is bound to the streaming throughput limits and disabled when internode encryption is enabled.
Default:
true
stream_throughput_outbound_megabits_per_sec
-
Maximum network bandwidth permitted for all outbound file transfers streaming on a node.
Default: 200 Mbps (25 MB/s)
streaming_keep_alive_period_in_secs
-
Interval at which keep-alive messages are sent to prevent connection resets during streaming. The streaming session fails when the system does not receive a keep-alive message for two keep-alive cycles, equivalent to 10 minutes by default (2 x 300 seconds).
Default: 300
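The streaming limits can be read together as in the sketch below. The inter-datacenter value of 100 is a hypothetical example of a tighter cross-datacenter throttle; the other values are the defaults given above.

```yaml
stream_throughput_outbound_megabits_per_sec: 200            # 200 / 8 = 25 MB/s
inter_dc_stream_throughput_outbound_megabits_per_sec: 100   # hypothetical; 100 / 8 = 12.5 MB/s, below the overall limit
stream_entire_sstables: true
streaming_keep_alive_period_in_secs: 300                    # session fails after 2 x 300 s = 600 s (10 minutes)
```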
Advanced properties
Less commonly-used settings normally reserved for experienced operators.
max_value_size_in_mb
-
The maximum size of any value in SSTables up to a maximum of 2048 MB. If any value exceeds this threshold, HCD marks the SSTables as corrupted.
Default size is the same as the default native protocol frame limit native_transport_max_frame_size_in_mb.
Default: 256
trickle_fsync
-
Enables flushing portions of SSTables written using sequential writers when trickle_fsync_interval_in_kb is reached. This minimizes sudden flushing of dirty buffers, which can impact read latencies.
Recommended for use with SSDs, which can handle more frequent calls to fsync(), but may be detrimental to slow HDDs.
Default: true
trickle_fsync_interval_in_kb
-
Threshold to trigger a flush when trickle_fsync is enabled.
Default: 10240 (10 MB)
Security properties
Configure authentication, authorization, and role management.
The security properties in cassandra.yaml control how HCD handles user authentication, authorization, and data encryption.
These settings are crucial for securing your cluster in production environments.
Authentication properties
authenticator
-
The authentication backend that implements IAuthenticator to identify users.
HCD provides several authentication options:
- AllowAllAuthenticator: Performs no authentication checks. Use this to disable authentication. DataStax does not recommend this for production environments.
- PasswordAuthenticator: Relies on username/password pairs stored in the system_auth.roles table.
- AdvancedAuthenticator: Allows multiple authentication schemes including internal, OIDC, and LDAP.
Default: AllowAllAuthenticator
If using PasswordAuthenticator, you must also use CassandraRoleManager for role management. Increase the system_auth keyspace replication factor when using authentication.
Role management properties
role_manager
-
The role management backend that implements IRoleManager to maintain grants and memberships between roles.
HCD provides several role management options:
- CassandraRoleManager: Stores role data in the system_auth keyspace.
- AdvancedRoleManager: Fetches roles from internal Cassandra tables and/or external servers.
Default: CassandraRoleManager
Most IRoleManager functions require an authenticated login. If the configured IAuthenticator doesn’t implement authentication, most functionality will be unavailable.
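Since PasswordAuthenticator requires CassandraRoleManager, the two properties are normally changed together. A minimal sketch of password-based authentication, using only the class names listed above:

```yaml
authenticator: PasswordAuthenticator   # replaces the default AllowAllAuthenticator
role_manager: CassandraRoleManager     # required companion; stores roles in the system_auth keyspace
```

The replication factor of the system_auth keyspace is changed separately and is not part of cassandra.yaml.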
Client encryption properties
client_encryption_options
-
Configure client-to-server encryption settings.
client_encryption_options:
  enabled: false                   # Enable client-to-server encryption
  optional: true                   # Allow encrypted and unencrypted connections
  keystore: conf/.keystore         # Path to keystore file
  keystore_password: cassandra     # Keystore password
  require_client_auth: false       # Verify client certificates
  truststore: conf/.truststore     # Path to truststore file
  truststore_password: cassandra   # Truststore password
  protocol: TLS                    # SSL/TLS protocol
  store_type: JKS                  # Keystore type
  cipher_suites: [                 # Supported cipher suites
    TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
    TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
    TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  ]
The default configuration is insecure. Generate proper keystores and truststores before enabling encryption in production.
Server encryption properties
server_encryption_options
-
Configure server-to-server internode encryption settings.
server_encryption_options:
  internode_encryption: none               # Encryption scope: none, dc, rack, or all
  optional: true                           # Allow encrypted and unencrypted connections
  keystore: conf/.keystore                 # Path to keystore file
  keystore_password: cassandra             # Keystore password
  require_client_auth: false               # Verify peer server certificates
  truststore: conf/.truststore             # Path to truststore file
  truststore_password: cassandra           # Truststore password
  require_endpoint_verification: false     # Verify hostname in certificate
  enable_legacy_ssl_storage_port: false    # Enable legacy SSL storage port
Encryption scope options:
- none: Do not encrypt outgoing connections
- dc: Encrypt connections to peers in other datacenters, but not within datacenters
- rack: Encrypt connections to peers in other racks, but not within racks
- all: Always use encrypted connections
The default configuration is insecure. Generate proper keystores and truststores before enabling encryption in production.
Transparent data encryption properties
transparent_data_encryption_options
-
Configure transparent data encryption (TDE) for data at rest.
transparent_data_encryption_options:
  enabled: false                  # Enable transparent data encryption
  chunk_length_kb: 64             # Encryption chunk size
  cipher: AES/CBC/PKCS5Padding    # Encryption cipher
  key_alias: testing:1            # Key alias for encryption
  iv_length: 16                   # CBC IV length for AES
  key_provider:
    - class_name: org.apache.cassandra.security.JKSKeyProvider
      parameters:
        - keystore: conf/.keystore
          keystore_password: cassandra
          store_type: JCEKS
          key_password: cassandra
Install Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files for your JDK version before enabling TDE. Currently supports encryption for commitlog and hints files.
Audit logging properties
audit_logging_options
-
Configure audit logging to track CQL commands and authentication events.
audit_logging_options:
  enabled: false                   # Enable audit logging
  logger:
    - class_name: BinAuditLogger   # Audit logger implementation
  audit_logs_dir:                  # Directory for audit logs
  included_keyspaces:              # Keyspaces to audit
  excluded_keyspaces: system, system_schema, system_virtual_schema
  included_categories:             # Categories to audit
  excluded_categories:             # Categories to exclude
  included_users:                  # Users to audit
  excluded_users:                  # Users to exclude
  roll_cycle: HOURLY               # Log roll cycle
  block: true                      # Block on log write
  max_queue_weight: 268435456      # Max queue weight (256 MiB)
  max_log_size: 17179869184        # Max log size (16 GiB)
  archive_command:                 # Archive command
  max_archive_retries: 10          # Max archive retries
Security cache properties
roles_validity_in_ms
-
Validity period for roles cache in milliseconds.
HCD caches granted roles for authenticated sessions. After this period, they become eligible for async reload.
Default:
2000
Set to
0
to disable caching entirely.
permissions_validity_in_ms
-
Validity period for permissions cache in milliseconds.
Default:
2000
Set to
0
to disable caching.
credentials_validity_in_ms
-
Validity period for credentials cache in milliseconds.
The system tightly couples this cache to the
PasswordAuthenticator
implementation.
Default:
2000
Set to
0
to disable caching.
roles_update_interval_in_ms
-
Refresh interval for roles cache in milliseconds.
After this interval, cache entries become eligible for refresh.
Default: Same value as
roles_validity_in_ms
permissions_update_interval_in_ms
-
Refresh interval for permissions cache in milliseconds.
After this interval, cache entries become eligible for refresh.
Default: Same value as
permissions_validity_in_ms
credentials_update_interval_in_ms
-
Refresh interval for credentials cache in milliseconds.
After this interval, cache entries become eligible for refresh.
Default: Same value as
credentials_validity_in_ms
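A minimal sketch of how the validity and update intervals might be combined. The values are illustrative, not recommendations: each cache entry stays valid for 10 seconds and becomes eligible for refresh after 2 seconds.

```yaml
# Illustrative values only; tune for your own workload.
roles_validity_in_ms: 10000
roles_update_interval_in_ms: 2000
permissions_validity_in_ms: 10000
permissions_update_interval_in_ms: 2000
credentials_validity_in_ms: 10000
credentials_update_interval_in_ms: 2000
```

Longer validity periods reduce lookups against the auth tables, at the cost of role or permission changes taking longer to become visible.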
User-defined functions (UDF)
Configure user-defined functions (UDFs) that allow custom logic to be executed within the database.
enable_user_defined_functions
-
Enables user-defined functions (UDFs) on this node.
As of Cassandra 3.0, the system has a sandbox in place that should prevent execution of malicious code.
Default:
false
enable_scripted_user_defined_functions
-
Enables scripted UDFs (JavaScript UDFs).
Java UDFs are always enabled if
enable_user_defined_functions
is true. Enable this option to use UDFs with "language javascript" or any custom JSR-223 provider. This option has no effect if
enable_user_defined_functions
is false.
Default:
false
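For example, a node that should run Java UDFs but not scripted UDFs could be configured roughly as follows. This is a sketch of the two settings described above, not a recommendation.

```yaml
# Java UDFs enabled; JavaScript and other JSR-223 UDFs remain disabled.
enable_user_defined_functions: true
enable_scripted_user_defined_functions: false
```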
Memory leak detection settings
Configure garbage collection monitoring and memory leak detection thresholds.
The following properties are commented out in the default |
gc_log_threshold_in_ms
-
HCD logs GC pauses greater than this threshold at INFO level.
Raise this threshold to reduce logging if necessary.
Default:
200
ms (commented out in default configuration)
gc_warn_threshold_in_ms
-
The system logs GC pauses greater than this threshold at WARN level.
Adjust the threshold based on your application throughput requirements. Setting to
0
deactivates the feature.
Default:
1000
ms (commented out in default configuration)
max_value_size_in_mb
-
Maximum size of any value in SSTables. Safety measure to detect SSTable corruption early.
HCD marks an SSTable as corrupted if it contains any value larger than this threshold. The setting must be positive and less than 2048.
Default:
256
MB (commented out in default configuration)
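Because these properties are commented out by default, changing them means adding them explicitly. The sketch below simply restates the documented defaults as uncommented settings, as a starting point for adjustment.

```yaml
# Documented defaults, written out explicitly.
gc_log_threshold_in_ms: 200     # Log GC pauses longer than 200 ms at INFO
gc_warn_threshold_in_ms: 1000   # Log GC pauses longer than 1000 ms at WARN; 0 deactivates
max_value_size_in_mb: 256       # Must be positive and less than 2048
```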
Guardrails
Guardrails are system limits that ensure high availability and optimal performance of the database. They help prevent operations that could cause performance issues or system instability. For more information, see HCD guardrails.
emulate_dbaas_defaults
-
When enabled, modifies defaults to match those used by DataStax Constellation (DataStax cloud data platform), including stricter guardrails defaults.
This can be used as a convenience to develop and test applications meant to run on DataStax Constellation.
When enabled, the updated defaults reflect those of DataStax Constellation at the time of the currently used HCD release. This is a best-effort emulation of said defaults. All nodes must use the same config value.
Default:
false
guardrails
-
Configure HCD system limits that ensure high availability and optimal performance of the database.
guardrails:
  # Tombstone thresholds
  tombstone_warn_threshold: 1000
  tombstone_failure_threshold: 100000
  # Partition size threshold
  partition_size_warn_threshold_in_mb: 100
  # Batch size thresholds
  batch_size_warn_threshold_in_kb: 64
  batch_size_fail_threshold_in_kb: 640
  unlogged_batch_across_partitions_warn_threshold: 10
  # Column and collection thresholds
  column_value_size_failure_threshold_in_kb: -1
  columns_per_table_failure_threshold: -1
  fields_per_udt_failure_threshold: -1
  collection_size_warn_threshold_in_kb: -1
  items_per_collection_warn_threshold: -1
  # Index thresholds
  secondary_index_per_table_failure_threshold: -1
  sai_indexes_per_table_failure_threshold: 10
  sai_indexes_total_failure_threshold: 100
  # View and table thresholds
  materialized_view_per_table_failure_threshold: -1
  tables_warn_threshold: -1
  tables_failure_threshold: -1
  # Query thresholds
  page_size_failure_threshold_in_kb: -1
  in_select_cartesian_product_failure_threshold: -1
  partition_keys_in_select_failure_threshold: -1
  # Disk usage thresholds
  disk_usage_percentage_warn_threshold: -1
  disk_usage_percentage_failure_threshold: -1
  disk_usage_max_disk_size_in_gb: -1
  # Operation controls
  read_before_write_list_operations_enabled: true
  user_timestamps_enabled: true
  # Table properties and consistency levels
  table_properties_disallowed:
  write_consistency_levels_disallowed:
Tombstone thresholds
tombstone_warn_threshold
-
Log a warning when scanning more tombstones than this threshold.
When executing a scan, within or across a partition, Cassandra keeps tombstones in memory to return them to the coordinator. With workloads that generate many tombstones, this can cause performance problems and even exhaust the server heap.
Default:
1000
(may differ if emulate_dbaas_defaults
is enabled)
tombstone_failure_threshold
-
Fail queries that scan more tombstones than this threshold.
Default:
100000
(may differ if emulate_dbaas_defaults
is enabled)
Partition size threshold
partition_size_warn_threshold_in_mb
-
Log a warning when compacting partitions larger than this value.
Default:
100
MB (may differ if emulate_dbaas_defaults
is enabled)
Batch size thresholds
batch_size_warn_threshold_in_kb
-
Log WARN on any multiple-partition batch size that exceeds this value.
Use caution when increasing this threshold as it can lead to node instability.
Default:
64
KB (may differ if emulate_dbaas_defaults
is enabled)
batch_size_fail_threshold_in_kb
-
Fail any multiple-partition batch that exceeds this value.
The calculated default is 640 KB (10x warn threshold).
Default:
640
KB (may differ if emulate_dbaas_defaults
is enabled)
unlogged_batch_across_partitions_warn_threshold
-
Log WARN on any batches not of type LOGGED that span across more partitions than this limit.
Default:
10
(may differ if emulate_dbaas_defaults
is enabled)
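A sketch of tightened batch limits, keeping the failure threshold at 10 times the warning threshold as the defaults do. The numbers are illustrative assumptions, not recommended values.

```yaml
# Illustrative values only: fail threshold kept at 10x the warn threshold.
guardrails:
  batch_size_warn_threshold_in_kb: 32
  batch_size_fail_threshold_in_kb: 320
  unlogged_batch_across_partitions_warn_threshold: 5
```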
Column and collection thresholds
column_value_size_failure_threshold_in_kb
-
Failure threshold to prevent writing large column values into HCD.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
columns_per_table_failure_threshold
-
Fail attempts to create more columns per table than this threshold.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
fields_per_udt_failure_threshold
-
Fail attempts to create more fields in a user-defined type than this threshold.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
collection_size_warn_threshold_in_kb
-
Log a warning when collection data is larger than this threshold.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
items_per_collection_warn_threshold
-
Log a warning when a collection contains more elements than this threshold.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
Index thresholds
secondary_index_per_table_failure_threshold
-
Fail attempts to create more secondary indexes per table than this threshold. Does not apply to CUSTOM INDEX StorageAttachedIndex.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
sai_indexes_per_table_failure_threshold
-
Failure threshold for number of StorageAttachedIndex per table. Only applies to CUSTOM INDEX StorageAttachedIndex.
Default:
10
(same when emulate_dbaas_defaults
is enabled)
sai_indexes_total_failure_threshold
-
Failure threshold for total number of StorageAttachedIndex across all keyspaces. Only applies to CUSTOM INDEX StorageAttachedIndex.
Default:
100
(same when emulate_dbaas_defaults
is enabled)
View and table thresholds
materialized_view_per_table_failure_threshold
-
Fail attempts to create more materialized views per table than this threshold.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
tables_warn_threshold
-
Log a warning when creating more tables than this threshold.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
tables_failure_threshold
-
Fail attempts to create more tables than this threshold.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
Query thresholds
page_size_failure_threshold_in_kb
-
Fail requests whose page size in bytes exceeds this threshold. The threshold also serves as a hard paging limit when paging by rows.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
in_select_cartesian_product_failure_threshold
-
Fail IN queries whose cartesian product exceeds this threshold.
Example:
"a in (1,2,…10) and b in (1,2…10)"
results in a cartesian product of 100.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
partition_keys_in_select_failure_threshold
-
Fail IN queries that contain more partition keys than this threshold.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
Disk usage thresholds
disk_usage_percentage_warn_threshold
-
Log a warning when local disk usage exceeds this percentage. Valid values: 1 to 100.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
disk_usage_percentage_failure_threshold
-
Reject write requests when replica disk usage exceeds this percentage. Valid values: 1 to 100.
Default:
-1
(disabled, may differ if emulate_dbaas_defaults
is enabled)
disk_usage_max_disk_size_in_gb
-
Maximum disk size of the data directories to assume when calculating thresholds for
disk_usage_percentage_warn_threshold
and
disk_usage_percentage_failure_threshold
.
Valid values: (1, max available disk size of all data directories).
Default:
-1
(disabled; uses the physically available disk size of the data directories during calculations; may differ if emulate_dbaas_defaults
is enabled)
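A sketch of how the three disk usage settings might be combined. The percentages and the 500 GB cap are illustrative assumptions: with these values, HCD would calculate usage against 500 GB rather than the physical disk size, warn above 70 percent, and reject writes above 80 percent.

```yaml
# Illustrative values only; all three settings are disabled (-1) by default.
guardrails:
  disk_usage_percentage_warn_threshold: 70
  disk_usage_percentage_failure_threshold: 80
  disk_usage_max_disk_size_in_gb: 500
```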
Operation controls
read_before_write_list_operations_enabled
-
Whether to allow read-before-write operations, such as setting or removing a list element by index.
Note: Lightweight transactions (LWT) are always allowed.
Default:
true
(may differ if emulate_dbaas_defaults
is enabled)
user_timestamps_enabled
-
Whether to allow user-provided timestamps in write requests.
Default:
true
(may differ if emulate_dbaas_defaults
is enabled)
Table properties and consistency levels
table_properties_disallowed
-
Prevents creating tables that use the listed table properties.
Default: All properties are allowed (may differ if
emulate_dbaas_defaults
is enabled)
write_consistency_levels_disallowed
-
Prevents queries that use the listed consistency levels.
Default: All consistency levels are allowed.