cassandra.yaml configuration file

The cassandra.yaml file is the main configuration file for Hyper-Converged Database (HCD).

After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.

Syntax

For the properties in each section, the parent setting has zero spaces. Each child entry requires at least two spaces. Adhere to the YAML syntax and retain the spacing.

Default values that are undefined are shown as Default: none.

Default values that HCD defines internally are described in each property entry.

A default value can be defined internally, be commented out, or depend on other properties in the cassandra.yaml file. Some commented-out values do not match the actual default values; DataStax recommends these commented-out values as alternatives to the defaults.

Organization

The cassandra.yaml file groups configuration properties into the following sections:

Quick start: The minimal properties needed for configuring a cluster.
Default directories: Update these properties if you changed any of the default directories during installation.
Data directory configuration: Properties for configuring the location of one or more JBOD data directories.
Commonly used properties: Properties most frequently used when configuring HCD.
Performance tuning: Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes.
Advanced properties: Properties for advanced users or properties that are less commonly used.
Security properties: Configure authentication, authorization, and role management.
User-defined functions (UDF) properties: Configure how UDF code is executed inside Cassandra daemons.
Continuous paging options: Configure memory, threads, and duration when pushing pages continuously to the client.
Memory leak detection settings: Configure memory leak detection.
DataStax Astra emulation: Enable emulation mode for testing applications meant to run on Astra DB.
Guardrails: Configure HCD system limits to ensure high availability and optimal database performance.

Quick start properties

These are the minimal properties needed for configuring a cluster.

cluster_name: The name of the cluster. This setting prevents nodes in one logical cluster from joining another, so you must set it to a unique name other than the default. All nodes in a cluster must have the same value.

Default: 'Test Cluster'

rpc_address: The address that client applications connect to. You typically set this to a node’s public IP address that is routable from the clients.

If not changed from the default localhost, only applications deployed on the server will be able to connect to the node.

Default: localhost

listen_address: The IP address or hostname that the database binds to, exclusively for private communication between nodes in the cluster. You typically set this to a node’s private IP address that is routable from other nodes.

If not changed from the default localhost, the node will not be able to communicate with other nodes in the cluster.

Default: localhost

listen_interface

The interface that the database binds to for connecting to other nodes. Interfaces must correspond to a single address. HCD does not support IP aliasing.

Never set listen_address to 0.0.0.0.

Set listen_address or listen_interface, not both.

listen_interface_prefer_ipv6

Whether HCD uses IPv4 or IPv6 when an interface is specified by name.

false: Use the first IPv4 address.
true: Use the first IPv6 address.

If the interface has only a single address, HCD selects that address regardless of this setting.

Default: false
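The quick start properties can be combined into a minimal fragment like the following. This is a sketch for one node of a hypothetical multi-node cluster; the cluster name and IP addresses are placeholders, not recommended values.

```yaml
# Hypothetical values for one node; replace them with your own.
cluster_name: 'Sales Cluster'    # Same value on every node in the cluster
listen_address: 10.0.0.11        # Private IP for node-to-node traffic; never 0.0.0.0
# listen_interface: eth0         # Alternative to listen_address; set one, not both
# listen_interface_prefer_ipv6: false
rpc_address: 10.0.0.11           # Address clients connect to; must be routable from clients
```

Leaving listen_address and rpc_address at localhost is only appropriate for a single-node development setup.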

Default directories

If you have changed any of the default directories during installation, set these properties to the new locations. Make sure you have root access.

data_file_directories

The directory where table data is stored on disk. The database distributes data evenly across the configured locations, subject to the granularity of the configured compaction strategy.

For production, DataStax recommends RAID 0 and SSDs.

Default: /var/lib/cassandra/data

commitlog_directory

The directory where HCD stores the commit log.

For optimal write performance, place the commit log on a separate disk partition, or ideally on a separate physical device, from the data directories. Because the commit log only appends data, a hard disk drive (HDD) works as long as it keeps up with the writes.

Default: $CASSANDRA_HOME/data/commitlog

The commitlog_directory and the cdc_raw_directory must reside on the same partition. Keep these directories in separate sub-folders that are not nested.

cdc_raw_directory

The directory where HCD stores change data capture (CDC) commit log segments on flush. DataStax recommends using a physical device that is separate from the data directories. See Change Data Capture (CDC) logging.

Default: $CASSANDRA_HOME/data/cdc_raw

The cdc_raw_directory and the commitlog_directory must reside on the same partition. Keep these directories in separate sub-folders that are not nested.

hints_directory: The directory where HCD stores hints (missed writes).

Default: $CASSANDRA_HOME/data/hints

metadata_directory: The directory that holds cluster metadata including information about the local node and its peers.

Default: $CASSANDRA_HOME/data/metadata

saved_caches_directory: The directory location where HCD stores table key and row caches.

Default: $CASSANDRA_HOME/data/saved_caches
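As a sketch, a node that keeps data on one device and the commit log on a separate device might relocate these directories as follows. The /mnt/data and /mnt/commitlog mount points are hypothetical; the point of the example is that cdc_raw_directory shares the commit log partition in a sibling, non-nested folder.

```yaml
# Hypothetical mount points: /mnt/data and /mnt/commitlog are separate devices.
data_file_directories:
  - /mnt/data/cassandra/data
commitlog_directory: /mnt/commitlog/cassandra/commitlog
cdc_raw_directory: /mnt/commitlog/cassandra/cdc_raw   # Same partition as the commit log, not nested
hints_directory: /mnt/data/cassandra/hints
metadata_directory: /mnt/data/cassandra/metadata
saved_caches_directory: /mnt/data/cassandra/saved_caches
```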

Data directory configuration

Distributing data across multiple disks, known as a JBOD (just a bunch of disks) configuration, maximizes throughput and ensures efficient disk I/O. HCD lets you specify multiple directories for storing data in this way. However, DataStax recommends using striped LVM instead of JBOD.

To configure a single data directory in the cassandra.yaml file:

```yaml
data_file_directories:
  - /var/lib/cassandra/data
```

For multiple data directories:

```yaml
data_file_directories:
  - /disk1/datadir
  - /disk2/datadir
  - /disk3/datadir
```

Commonly used properties

These are the properties most frequently used when configuring HCD.

Before starting a node for the first time, DataStax recommends that you carefully evaluate your requirements.

Common initialization properties
Common compaction settings
Memtable settings
Common automatic backup settings

Common initialization properties

Be sure to set the properties in the Quick start section as well.

commit_failure_policy

Determines how HCD handles commit log disk failures.

die: Shut down the node and kill the JVM, so the node can be replaced.
stop: Shut down the node, leaving it effectively dead but still available for inspection using JMX.
stop_commit: Shut down the commit log, let writes collect, but continue to service reads.
ignore: Ignore fatal errors and let the batches fail.

Default: stop (recommended)

disk_optimization_strategy

The strategy for optimizing disk reads. Because reads from spinning disks are slow, HCD buffers an extra 4 KB page in case it is needed. This is unnecessary for SSDs, so with the ssd strategy HCD buffers only what is required.

ssd: Data directory backed by solid state disks
spinning: Data directory backed by spinning disks

Default: ssd

disk_failure_policy

Determines how HCD handles disk failures.

die: Shut down gossip and client transports, and kill the JVM for any file system errors or single SSTable errors. Enables you to replace the node.
stop_paranoid: Shut down the node, even for single SSTable errors.
stop: Shut down the node leaving the node effectively dead, but the JVM is still available for inspection using JMX.
best_effort: Stop using the failed disk and respond to requests based on the remaining available SSTables. With this setting, reads at consistency level ONE can return obsolete data.
ignore: Ignore fatal errors and let the requests fail; all file system errors are logged but otherwise ignored.

Recommended policies are stop and best_effort.

Default: stop
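The following fragment sketches the failure and disk settings for a hypothetical node whose JBOD data directories sit on SSDs. Choosing best_effort over the default stop is a judgment call that trades consistency (obsolete reads at ONE) for continued availability when a single disk fails.

```yaml
disk_optimization_strategy: ssd   # Data directories are backed by SSDs
commit_failure_policy: stop       # Default and recommended
disk_failure_policy: best_effort  # Keep serving from the remaining disks if one fails
```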

endpoint_snitch

Configure this property to set the snitch. The most common snitches are:

SimpleSnitch (default): Uses replication strategy order for proximity. This snitch does not recognize racks or datacenters and treats all nodes as belonging to one ring (a single datacenter), which makes it incompatible with multi-datacenter deployments and unsuitable for production. Use this snitch for development environments only.
GossipingPropertyFileSnitch (GPFS): Uses the rack and datacenter of the local node, defined in the cassandra-rackdc.properties file, and propagates this information to other nodes via gossip. This snitch is recommended for production environments and is almost always the correct choice.
PropertyFileSnitch (PFS): Determines node proximity using the rack and datacenter locations defined in the cassandra-topology.properties file. GPFS supersedes this snitch; use PFS only for backward compatibility.

For other snitches such as Ec2Snitch and GoogleCloudSnitch, see About snitches.

All nodes in a cluster must use the same snitch.

HCD determines replica placement (where copies of data are stored) using the information provided by the snitch. Because changing the snitch can change where data is located, it requires additional steps and should be performed only by experienced operators.
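For a production cluster, the snitch setting is typically a single line. This sketch assumes each node's datacenter and rack are already defined in its cassandra-rackdc.properties file (not shown).

```yaml
# Every node in the cluster must use the same snitch.
# GPFS reads this node's datacenter and rack from cassandra-rackdc.properties.
endpoint_snitch: GossipingPropertyFileSnitch
```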

seed_provider

The gossip seed provider and corresponding addresses of nodes that are designated as contact points in the cluster. A joining node contacts the nodes in the seeds list and establishes a connection to the first available node to discover the members of the cluster and topology.

class_name: The class that handles the seed logic. The default is appropriate for almost all clusters; substitute a custom seed provider only in limited edge cases.

Default: org.apache.cassandra.locator.SimpleSeedProvider

seeds: A comma-delimited list of addresses and their corresponding storage_port. A node joining the cluster uses this list to bootstrap the gossip process. If the cluster has multiple nodes, you must change the default value to the address and storage_port of at least one of the nodes.

Default: "127.0.0.1:7000"

Making every node a seed node is not recommended because of increased maintenance and reduced gossip performance. Gossip optimization is not critical, but it is recommended to use a small seed list of approximately three nodes per datacenter.
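A sketch of the seed_provider block for a hypothetical cluster with two datacenters and three seeds in each, following the guidance above. The addresses are placeholders, and port 7000 is the storage_port from the default seeds value.

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Hypothetical addresses: three seeds in each of two datacenters.
      - seeds: "10.0.1.11:7000,10.0.1.12:7000,10.0.1.13:7000,10.0.2.11:7000,10.0.2.12:7000,10.0.2.13:7000"
```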

Advanced initialization properties

allocate_tokens_for_keyspace: Enables the algorithm that allocates num_tokens tokens so that token ranges are spread evenly across nodes, which distributes data more evenly than the legacy random allocation. Supported only on clusters using Murmur3Partitioner.

The algorithm uses the replication strategy of the specified keyspace to optimize token allocation when new nodes join a cluster.

The allocate_tokens_for_local_replication_factor property is preferred over allocate_tokens_for_keyspace, particularly when adding nodes in a new datacenter where a keyspace is not yet replicated. If neither property is set, HCD falls back to the legacy behavior of allocating tokens randomly.

allocate_tokens_for_local_replication_factor: Enables the algorithm that allocates num_tokens tokens so that token ranges are spread evenly across nodes, which distributes data more evenly than the legacy random allocation. Supported only on clusters using Murmur3Partitioner.

Specify the replication factor in the local datacenter (for example, 3) that the algorithm uses to optimize token allocation when new nodes join a cluster.

allocate_tokens_for_local_replication_factor is preferred over allocate_tokens_for_keyspace because it does not require a keyspace's replication to be defined, which is especially helpful when adding nodes in a new datacenter. If neither property is set, HCD falls back to the legacy behavior of allocating tokens randomly.
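A sketch of the token settings for a new node, assuming the cluster uses Murmur3Partitioner and the local datacenter uses a replication factor of 3. The keyspace name in the commented alternative is hypothetical.

```yaml
num_tokens: 16                                  # Default token count
allocate_tokens_for_local_replication_factor: 3 # Optimize allocation for RF 3 in this datacenter
# allocate_tokens_for_keyspace: my_keyspace     # Hypothetical keyspace; the less preferred alternative
```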

auto_bootstrap: Determines whether a node joining the cluster for the first time requests that replicas stream data to it (bootstrapping). Bootstrapping is the default behavior. A node defined as a seed does not bootstrap; it joins the cluster immediately without data.

Non-seed nodes bootstrap automatically by default. Set to false when adding nodes in a new datacenter, where an operator triggers data streaming manually with the nodetool rebuild command.

Default: true

broadcast_address: Set to the node’s public IP address in environments where nodes can communicate across networks only through their public IP addresses, such as multi-region Amazon EC2 deployments. Otherwise, the node broadcasts on the same address as listen_address.

Set a separate listen_address and broadcast_address on a node with multiple network interfaces or where nodes are not able to communicate over private IP addresses. Not required in environments that support automatic switching between private and public communication.

Default: uses value of listen_address

initial_token: Manually assigns the tokens for the ranges owned by the node.

Specify one token value for legacy single-token clusters. For clusters with virtual nodes enabled, specify multiple tokens as a comma-separated list.

When you set initial_token, you must also set num_tokens to the number of tokens listed.

Default: none (num_tokens is used instead)

listen_on_broadcast_address: Set to true on nodes with multiple interfaces to enable communication on both listen_address and broadcast_address.

Default: false

num_tokens: Defines the number of tokens to assign to the node.

Early versions of Cassandra used a default value of 256 tokens for clusters with virtual nodes enabled. This setting shares data with more peers and offers the least variance in data size among nodes in the same datacenter, but might lead to decreased availability in the event of node outages.

Lower token counts, such as 4 or 8, provide higher availability but also higher variance in data size. 16 tokens achieves a good distribution of data without compromising too much availability.

Default: 16 tokens

partitioner: The partitioner determines how data is distributed across the nodes in the cluster.

The default Murmur3Partitioner is the correct and only choice for new clusters. The legacy partitioners provide backward compatibility with existing clusters upgraded from older versions of Cassandra, because the partitioner can never be changed on a running cluster.

Default: org.apache.cassandra.dht.Murmur3Partitioner

Common compaction settings

compaction_throughput_mb_per_sec

The rate in megabytes/second at which HCD compacts SSTable candidates. The faster the database inserts data, the faster HCD must compact in order to keep the number of SSTables down.

Set this value to 16 to 32 times the write throughput in MB/second, or set it to 0 to disable compaction throttling. A high setting means that HCD uses more disk I/O for compaction, leaving less I/O bandwidth for reads.

Default: 64

See Configure compaction.
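As a worked example of the 16 to 32 times rule, assume a hypothetical node that ingests about 2 MB/second of writes. The recommended range is then 16 × 2 = 32 to 32 × 2 = 64 MB/second, so the default of 64 sits at the top of that range.

```yaml
# Hypothetical node ingesting about 2 MB/s of writes:
# 16 x 2 MB/s = 32 MB/s and 32 x 2 MB/s = 64 MB/s.
compaction_throughput_mb_per_sec: 64
# compaction_throughput_mb_per_sec: 0   # Disables compaction throttling
```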

Memtable settings

When a node receives a write request, HCD stores the data in a memory structure called a memtable and appends it to the commit log on disk for durability (see How data is written). HCD can allocate memtable segments either on- or off-heap.

memtable_allocation_type

Determines how HCD allocates memory to the memtable.

heap_buffers: The system allocates memtables on JVM heap. Suitable for general workloads where heap memory is sufficient.
offheap_buffers: Uses Java NIO direct buffers to store cell names and values off-heap. This allocation type reduces heap utilization significantly, leading to reduced GC pressure.
offheap_objects: The system allocates memtables completely off-heap, directly in native memory. This allocation type is recommended particularly for clusters that handle large datasets. Writes are around 5% faster mostly due to memtables flushing less often.
unslabbed_heap_buffers: The system allocates memtables on a JVM heap without using a slab allocator. This can lead to increased heap fragmentation. DataStax does not recommend this option for any environments.

Default: offheap_objects

memtable_heap_space_in_mb: The maximum amount of memory to allocate for memtables on JVM heap. When the threshold is reached, the system blocks writes until a flush completes. The system triggers a flush of the largest memtable based on memtable_cleanup_threshold.

Default: ¼ of heap

memtable_offheap_space_in_mb: The maximum amount of memory to allocate for memtables from native memory. When the threshold is reached, HCD blocks writes until a flush completes. HCD triggers a flush of the largest memtable based on memtable_cleanup_threshold.

Default: ¼ of heap

memtable_cleanup_threshold: The threshold that triggers a flush based on the ratio of memtable size to the maximum memory size permitted for memtables.

Setting this value is deprecated, because the default calculation is the only reasonable choice.

Default: 1 / (memtable_flush_writers + 1)

memtable_flush_writers: The total number of memtables that can be flushed concurrently as well as the number of flush writer threads per disk.

A single thread can generally keep up with write ingestion on a node with a single fast disk unless it temporarily becomes I/O-bound, so two flush writers are usually sufficient. If flushing falls behind (the MemtablePool.BlockedOnAllocation metric is greater than 0), increase the number of flush writers.

Note that more writers can lead to more frequent flushes and smaller SSTables which puts pressure on compactions.

Default: 2 for nodes with a single data directory; otherwise 1, with one flush thread per data directory
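The following fragment sketches the memtable settings for a hypothetical node with an 8 GB (8192 MB) heap and a single data directory. The explicit sizes equal the ¼-of-heap defaults; the comments work through the derived cleanup threshold.

```yaml
# Hypothetical node: 8192 MB heap, one data directory.
memtable_allocation_type: offheap_objects
memtable_heap_space_in_mb: 2048      # 8192 / 4, same as the default
memtable_offheap_space_in_mb: 2048   # 8192 / 4, same as the default
memtable_flush_writers: 2            # Default for a single data directory
# memtable_cleanup_threshold stays unset (deprecated). It is derived as
# 1 / (memtable_flush_writers + 1) = 1 / 3, so HCD flushes the largest memtable
# once about 0.33 x 2048 MB, roughly 683 MB, of memtable space is in use.
```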

Common automatic backup settings

HCD does not automatically clear backups and snapshots, so disk usage can grow without bound. By default, when the disk is full and HCD can no longer write files to it, the node shuts down.

DataStax recommends setting up a process to clear incremental backups each time a new snapshot is created.

auto_snapshot: When enabled (set to true), HCD takes a snapshot before DROP KEYSPACE, DROP TABLE, or TRUNCATE TABLE is executed.

DataStax strongly recommends enabling auto snapshot as a precaution, in case someone executes the DROP or TRUNCATE commands accidentally against the wrong keyspace or table.

Default: true

incremental_backups: When set to true, HCD creates hard links to each SSTable that has been flushed or streamed in the backups/ subdirectory of the keyspace data.

Default: false

snapshot_before_compaction: When set to true, HCD takes a snapshot before each compaction task. You can use the snapshot as a rollback position in an upgrade, but this option is of limited use because the general recommendation is to take backups before performing an upgrade.

Use with extreme caution, because disk usage can grow very quickly.

Default: false
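A sketch of the backup settings for a hypothetical node that combines snapshots with incremental backups. Because HCD never clears the backups/ subdirectories, pair this configuration with the cleanup process recommended above.

```yaml
auto_snapshot: true                # Default; snapshot before DROP or TRUNCATE
incremental_backups: true          # Hard-link flushed and streamed SSTables into backups/
snapshot_before_compaction: false  # Default; enabling it can consume disk space quickly
```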

Performance tuning

Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes.

Performance tuning properties include:

Commit log settings
Change-data-capture (CDC) space settings
Common compaction settings
Memtable settings
Cache and index settings
Streaming settings

Commit log settings

commitlog_sync

Defines how the commit log is synchronized to disk. Data that has been synced to disk is fully persisted and survives a system crash or power outage. The sync mode also determines when HCD sends a successful write acknowledgement to the coordinator.

batch: Each write request triggers an immediate sync. The acknowledgement is blocked until after the commit log has been flushed to disk. Prioritizes durability over performance.
group: Similar to batch mode but waits up to commitlog_sync_group_window_in_ms between flushes so more writes are persisted together. The system also blocks the acknowledgement until after the commit log has been flushed to disk. Recommended over batch mode.
periodic: The system synchronizes the commit log every commitlog_sync_period_in_ms, but the write is acknowledged immediately. Prioritizes performance over durability.

Default: periodic

commitlog_sync_period_in_ms

Time interval between commit log syncs to disk. Only set with periodic sync mode, otherwise an exception will be logged.

Default: 10000 (10 seconds)

commitlog_sync_group_window_in_ms

The minimum interval between disk syncs. Only set with group sync mode, otherwise an exception will be logged.

Default: 1000 (1 second)
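The two main sync modes can be sketched as follows. Each window property applies only to its own mode, so the alternative keeps the periodic setting out of the active configuration.

```yaml
# Default: acknowledge writes immediately, sync every 10 seconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

# Alternative: acknowledge only after a sync, syncing at most once per second.
# Comment out commitlog_sync_period_in_ms when switching to group mode.
# commitlog_sync: group
# commitlog_sync_group_window_in_ms: 1000
```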

commitlog_sync_batch_window_in_ms

Deprecated. The maximum delay between disk syncs. No longer used.

commitlog_segment_size_in_mb

The size of individual commit log file segments. A small size means more frequent flushes leading to small SSTables which put pressure on compaction.

If you use the commit log archives for point-in-time recovery, it is reasonable to reduce the size to 16 or 8MB for finer granularity. However, be aware that the maximum mutation size is dependent on the segment size.

Default: 32

max_mutation_size_in_kb: The maximum allowed size of a mutation (the payload size of a write request) which defaults to half of the commit log segment (commitlog_segment_size_in_mb). If explicitly set, you must set commitlog_segment_size_in_mb to at least twice the value of max_mutation_size_in_kb.

Before increasing the commit log segment size, investigate why the mutations are larger than expected. Look for underlying issues with access patterns and the data model, because increasing the segment size is a limited fix.

Default: ½ of commitlog_segment_size_in_mb
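The relationship between the two settings works out as follows. With the default 32 MB segment, the implicit mutation limit is 32 / 2 = 16 MB, or 16 × 1024 = 16384 KB. The commented alternative is a hypothetical sizing for a workload that genuinely needs larger mutations.

```yaml
# Default: a 32 MB segment allows mutations up to 16 MB (16384 KB).
commitlog_segment_size_in_mb: 32
# max_mutation_size_in_kb: 16384    # Implicit default, half the segment size

# Hypothetical: keep the segment at least twice the mutation limit.
# commitlog_segment_size_in_mb: 64
# max_mutation_size_in_kb: 32768    # 64 MB / 2 = 32 MB = 32768 KB
```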

commitlog_total_space_in_mb

The maximum disk space for commit logs on disk.

If the limit is reached, HCD flushes the memtables that have data in the oldest commit log segments so that it can discard those segments and reclaim disk space. A small size means more frequent flushes on less-active tables, leading to small SSTables that put pressure on compaction.

Default: the smaller of 8192 or ¼ of the total space of the commit log volume

See Configure memtable thresholds.

commitlog_compression

By default, the commit log is not compressed. To enable compression, specify the compression library to use.

The supported libraries are:

DeflateCompressor: Legacy option that is the slowest of the supported algorithms. This option is not recommended.
LZ4Compressor: Fastest algorithm, but with lower compression ratios. Choose when speed is preferred over space savings.
SnappyCompressor: Not as fast as LZ4, but provides better compression.
ZstdCompressor: Provides the best compression ratio, but is slower than the other algorithms.

For example, to enable LZ4 compression:

```yaml
commitlog_compression:
  - class_name: LZ4Compressor
```

Change Data Capture (CDC) settings

Compaction settings

See also compaction_throughput_mb_per_sec in the Common compaction settings section and Configure compaction.

concurrent_compactors

The number of compaction threads allowed to run simultaneously. Simultaneous compactions help preserve read performance in a mixed read-write workload by limiting the number of small SSTables that accumulate during a single long-running compaction.

Generally, the calculated default value is appropriate and does not need adjusting. DataStax recommends contacting DataStax Support before changing this value. If your data directories are backed by SSDs, increase this value to the number of cores.

If compaction runs too slowly or too fast, adjust the compaction_throughput_mb_per_sec option in the Common compaction settings section.

Increasing concurrent_compactors increases the disk space used by compaction, because each concurrent compaction needs space for its output, especially with STCS (SizeTieredCompactionStrategy). Ensure that adequate disk space is available before increasing this value.

Default: the smaller of the number of data disks or the number of CPU cores, with a minimum of 2 and a maximum of 8

concurrent_validations

The number of repair validation threads allowed to run simultaneously.

Defaults to the value of concurrent_compactors if not configured or set to ≤ 0. To set this value higher than concurrent_compactors, you must also set the system property -Dcassandra.allow_unlimited_concurrent_validations=true.

Default: the value of concurrent_compactors

concurrent_materialized_view_builders

The number of view builder tasks allowed to run simultaneously when materialized views are enabled. Materialized views are experimental.

When a view is created, the node ranges are split into [num_processors x 4] builder tasks. Set this property to 2 or higher to build views faster.

Default: 1

sstable_preemptive_open_interval_in_mb

The amount of compaction output, in MB, written between preemptive openings of the new SSTables.

The compaction process opens new SSTables before they are completely written and uses them in place of the prior SSTables for any range already written. Preemptive opening helps transfer reads smoothly between SSTables, reducing page cache churn and keeping hot rows hot.

A low value has a negative performance impact and will eventually cause heap pressure and GC activity. The optimal value depends on hardware and workload.

Default: 50
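As a sketch, for a hypothetical node with 4 data disks and 16 CPU cores, the calculated default for concurrent_compactors is min(4, 16) = 4, which already lies within the 2 to 8 bounds. The fragment below simply makes those defaults explicit; per the guidance above, contact DataStax Support before changing them.

```yaml
# Hypothetical node: 4 data disks, 16 CPU cores.
# Default concurrent_compactors = min(4, 16) = 4, within the 2 to 8 bounds.
concurrent_compactors: 4
concurrent_validations: 0                   # 0 or less: use concurrent_compactors (4)
concurrent_materialized_view_builders: 1    # Default; 2 or higher builds views faster
sstable_preemptive_open_interval_in_mb: 50  # Default
```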

Cache and index settings

column_index_size_in_kb: Granularity of the index of rows within a partition. For huge rows, decrease this setting to improve seek time.

Default: 64

file_cache_size_in_mb: Maximum memory to use for caching SSTable chunks and buffer pools. Allocated from native memory in addition to heap.

Default: smaller of 2048 or ¼ of heap

Streaming settings

These settings apply to operations that perform file streaming, including repairs, bootstraps, and decommissions. These operations are mostly sequential I/O which can saturate a node’s network bandwidth and degrade client (application) performance. To fix this, you must throttle streaming throughput.

inter_dc_stream_throughput_outbound_megabits_per_sec: Maximum network bandwidth for streaming file transfers between datacenters. Set to a value less than or equal to stream_throughput_outbound_megabits_per_sec.

Default: 200 Mbps (25 MB/s)

stream_entire_sstables: Enables the Zero Copy Streaming feature where eligible SSTables are streamed in their entirety between nodes instead of individual partitions, transferring data at a significantly faster rate.

This feature is bound to the streaming throughput limits and disabled when internode encryption is enabled.

Default: true

stream_throughput_outbound_megabits_per_sec: Maximum network bandwidth permitted for all outbound file transfers streaming on a node.

Default: 200 Mbps (25 MB/s)

streaming_keep_alive_period_in_secs: Interval to send keep-alive messages to prevent reset connections during streaming. The streaming session fails when the system does not receive a keep-alive message for two keep-alive cycles equivalent to 10 minutes by default (2 x 300 seconds).

Default: 300
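The streaming throughput properties are expressed in megabits per second, so divide by 8 to get MB/second. This sketch raises the local limit to a hypothetical 400 Mbps while keeping the inter-datacenter limit at its default, which satisfies the rule that it must not exceed the local limit.

```yaml
stream_throughput_outbound_megabits_per_sec: 400          # Hypothetical: 400 / 8 = 50 MB/s
inter_dc_stream_throughput_outbound_megabits_per_sec: 200 # 200 / 8 = 25 MB/s, not above the local limit
stream_entire_sstables: true                              # Zero Copy Streaming (default)
streaming_keep_alive_period_in_secs: 300                  # Session fails after 2 x 300 s = 10 minutes
```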

Advanced properties

Less commonly used settings, normally reserved for experienced operators.

max_value_size_in_mb: The maximum size of any value in SSTables up to a maximum of 2048 MB. If any value exceeds this threshold, HCD marks the SSTables as corrupted.

Default size is the same as the default native protocol frame limit native_transport_max_frame_size_in_mb.

Default: 256

trickle_fsync: Enables flushing portions of SSTables written using sequential writers when trickle_fsync_interval_in_kb is reached. This minimizes sudden flushing of dirty buffers, which can impact read latencies.

Recommended for use with SSDs which can handle more frequent calls to fsync(), but may be detrimental to slow HDDs.

Default: true

trickle_fsync_interval_in_kb: Threshold to trigger a flush when trickle_fsync is enabled.

Default: 10240 (10 MB)

Security properties

Configure authentication, authorization, and role management.

The security properties in cassandra.yaml control how HCD handles user authentication, authorization, and data encryption. These settings are crucial for securing your cluster in production environments.

Authentication properties
Authorization properties
Role management properties
Network authorization properties
Client encryption properties
Server encryption properties
Transparent data encryption properties
Audit logging properties
Security cache properties

Authentication properties

authenticator

The authentication backend that implements IAuthenticator to identify users.

HCD provides several authentication options:

AllowAllAuthenticator: Performs no authentication checks. Use this to disable authentication. DataStax does not recommend this for production environments.
PasswordAuthenticator: Relies on username/password pairs stored in the system_auth.roles table.
AdvancedAuthenticator: Allows multiple authentication schemes including internal, OIDC, and LDAP.

Default: AllowAllAuthenticator

If using PasswordAuthenticator, you must also use CassandraRoleManager for role management. Increase the system_auth keyspace replication factor when using authentication.

Authorization properties

authorizer

The authorization backend that implements IAuthorizer to limit access and provide permissions.

HCD provides several authorization options:

AllowAllAuthorizer: Allows any action to any user. Use this to disable authorization. DataStax does not recommend this for production environments.
CassandraAuthorizer: Stores permissions in the system_auth.role_permissions table.
AdvancedAuthorizer: Checks if roles have authorization permissions to access resources.

Default: AllowAllAuthorizer

If using CassandraAuthorizer, increase the system_auth keyspace replication factor.

Role management properties

role_manager

The role management backend that implements IRoleManager to maintain grants and memberships between roles.

HCD provides several role management options:

CassandraRoleManager: Stores role data in the system_auth keyspace.
AdvancedRoleManager: Fetches roles from internal Cassandra tables and/or external servers.

Default: CassandraRoleManager

Most IRoleManager functions require an authenticated login. If the configured IAuthenticator doesn’t implement authentication, most functionality will be unavailable.

Network authorization properties

network_authorizer

The network authorization backend that implements INetworkAuthorizer to restrict user access to certain datacenters.

HCD provides several network authorization options:

AllowAllNetworkAuthorizer: Allows any user to access any datacenter.
CassandraNetworkAuthorizer: Stores permissions in the system_auth.network_permissions table.

Default: AllowAllNetworkAuthorizer
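Taken together, a cluster that uses internal authentication, authorization, and per-datacenter access control could be configured roughly as follows. This is a sketch using only the built-in classes described above; raising the system_auth replication factor is a separate CQL step that is not shown.

```yaml
# Internal security backends; their data lives in the system_auth keyspace.
# Increase the system_auth replication factor before relying on these settings.
authenticator: PasswordAuthenticator
authorizer: CassandraAuthorizer
role_manager: CassandraRoleManager              # Required with PasswordAuthenticator
network_authorizer: CassandraNetworkAuthorizer  # Optional: restrict users to datacenters
```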

Client encryption properties

client_encryption_options

Configure client-to-server encryption settings.

```yaml
client_encryption_options:
    enabled: false                   # Enable client-to-server encryption
    optional: true                   # Allow encrypted and unencrypted connections
    keystore: conf/.keystore         # Path to keystore file
    keystore_password: cassandra     # Keystore password
    require_client_auth: false       # Verify client certificates
    truststore: conf/.truststore     # Path to truststore file
    truststore_password: cassandra   # Truststore password
    protocol: TLS                    # SSL/TLS protocol
    store_type: JKS                  # Keystore type
    cipher_suites: [                 # Supported cipher suites
        TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
        TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
        TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    ]
```

The default configuration is insecure. Generate proper keystores and truststores before enabling encryption in production.
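A hardened variant might look like the following sketch, which accepts only encrypted client connections and requires client certificates. The keystore paths and password placeholders are hypothetical, and cipher_suites is omitted here for brevity.

```yaml
client_encryption_options:
  enabled: true
  optional: false                                  # Reject unencrypted client connections
  keystore: /etc/hcd/certs/node1.keystore.jks      # Hypothetical path
  keystore_password: '<keystore-password>'         # Placeholder
  require_client_auth: true                        # Clients must present trusted certificates
  truststore: /etc/hcd/certs/truststore.jks        # Hypothetical path
  truststore_password: '<truststore-password>'     # Placeholder
  protocol: TLS
  store_type: JKS
```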

Server encryption properties

server_encryption_options

Configure server-to-server internode encryption settings.

```yaml
server_encryption_options:
    internode_encryption: none             # Encryption scope: none, dc, rack, or all
    optional: true                         # Allow encrypted and unencrypted connections
    keystore: conf/.keystore               # Path to keystore file
    keystore_password: cassandra           # Keystore password
    require_client_auth: false             # Verify peer server certificates
    truststore: conf/.truststore           # Path to truststore file
    truststore_password: cassandra         # Truststore password
    require_endpoint_verification: false   # Verify hostname in certificate
    enable_legacy_ssl_storage_port: false  # Enable legacy SSL storage port
```

Encryption scope options:

none: Do not encrypt outgoing connections
dc: Encrypt connections to peers in other datacenters, but not within datacenters
rack: Encrypt connections to peers in other racks, but not within racks
all: Always use encrypted connections

As with client encryption, the default values are insecure placeholders. Generate your own keystores and truststores before enabling internode encryption in production.
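The fragment below sketches encryption for every internode connection, with mutual certificate verification between nodes. Paths and password placeholders are hypothetical, and require_endpoint_verification: true assumes that each node's certificate contains the hostname its peers use to reach it. If only traffic between datacenters needs protection, internode_encryption: dc is the narrower option described above.

```yaml
# Sketch: encrypt all internode traffic and verify peers.
# Paths and passwords below are hypothetical placeholders.
server_encryption_options:
  internode_encryption: all              # none, dc, rack, or all
  optional: false                        # reject unencrypted internode connections
  keystore: /etc/hcd/tls/node.keystore.jks
  keystore_password: "<keystore-password>"
  require_client_auth: true              # verify peer server certificates
  truststore: /etc/hcd/tls/truststore.jks
  truststore_password: "<truststore-password>"
  require_endpoint_verification: true    # peer certificate must match its hostname
  enable_legacy_ssl_storage_port: false
```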

Transparent data encryption properties

transparent_data_encryption_options

Configure transparent data encryption (TDE) for data at rest.

transparent_data_encryption_options:
    enabled: false                    # Enable transparent data encryption
    chunk_length_kb: 64              # Encryption chunk size
    cipher: AES/CBC/PKCS5Padding     # Encryption cipher
    key_alias: testing:1             # Key alias for encryption
    iv_length: 16                    # CBC IV length for AES
    key_provider:
      - class_name: org.apache.cassandra.security.JKSKeyProvider
        parameters:
          - keystore: conf/.keystore
            keystore_password: cassandra
            store_type: JCEKS
            key_password: cassandra

Before enabling TDE, install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files for your JDK version. TDE currently encrypts only commit log and hints files.
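A minimal sketch of enabling TDE with the JKSKeyProvider shown above. The keystore path, key alias, and password placeholders are hypothetical, and the key named by key_alias must already exist in the JCEKS keystore.

```yaml
# Sketch: enable TDE for commit log and hints files.
# Keystore path, alias, and passwords are hypothetical.
transparent_data_encryption_options:
  enabled: true
  chunk_length_kb: 64
  cipher: AES/CBC/PKCS5Padding
  key_alias: hcd_tde_key:1               # hypothetical alias stored in the keystore
  iv_length: 16                          # AES uses a 16-byte block, so CBC needs a 16-byte IV
  key_provider:
    - class_name: org.apache.cassandra.security.JKSKeyProvider
      parameters:
        - keystore: /etc/hcd/tde/tde.keystore.jceks
          keystore_password: "<keystore-password>"
          store_type: JCEKS
          key_password: "<key-password>"
```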

Audit logging properties

audit_logging_options

Configure audit logging to track CQL commands and authentication events.

audit_logging_options:
    enabled: false                    # Enable audit logging
    logger:
      - class_name: BinAuditLogger   # Audit logger implementation
    audit_logs_dir:                  # Directory for audit logs
    included_keyspaces:              # Keyspaces to audit
    excluded_keyspaces: system, system_schema, system_virtual_schema
    included_categories:             # Categories to audit
    excluded_categories:             # Categories to exclude
    included_users:                  # Users to audit
    excluded_users:                  # Users to exclude
    roll_cycle: HOURLY              # Log roll cycle
    block: true                      # Block on log write
    max_queue_weight: 268435456     # Max queue weight (256 MiB)
    max_log_size: 17179869184       # Max log size (16 GiB)
    archive_command:                 # Archive command
    max_archive_retries: 10         # Max archive retries
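As a hedged example, the following fragment audits only schema changes and data modifications in a single application keyspace. The keyspace name my_app and the log directory are hypothetical, and the category names DDL and DML are the audit categories defined by Apache Cassandra 4.0, assumed here to apply unchanged to HCD. Filter values are comma-separated strings, following the format of excluded_keyspaces above.

```yaml
# Sketch: audit schema changes and writes for one keyspace.
audit_logging_options:
  enabled: true
  logger:
    - class_name: BinAuditLogger
  audit_logs_dir: /var/log/hcd/audit     # hypothetical path
  included_keyspaces: my_app             # hypothetical keyspace
  included_categories: DDL, DML
  roll_cycle: HOURLY
  block: true
  max_log_size: 17179869184              # 16 GiB
```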

Security cache properties

roles_validity_in_ms: Validity period for roles cache in milliseconds.

HCD caches the roles granted to authenticated sessions. After this period, cached roles become eligible for asynchronous reload.

Default: 2000

Set to 0 to disable caching entirely.

permissions_validity_in_ms: Validity period for permissions cache in milliseconds.

Default: 2000

Set to 0 to disable caching.

credentials_validity_in_ms: Validity period for credentials cache in milliseconds.

This cache is tightly coupled to the PasswordAuthenticator implementation. If a different IAuthenticator implementation is configured, the credentials cache settings have no effect.

Default: 2000

Set to 0 to disable caching.

roles_update_interval_in_ms: Refresh interval for roles cache in milliseconds.

After this interval, cache entries become eligible for refresh.

Default: Same value as roles_validity_in_ms

permissions_update_interval_in_ms: Refresh interval for permissions cache in milliseconds.

After this interval, cache entries become eligible for refresh.

Default: Same value as permissions_validity_in_ms

credentials_update_interval_in_ms: Refresh interval for credentials cache in milliseconds.

After this interval, cache entries become eligible for refresh.

Default: Same value as credentials_validity_in_ms
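The validity and update intervals work together: a cache entry stays usable for the validity period, and once the update interval has passed it becomes eligible for refresh. The values below are illustrative, not recommendations. They keep cached entries valid for 10 seconds while making them eligible for refresh after 2 seconds, which reduces lookups against the underlying auth tables while still picking up role, permission, and password changes fairly quickly.

```yaml
# Illustrative values only.
roles_validity_in_ms: 10000               # cached roles remain valid for 10 s
roles_update_interval_in_ms: 2000         # eligible for async refresh after 2 s
permissions_validity_in_ms: 10000
permissions_update_interval_in_ms: 2000
credentials_validity_in_ms: 10000         # only used with PasswordAuthenticator
credentials_update_interval_in_ms: 2000
```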

User-defined functions (UDF)

Configure user-defined functions (UDFs) that allow custom logic to be executed within the database.

enable_user_defined_functions: Enables user-defined functions (UDFs) on this node.

Since Cassandra 3.0, a sandbox is in place that should prevent the execution of malicious code.

Default: false

enable_scripted_user_defined_functions: Enables scripted UDFs (JavaScript UDFs).

Java UDFs are always enabled when enable_user_defined_functions is true. Enable this option to use UDFs written in JavaScript (LANGUAGE javascript) or with any custom JSR-223 provider. This option has no effect if enable_user_defined_functions is false.

Default: false
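A minimal sketch that enables Java UDFs while leaving scripted UDFs off. Because Java UDFs are always enabled once enable_user_defined_functions is true, this is the smallest change that allows UDFs at all.

```yaml
# Sketch: allow Java UDFs, keep scripted (JavaScript/JSR-223) UDFs disabled.
enable_user_defined_functions: true
enable_scripted_user_defined_functions: false
```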

Memory leak detection settings

Configure garbage collection monitoring and memory leak detection thresholds.

The following properties are commented out in the default cassandra.yaml configuration and are not active by default. Uncomment and configure them as needed for your environment.

gc_log_threshold_in_ms: HCD logs GC pauses greater than this threshold at INFO level.

Increase this threshold to reduce logging if necessary.

Default: 200 ms (commented out in default configuration)

gc_warn_threshold_in_ms: The system logs GC pauses greater than this threshold at WARN level.

Adjust the threshold based on your application throughput requirements. Setting to 0 deactivates the feature.

Default: 1000 ms (commented out in default configuration)

max_value_size_in_mb: Maximum size, in MB, of any value in SSTables. This is a safety measure to detect SSTable corruption early.

If HCD encounters a value larger than this threshold, it marks the SSTable as corrupted. The value must be positive and less than 2048.

Default: 256 MB (commented out in default configuration)
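To activate these settings, uncomment them in cassandra.yaml. The sketch below keeps the documented defaults except for gc_warn_threshold_in_ms, which is lowered to 500 ms purely as an illustration for a latency-sensitive workload; choose values that match your own throughput requirements.

```yaml
# Uncommented memory leak detection settings.
gc_log_threshold_in_ms: 200      # log GC pauses longer than 200 ms at INFO
gc_warn_threshold_in_ms: 500     # illustrative: warn on GC pauses longer than 500 ms
max_value_size_in_mb: 256        # must be positive and less than 2048
```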

Guardrails

Guardrails are system limits that ensure high availability and optimal performance of the database. They help prevent operations that could cause performance issues or system instability. For more information, see HCD guardrails.

emulate_dbaas_defaults

When enabled, changes the defaults to match those used by DataStax Constellation (the DataStax cloud database service, now DataStax Astra DB), including stricter guardrail defaults.

Use this as a convenience when developing and testing applications meant to run on DataStax Constellation.

When enabled, the updated defaults reflect those of DataStax Constellation at the time of the HCD release in use. This is a best-effort emulation of those defaults. All nodes must use the same value.

Default: false

guardrails

Configure HCD system limits that ensure high availability and optimal performance of the database.

guardrails:
  # Tombstone thresholds
  tombstone_warn_threshold: 1000
  tombstone_failure_threshold: 100000

  # Partition size threshold
  partition_size_warn_threshold_in_mb: 100

  # Batch size thresholds
  batch_size_warn_threshold_in_kb: 64
  batch_size_fail_threshold_in_kb: 640
  unlogged_batch_across_partitions_warn_threshold: 10

  # Column and collection thresholds
  column_value_size_failure_threshold_in_kb: -1
  columns_per_table_failure_threshold: -1
  fields_per_udt_failure_threshold: -1
  collection_size_warn_threshold_in_kb: -1
  items_per_collection_warn_threshold: -1

  # Index thresholds
  secondary_index_per_table_failure_threshold: -1
  sai_indexes_per_table_failure_threshold: 10
  sai_indexes_total_failure_threshold: 100

  # View and table thresholds
  materialized_view_per_table_failure_threshold: -1
  tables_warn_threshold: -1
  tables_failure_threshold: -1

  # Query thresholds
  page_size_failure_threshold_in_kb: -1
  in_select_cartesian_product_failure_threshold: -1
  partition_keys_in_select_failure_threshold: -1

  # Disk usage thresholds
  disk_usage_percentage_warn_threshold: -1
  disk_usage_percentage_failure_threshold: -1
  disk_usage_max_disk_size_in_gb: -1

  # Operation controls
  read_before_write_list_operations_enabled: true
  user_timestamps_enabled: true

  # Table properties and consistency levels
  table_properties_disallowed:
  write_consistency_levels_disallowed:

Tombstone thresholds

tombstone_warn_threshold: Log a warning when scanning more tombstones than this threshold.

When executing a scan, within or across a partition, Cassandra keeps tombstones in memory to return them to the coordinator. With workloads that generate many tombstones, this can cause performance problems and even exhaust the server heap.

Default: 1000 (may differ if emulate_dbaas_defaults is enabled)

tombstone_failure_threshold: Fail queries that scan more tombstones than this threshold.

Default: 100000 (may differ if emulate_dbaas_defaults is enabled)

Partition size threshold

partition_size_warn_threshold_in_mb: Log a warning when compacting partitions larger than this value.

Default: 100 MB (may differ if emulate_dbaas_defaults is enabled)
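A sketch that tightens the tombstone and partition size thresholds for a workload in which large partitions or heavy deletes are unexpected. The numbers are illustrative. Keeping tombstone_warn_threshold well below tombstone_failure_threshold means warnings appear in the logs before queries start failing.

```yaml
guardrails:
  tombstone_warn_threshold: 500              # warn when a scan reads more than 500 tombstones
  tombstone_failure_threshold: 50000         # fail the query above 50,000 tombstones
  partition_size_warn_threshold_in_mb: 50    # warn when compacting partitions larger than 50 MB
```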

Batch size thresholds

batch_size_warn_threshold_in_kb: Log a WARN-level message for any multiple-partition batch whose size exceeds this value.

Use caution when increasing this threshold, as it can lead to node instability.

Default: 64 KB (may differ if emulate_dbaas_defaults is enabled)

batch_size_fail_threshold_in_kb: Fail any multiple-partition batch whose size exceeds this value.

Default: 640 KB, calculated as 10 times the warn threshold (may differ if emulate_dbaas_defaults is enabled)

unlogged_batch_across_partitions_warn_threshold: Log a WARN-level message for any batch that is not of type LOGGED and spans more partitions than this limit.

Default: 10 (may differ if emulate_dbaas_defaults is enabled)
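As an illustration, the fragment below halves both batch size thresholds while keeping the fail threshold at 10 times the warn threshold, the same ratio as the defaults. Lowering rather than raising the limits avoids the instability risk noted above; the values are examples, not recommendations.

```yaml
guardrails:
  batch_size_warn_threshold_in_kb: 32                # half the default
  batch_size_fail_threshold_in_kb: 320               # 10 x the warn threshold
  unlogged_batch_across_partitions_warn_threshold: 5
```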

Column and collection thresholds

column_value_size_failure_threshold_in_kb: Failure threshold, in KB, that prevents writing column values larger than this size.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

columns_per_table_failure_threshold: Failure threshold that prevents a table from having more columns than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

fields_per_udt_failure_threshold: Failure threshold that prevents a user-defined type from having more fields than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

collection_size_warn_threshold_in_kb: Warning threshold, in KB, that triggers a warning when a collection's data is larger than this size.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

items_per_collection_warn_threshold: Warning threshold that triggers a warning when a collection contains more elements than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)
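The sketch below enables each column and collection guardrail with illustrative values; any property left at -1 stays disabled.

```yaml
guardrails:
  column_value_size_failure_threshold_in_kb: 5120   # reject column values larger than 5120 KB (about 5 MB)
  columns_per_table_failure_threshold: 50
  fields_per_udt_failure_threshold: 30
  collection_size_warn_threshold_in_kb: 1024        # warn on collections larger than 1024 KB
  items_per_collection_warn_threshold: 100
```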

Index thresholds

secondary_index_per_table_failure_threshold: Failure threshold that prevents creating more secondary indexes per table than this value. Does not apply to StorageAttachedIndex (SAI) custom indexes.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

sai_indexes_per_table_failure_threshold: Failure threshold for the number of StorageAttachedIndex indexes per table. Applies only to StorageAttachedIndex custom indexes.

Default: 10 (same when emulate_dbaas_defaults is enabled)

sai_indexes_total_failure_threshold: Failure threshold for the total number of StorageAttachedIndex indexes across all keyspaces. Applies only to StorageAttachedIndex custom indexes.

Default: 100 (same when emulate_dbaas_defaults is enabled)
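The sketch below caps non-SAI secondary indexes at one per table while keeping the default SAI limits. Because secondary_index_per_table_failure_threshold does not count StorageAttachedIndex indexes, a table could still have up to 10 SAI indexes alongside its one secondary index under these illustrative values.

```yaml
guardrails:
  secondary_index_per_table_failure_threshold: 1   # non-SAI secondary indexes only
  sai_indexes_per_table_failure_threshold: 10      # SAI indexes per table (default)
  sai_indexes_total_failure_threshold: 100         # SAI indexes across all keyspaces (default)
```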

View and table thresholds

materialized_view_per_table_failure_threshold: Failure threshold that prevents creating more materialized views per table than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

tables_warn_threshold: Warning threshold that triggers a warning when creating more tables than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

tables_failure_threshold: Failure threshold that prevents creating more tables than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)
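An illustrative fragment that pairs a table warning threshold with a higher failure threshold, so operators are warned well before table creation starts to fail, and limits each table to two materialized views.

```yaml
guardrails:
  materialized_view_per_table_failure_threshold: 2
  tables_warn_threshold: 100      # warn before the hard limit is reached
  tables_failure_threshold: 200
```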

Query thresholds

page_size_failure_threshold_in_kb: Failure threshold that rejects requests whose page size, when paging by bytes, exceeds this value. When paging by rows, it also serves as a hard limit on the page size.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

in_select_cartesian_product_failure_threshold: Failure threshold that rejects queries whose IN clauses produce a cartesian product larger than this value.

Example: "a IN (1, 2, …, 10) AND b IN (1, 2, …, 10)" results in a cartesian product of 10 × 10 = 100.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

partition_keys_in_select_failure_threshold: Failure threshold that rejects queries whose IN clause contains more partition keys than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)
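The fragment below is an illustrative configuration. With in_select_cartesian_product_failure_threshold set to 25, the example query above, whose cartesian product is 100, would be rejected, while "a IN (1, 2, 3, 4, 5) AND b IN (1, 2, 3, 4, 5)", whose product is 5 × 5 = 25, would not exceed the threshold.

```yaml
guardrails:
  page_size_failure_threshold_in_kb: 1024            # reject pages larger than 1024 KB
  in_select_cartesian_product_failure_threshold: 25  # 10 x 10 = 100 would be rejected
  partition_keys_in_select_failure_threshold: 20
```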

Disk usage thresholds

disk_usage_percentage_warn_threshold: Warning threshold that triggers a warning when local disk usage exceeds this percentage. Valid values: 1 to 100.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

disk_usage_percentage_failure_threshold: Failure threshold that rejects write requests when replica disk usage exceeds this percentage. Valid values: 1 to 100.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

disk_usage_max_disk_size_in_gb: Maximum disk size, in GB, of the data directories to use when calculating the disk_usage_percentage_warn_threshold and disk_usage_percentage_failure_threshold thresholds.

Valid values: 1 up to the maximum available disk size of all data directories.

Default: -1 (disabled; the physically available disk size of the data directories is used in the calculations; may differ if emulate_dbaas_defaults is enabled)
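A worked sketch: suppose the data directories should be treated as 500 GB regardless of their physical size. With the illustrative percentages below, the node warns once disk usage exceeds 70% of 500 GB = 350 GB, and rejects writes once replica disk usage exceeds 90% of 500 GB = 450 GB. This assumes, as the property description states, that the percentages are computed against disk_usage_max_disk_size_in_gb when it is set.

```yaml
guardrails:
  disk_usage_max_disk_size_in_gb: 500           # treat data directories as 500 GB
  disk_usage_percentage_warn_threshold: 70      # warn above 350 GB
  disk_usage_percentage_failure_threshold: 90   # reject writes above 450 GB
```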

Operation controls

read_before_write_list_operations_enabled: Whether to allow list operations that require reading before writing, such as setting or removing a list element by index.

Note: Lightweight transactions (LWTs) are always allowed.

Default: true (may differ if emulate_dbaas_defaults is enabled)

user_timestamps_enabled: Whether to allow user-provided timestamps in write requests.

Default: true (may differ if emulate_dbaas_defaults is enabled)
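A sketch that disables both operations. With read_before_write_list_operations_enabled set to false, a statement that sets a list element by index, such as UPDATE ... SET my_list[0] = ..., is expected to be rejected; with user_timestamps_enabled set to false, writes that specify USING TIMESTAMP are expected to be rejected. The column name my_list is hypothetical.

```yaml
guardrails:
  read_before_write_list_operations_enabled: false  # reject list set/remove by index
  user_timestamps_enabled: false                     # reject USING TIMESTAMP in writes
```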

Table properties and consistency levels

table_properties_disallowed: Prevents creating tables that set any of the listed table properties.

Default: All properties are allowed (may differ if emulate_dbaas_defaults is enabled)

write_consistency_levels_disallowed: Rejects write requests that use any of the listed consistency levels.

Default: All consistency levels are allowed.
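The following fragment sketches both settings. It assumes each property accepts a YAML list of names; the disallowed table property is a hypothetical choice, and the consistency levels shown are examples of weak write consistency levels that an operator might want to block.

```yaml
guardrails:
  table_properties_disallowed:
    - gc_grace_seconds      # hypothetical: property that applications may not set
  write_consistency_levels_disallowed:
    - ANY
    - ONE
    - LOCAL_ONE
```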
