cassandra.yaml configuration file

The cassandra.yaml file is the main configuration file for Hyper-Converged Database (HCD).

After changing properties in the cassandra.yaml file, you must restart the node for the changes to take effect.

Syntax

For the properties in each section, the parent setting has zero spaces. Each child entry requires at least two spaces. Adhere to the YAML syntax and retain the spacing.
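As a minimal sketch of this spacing rule, the fragment below uses the client_encryption_options property described later on this page. The values are illustrative only, not recommendations.

```yaml
# Parent setting: starts at column zero.
client_encryption_options:
  # Child entries: indented by at least two spaces,
  # with the same indentation for every sibling.
  enabled: false
  optional: true
```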

  • The system shows undefined default values as Default: none.

  • The system describes internally defined default values.

    HCD can define default values internally, comment them out, or create implementation dependencies on other properties in the cassandra.yaml file. Additionally, some commented-out values may not match the actual default values. DataStax recommends the commented-out values as alternatives to the default values.

Organization

The cassandra.yaml file groups configuration properties into the following sections:

Quick start properties

The minimal properties needed for configuring a cluster.

cluster_name

The name of the cluster. This setting prevents nodes in one logical cluster from joining another, so you must set it to a unique name other than the default. All nodes in a cluster must have the same value.

Default: 'Test Cluster'

rpc_address

The address that client applications connect to. You typically set this to a node’s public IP address that is routable from the clients.

If not changed from the default localhost, only applications deployed on the server will be able to connect to the node.

Default: localhost

listen_address

The IP address or hostname that the database binds to, exclusively for private communication between nodes in the cluster. You typically set this to a node’s private IP address that is routable from other nodes.

If not changed from the default localhost, the node will not be able to communicate with other nodes in the cluster.

Default: localhost

listen_interface

The interface that the database binds to for connecting to other nodes. Interfaces must correspond to a single address. HCD does not support IP aliasing.

Never set listen_address to 0.0.0.0.

Set listen_address or listen_interface, not both.

listen_interface_prefer_ipv6

Determines whether HCD uses an IPv4 or IPv6 address when an interface is specified by name.

  • false: Use the first IPv4 address.

  • true: Use the first IPv6 address.

    When you use only a single address, HCD selects that address without regard to this setting.

Default: false
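The quick start properties above might be combined as in the following sketch. The cluster name and IP addresses are placeholders chosen for illustration, not values from any real deployment.

```yaml
# Illustrative values only: the cluster name and IP addresses are placeholders.
cluster_name: 'Production Cluster'   # same value on every node in the cluster
listen_address: 10.0.0.11            # private IP, routable from other nodes
rpc_address: 203.0.113.11            # address that client applications can reach
# listen_interface: eth0             # alternative to listen_address; never set both
```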

Default directories

If you have changed any of the default directories during installation, set these properties to the new locations. Make sure you have root access.

data_file_directories

The directory where table data is stored on disk. The database distributes data evenly across the location, subject to the granularity of the configured compaction strategy.

For production, DataStax recommends RAID 0 and SSDs.

Default: - /var/lib/cassandra/data

commitlog_directory

The directory where HCD stores the commit log.

For optimal write performance, place the commit log on a separate disk partition, or ideally on a separate physical device, from the data directories. Because the commit log only appends data, a hard disk drive (HDD) works as long as it keeps up with the writes.

Default: $CASSANDRA_HOME/data/commitlog

The commitlog_directory and the cdc_raw_directory must reside on the same partition. Keep these directories in separate sub-folders that are not nested.

cdc_raw_directory

The directory where HCD stores change data capture (CDC) commit log segments on flush. DataStax recommends using a physical device that is separate from the data directories. See Change Data Capture (CDC) logging.

Default: $CASSANDRA_HOME/data/cdc_raw

The cdc_raw_directory and the commitlog_directory must reside on the same partition. Keep these directories in separate sub-folders that are not nested.
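One way to satisfy the same-partition rule is sketched below. The mount points /mnt/data and /mnt/commitlog are hypothetical; the sketch assumes /mnt/commitlog is a single partition on a device separate from the data directories.

```yaml
# Hypothetical mount points; adjust to your own layout.
data_file_directories:
  - /mnt/data/cassandra/data
# Same partition (/mnt/commitlog), sibling sub-folders that are not nested:
commitlog_directory: /mnt/commitlog/commitlog
cdc_raw_directory: /mnt/commitlog/cdc_raw
```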

hints_directory

The directory where HCD stores hints (missed writes).

Default: $CASSANDRA_HOME/data/hints

metadata_directory

The directory that holds cluster metadata including information about the local node and its peers.

Default: $CASSANDRA_HOME/data/metadata

saved_caches_directory

The directory location where HCD stores table key and row caches.

Default: $CASSANDRA_HOME/data/saved_caches

Data directory configuration

Distributing data across multiple disks, also known as "JBOD configuration" (just a bunch of disks), maximizes throughput and ensures efficient disk I/O. HCD lets you specify multiple directories for storing your data in this distributed manner. DataStax recommends using striped LVM instead.

To configure a single data directory in the cassandra.yaml file:

data_file_directories:
     - /var/lib/cassandra/data

For multiple data directories:

data_file_directories:
     - /disk1/datadir
     - /disk2/datadir
     - /disk3/datadir

Commonly used properties

Properties most frequently used when configuring HCD.

Before starting a node for the first time, DataStax recommends that you carefully evaluate your requirements.

Common initialization properties

Be sure to set the properties in the Quick start section as well.

commit_failure_policy

Determines how HCD handles commit log disk failures.

  • die: Shut down the node and kill the JVM, so the node can be replaced.

  • stop: Shut down the node, leaving the node effectively dead, available for inspection using JMX.

  • stop_commit: Shut down the commit log, let writes collect, but continue to service reads.

  • ignore: Ignore fatal errors and let the batches fail.

Default: stop (recommended)

disk_optimization_strategy

Reads from spinning disks are slow, so the system buffers an extra 4 KB page with each read in case it is needed. This is unnecessary for SSDs, so the system buffers only what is required.

  • ssd: Data directory backed by solid state disks

  • spinning: Data directory backed by spinning disks

Default: ssd

disk_failure_policy

Determines how HCD handles disk failures.

  • die: Shut down gossip and client transports, and kill the JVM for any file system errors or single SSTable errors. Enables you to replace the node.

  • stop_paranoid: Shut down the node, even for single SSTable errors.

  • stop: Shut down the node leaving the node effectively dead, but the JVM is still available for inspection using JMX.

  • best_effort: Stop using the failed disk and respond to requests based on the remaining available SSTables. This setting allows obsolete data at consistency level of ONE.

  • ignore: Ignore fatal errors and let the requests fail; all file system errors are logged but otherwise ignored.

Recommended policies are stop and best_effort.

Default: stop
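As an illustration, the two failure policies described above could be set explicitly as follows. This sketch simply restates the documented defaults.

```yaml
commit_failure_policy: stop   # default; handles commit log disk failures
disk_failure_policy: stop     # default; best_effort is the other recommended policy
```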

endpoint_snitch

Configure this property to set the snitch. The most common snitches are:

  • SimpleSnitch (default): Uses replication strategy order for proximity. This snitch does not recognize racks or datacenters and considers all nodes as belonging to one ring (single DC), making it incompatible with multi-DC deployments and unsuitable for production environments. This snitch is appropriate for development environments only.

  • GossipingPropertyFileSnitch (GPFS): Uses rack and datacenter information for the local node defined in the cassandra-rackdc.properties file and propagates this information to other nodes via gossip. This snitch is recommended for production environments and is almost always the correct choice.

  • PropertyFileSnitch (PFS): Determines node proximity using the rack and datacenter location defined in the cassandra-topology.properties file. GPFS supersedes this snitch. Only use PFS for backwards compatibility.

    For other snitches such as Ec2Snitch and GoogleCloudSnitch, see About snitches.

All nodes in a cluster must use the same snitch.

HCD determines replica placement (which defines where copies of data are stored) using the information provided by the snitch. Changing the snitch has implications for where the data is located so requires additional steps and should only be performed by experienced operators.

seed_provider

The gossip seed provider and corresponding addresses of nodes that are designated as contact points in the cluster. A joining node contacts the nodes in the seeds list and establishes a connection to the first available node to discover the members of the cluster and topology.

  • class_name: The class that handles the seed logic. HCD uses the default in almost all clusters. However, you can substitute a custom seed provider in limited edge cases.

    Default: org.apache.cassandra.locator.SimpleSeedProvider

  • seeds: A comma delimited list of addresses and their corresponding storage_port. A new node joining the cluster uses the list to bootstrap the gossip process. If the cluster has multiple nodes, the default value must be changed to the IP address and gossip port of one of the nodes.

    Default: "127.0.0.1:7000"

    Making every node a seed node is not recommended because of increased maintenance and reduced gossip performance. Gossip optimization is not critical, but it is recommended to use a small seed list (approximately three nodes per datacenter).
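A seed list of three nodes might look like the sketch below. The IP addresses are placeholders, and the nesting under a parameters key is an assumption based on the layout of the stock Apache Cassandra cassandra.yaml file; check the file shipped with your installation.

```yaml
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
      # Placeholder addresses; each entry is an address plus the storage_port.
      - seeds: "10.0.0.11:7000,10.0.0.12:7000,10.0.0.13:7000"
```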

Advanced initialization properties

allocate_tokens_for_keyspace

Triggers the algorithm that allocates an optimal set of num_tokens tokens so that token ranges are spread evenly across nodes, which distributes data more evenly than the legacy random allocation. Only supported on clusters using Murmur3Partitioner.

The algorithm uses the replication strategy of the specified keyspace to optimize token allocation when new nodes join a cluster.

The property allocate_tokens_for_local_replication_factor is preferred over allocate_tokens_for_keyspace, particularly when adding nodes in a new datacenter where a keyspace is not yet replicated. If neither property is set, HCD falls back to the legacy behavior of allocating tokens randomly.

allocate_tokens_for_local_replication_factor

Triggers the algorithm that allocates an optimal set of num_tokens tokens so that token ranges are spread evenly across nodes, which distributes data more evenly than the legacy random allocation. Only supported on clusters using Murmur3Partitioner.

Specify the replication factor in the local datacenter, 3 for example, that the algorithm uses to optimize token allocation when new nodes join a cluster.

allocate_tokens_for_local_replication_factor is preferred over allocate_tokens_for_keyspace because it does not require the replication of a keyspace to be defined. This is especially helpful when adding nodes in a new datacenter. If neither property is set, HCD falls back to the legacy behavior of allocating tokens randomly.

auto_bootstrap

When joining a cluster for the first time, this property determines whether the node will request replicas to stream data. This is the default behavior. If the node is defined as a seed, it immediately joins the cluster without data.

Non-seed nodes will bootstrap automatically by default. Set to false when adding nodes in a new datacenter where bootstrap is manually triggered by an operator with the nodetool rebuild command.

Default: true

broadcast_address

Set to the node’s public IP address in environments where nodes are only able to communicate across networks using their public IP addresses, such as multi-region Amazon EC2 deployments. Otherwise, the node will broadcast on the same address as listen_address.

Set a separate listen_address and broadcast_address on a node with multiple network interfaces or where nodes are not able to communicate over private IP addresses. Not required in environments that support automatic switching between private and public communication.

Default: uses value of listen_address

initial_token

The property for manually assigning tokens for ranges to be owned by the node.

Specify one token value for legacy single-token clusters. For clusters with virtual nodes enabled, specify multiple tokens as a comma-separated list.

When setting initial_token, the corresponding num_tokens must also be set.

Default: not set in preference for num_tokens

listen_on_broadcast_address

Set to true on nodes with multiple interfaces to enable communication on both listen_address and broadcast_address.

Default: false
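For a node with separate private and public interfaces, the address properties above might be combined as in this sketch. Both IP addresses are placeholders.

```yaml
# Placeholder addresses for a node with a private and a public interface.
listen_address: 10.0.0.11          # private IP
broadcast_address: 203.0.113.11    # public IP that the node broadcasts on
listen_on_broadcast_address: true  # communicate on both addresses
```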

num_tokens

Defines the number of tokens to assign to the node.

Early versions of Cassandra used a default value of 256 tokens for clusters with virtual nodes enabled. This setting shares data with more peers and offers the least variance in data size among nodes in the same datacenter, but might lead to decreased availability in the event of node outages.

Lower token counts such as 4 or 8 provide higher availability but also higher variance in data size. A value of 16 tokens achieves a good distribution of data without compromising too much on availability.

Default: 1 token when not set
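A possible combination of the token properties is sketched below, assuming a local datacenter replication factor of 3 as in the example given for allocate_tokens_for_local_replication_factor.

```yaml
num_tokens: 16                                    # balance of distribution and availability
allocate_tokens_for_local_replication_factor: 3   # replication factor in the local datacenter
```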

partitioner

The partitioner determines how data is distributed across the nodes in the cluster.

The default Murmur3Partitioner is the correct and only choice for new clusters. The legacy partitioners provide backward-compatibility with existing clusters upgraded from older versions of Cassandra, because the partitioner can never be changed on a running cluster.

Default: org.apache.cassandra.dht.Murmur3Partitioner

Common compaction settings

compaction_throughput_mb_per_sec

The rate in megabytes/second at which HCD compacts SSTable candidates. The faster the database inserts data, the faster HCD must compact in order to keep the number of SSTables down.

Set to 16 to 32 times the write throughput in MB/second. Otherwise, set to 0 to disable compaction throttling. A high setting means that HCD uses more disk I/O for compaction, leaving less I/O bandwidth for reads.

Default: 64
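To make the sizing guideline concrete, the following sketch assumes a sustained write throughput of 4 MB/second. That figure is invented for illustration; measure your own workload before choosing a value.

```yaml
# Assumed sustained write throughput: 4 MB/s (a made-up figure for illustration).
# 16 x 4 = 64 and 32 x 4 = 128, so the guideline suggests a value from 64 to 128.
compaction_throughput_mb_per_sec: 64
# compaction_throughput_mb_per_sec: 0   # alternative: disable compaction throttling
```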

Memtable settings

When a node receives a write request, HCD stores the data in a memory structure called a memtable and appends it to the commit log on disk for durability (see How data is written). HCD can allocate memtable segments either on- or off-heap.

memtable_allocation_type

Determines how HCD allocates memory to the memtable.

  • heap_buffers: The system allocates memtables on JVM heap. Suitable for general workloads where heap memory is sufficient.

  • offheap_buffers: Uses Java NIO direct buffers to store cell names and values off-heap. This allocation type reduces heap utilization significantly, leading to reduced GC pressure.

  • offheap_objects: The system allocates memtables completely off-heap, directly in native memory. This allocation type is recommended particularly for clusters that handle large datasets. Writes are around 5% faster mostly due to memtables flushing less often.

  • unslabbed_heap_buffers: The system allocates memtables on a JVM heap without using a slab allocator. This can lead to increased heap fragmentation. DataStax does not recommend this option for any environments.

Default: offheap_objects

memtable_heap_space_in_mb

The maximum amount of memory to allocate for memtables on JVM heap. When the threshold is reached, the system blocks writes until a flush completes. The system triggers a flush of the largest memtable based on memtable_cleanup_threshold.

Default: ¼ of heap

memtable_offheap_space_in_mb

The maximum amount of memory to allocate for memtables from native memory. When the threshold is reached, HCD blocks writes until a flush completes. HCD triggers a flush of the largest memtable based on memtable_cleanup_threshold.

Default: ¼ of heap
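The memtable space defaults can be spelled out for a given heap size. The sketch below assumes a JVM heap of 8192 MB, a value chosen only for illustration, and sets both limits to one quarter of it.

```yaml
# Assumes a JVM heap of 8192 MB (8 GB), chosen only for illustration.
memtable_allocation_type: offheap_objects
memtable_heap_space_in_mb: 2048      # 8192 / 4
memtable_offheap_space_in_mb: 2048   # 8192 / 4
```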

memtable_cleanup_threshold

The threshold that triggers a flush based on the ratio of memtable size to the maximum memory size permitted for memtables.

Setting a value explicitly is deprecated because the default calculation is the only reasonable choice.

Default: 1 / (memtable_flush_writers + 1)

memtable_flush_writers

The total number of memtables that can be flushed concurrently as well as the number of flush writer threads per disk.

A single thread is generally capable of keeping up with ingesting writes on a node with a single fast disk unless it becomes temporarily I/O-bound, so two flush writers are usually sufficient. If flushing is falling behind (the MemtablePool.BlockedOnAllocation metric is greater than 0), increase the number of flush writers.

Note that more writers can lead to more frequent flushes and smaller SSTables which puts pressure on compactions.

Default: 2 for nodes with a single data directory, otherwise 1 per memtable
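As a worked example of how the flush properties relate, the sketch below sets only memtable_flush_writers and leaves memtable_cleanup_threshold to be derived from it.

```yaml
memtable_flush_writers: 2
# memtable_cleanup_threshold is left unset, so it is derived as
# 1 / (memtable_flush_writers + 1) = 1 / (2 + 1), roughly 0.33.
```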

Common automatic backup settings

HCD does not automatically clear backups and snapshots, so disk usage can grow unbounded. When the disk is full and HCD can no longer write files to disk, it shuts down by default.

DataStax recommends setting up a process to clear incremental backups each time a new snapshot is created.

auto_snapshot

When enabled (set to true), a snapshot is taken before DROP KEYSPACE, DROP TABLE, or TRUNCATE TABLE is executed.

DataStax strongly recommends enabling auto snapshot as a precaution, in case someone executes the DROP or TRUNCATE commands accidentally against the wrong keyspace or table.

Default: true

incremental_backups

When set to true, HCD creates hard links to each SSTable that has been flushed or streamed in the backups/ subdirectory of the keyspace data.

Default: false

snapshot_before_compaction

When set to true, HCD takes a snapshot before each compaction task. You may use the snapshot as a rollback position in an upgrade. Usage is limited since the general recommendation is to take backups before performing an upgrade.

Use with extreme caution as disk usage can grow exponentially.

Default: false
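The backup properties above might be combined as follows. This is a sketch that enables incremental backups and keeps the other two properties at their documented defaults; it assumes a separate process clears old backups.

```yaml
auto_snapshot: true                 # default; snapshot before DROP or TRUNCATE
incremental_backups: true           # hard-link flushed or streamed SSTables into backups/
snapshot_before_compaction: false   # default; enabling it can consume disk space quickly
```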

Performance tuning

Tuning performance and system resource utilization, including commit log, compaction, memory, disk I/O, CPU, reads, and writes.

Performance tuning properties include:

Commit log settings

commitlog_sync

Defines the mode by which the commit log is synchronized to disk. When the data is considered fully persisted to storage, the data will survive a system crash or power outage. The sync mode also determines when HCD sends a successful write acknowledgement to the coordinator.

  • batch: Each write request triggers a call to sync immediately. The acknowledgement is blocked until after the commit log has been flushed to disk. Prioritizes durability over performance.

  • group: Similar to batch mode but waits up to commitlog_sync_group_window_in_ms between flushes so more writes are persisted together. The system also blocks the acknowledgement until after the commit log has been flushed to disk. Recommended over batch mode.

  • periodic: The system synchronizes the commit log every commitlog_sync_period_in_ms, but the write is acknowledged immediately. Prioritizes performance over durability.

Default: periodic

commitlog_sync_period_in_ms

Time interval between commit log syncs to disk. Only set with periodic sync mode, otherwise an exception will be logged.

Default: 10000 (10 seconds)

commitlog_sync_group_window_in_ms

The minimum interval between disk syncs. Only set with group sync mode, otherwise an exception will be logged.

Default: 1000 (1 second)
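A group-mode configuration might look like the sketch below, which sets only the window property that belongs to the chosen sync mode.

```yaml
commitlog_sync: group
commitlog_sync_group_window_in_ms: 1000
# commitlog_sync_period_in_ms applies only to periodic mode, so it is left unset here.
```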

commitlog_sync_batch_window_in_ms

Deprecated. The maximum delay between disk syncs. No longer used.

commitlog_segment_size_in_mb

The size of individual commit log file segments. A small size means more frequent flushes leading to small SSTables which put pressure on compaction.

If you use the commit log archives for point-in-time recovery, it is reasonable to reduce the size to 16 MB or 8 MB for finer granularity. However, be aware that the maximum mutation size is dependent on the segment size.

Default: 32

max_mutation_size_in_kb

The maximum allowed size of a mutation (the payload size of a write request) which defaults to half of the commit log segment (commitlog_segment_size_in_mb). If explicitly set, you must set commitlog_segment_size_in_mb to at least twice the value of max_mutation_size_in_kb.

Before increasing the commit log segment size, investigate why the mutations are larger than expected. Look for underlying issues with access patterns and the data model, because increasing the segment size is a limited fix.

Default: ½ of commitlog_segment_size_in_mb
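Because the two properties use different units, a worked example can help. The sketch below assumes a segment size of 64 MB, chosen only for illustration, and sets the mutation limit to exactly half of it.

```yaml
commitlog_segment_size_in_mb: 64    # 64 MB = 65536 KB
max_mutation_size_in_kb: 32768      # 32 MB, exactly half of the segment size
```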

commitlog_total_space_in_mb

The maximum disk space for commit logs on disk.

If the limit is reached, HCD flushes the oldest commit log segments to reclaim disk space. A small size means more frequent flushes on less-active tables leading to small SSTables which put pressure on compaction.

Default: smaller of 8192 or ¼ of commitlog/ disk

commitlog_compression

By default, the commit log is not compressed. To enable compression, specify the compression library to use.

The supported libraries are:

  • DeflateCompressor: Legacy option that is the slowest compared to newer algorithms. This option is not recommended.

  • LZ4Compressor: Fastest algorithm, but offers lower compression ratios. Choose when speed is preferred over space savings.

  • SnappyCompressor: Not as fast as LZ4, but provides better compression.

  • ZstdCompressor: Provides the best compression ratio, but is slower than other algorithms.

For example, to enable LZ4 compression:

commitlog_compression:
  - class_name: LZ4Compressor

Change Data Capture (CDC) settings

See also cdc_raw_directory.

cdc_enabled

Enables CDC functionality on a per-node basis when set to true.

Default: false

cdc_total_space_in_mb

Maximum disk space to use for CDC logs. If the limit is reached, HCD throws WriteTimeoutException on mutations that include CDC-enabled tables. A CDCCompactor (a consumer) parses the raw CDC logs and deletes them when parsing is completed.

Default: smaller of 4096 MB or 1/8th of cdc_raw_directory disk

cdc_free_space_check_interval_ms

Interval between disk space checks when cdc_total_space_in_mb limit is reached.

Default: 250
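The CDC properties could be set together as in this sketch. The directory path is hypothetical, and the numeric values restate the documented fixed defaults.

```yaml
cdc_enabled: true
cdc_raw_directory: /mnt/commitlog/cdc_raw   # hypothetical path; same partition as commitlog_directory
cdc_total_space_in_mb: 4096
cdc_free_space_check_interval_ms: 250
```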

Compaction settings

concurrent_compactors

The number of compaction threads allowed to run simultaneously. Simultaneous compactions help preserve read performance in a mixed read-write workload by limiting the number of small SSTables that accumulate during a single long-running compaction.

Generally, the calculated default value is appropriate and does not need adjusting. DataStax recommends contacting DataStax Support before changing this value. If your data directories are backed by SSDs, increase this value to the number of cores.

If compaction runs too slowly or too fast, adjust the compaction_throughput_mb_per_sec option in the Common compaction settings section.

Increasing concurrent compactors leads to more use of available disk space for compaction, because concurrent compactions happen in parallel, especially for STCS. Ensure that adequate disk space is available before increasing this configuration.

Default: the smaller of the number of data disks or the number of CPU cores, with a minimum of 2 and a maximum of 8

concurrent_validations

The number of repair validation threads allowed to run simultaneously.

Defaults to the value of concurrent_compactors if not configured or set to a value <= 0. Setting validation threads to a value higher than concurrent_compactors requires the system property -Dcassandra.allow_unlimited_concurrent_validations=true.

Default: the value of concurrent_compactors

concurrent_materialized_view_builders

The number of view builder tasks allowed to run simultaneously if materialized views are enabled. This is experimental.

When a view is created, the node ranges are split into [num_processors x 4] builder tasks. Set this property to 2 or higher to build views faster.

Default: 1

sstable_preemptive_open_interval_in_mb

The size of the SSTable candidates to trigger preemptive opening of compaction output.

The compaction process opens SSTables before the system completely writes them and uses them in place of the prior SSTables for any range previously written. Preemptive opening of SSTables helps to smoothly transfer reads between the SSTables by reducing cache churn and keeps hot rows hot.

A low value has a negative performance impact and will eventually cause heap pressure and GC activity. The optimal value depends on hardware and workload.

Default: 50

Cache and index settings

column_index_size_in_kb

Granularity of the index of rows within a partition. For huge rows, decrease this setting to improve seek time.

Default: 64

file_cache_size_in_mb

Maximum memory to use for caching SSTable chunks and buffer pools. Allocated from native memory in addition to heap.

Default: smaller of 2048 or ¼ of heap

Streaming settings

These settings apply to operations that perform file streaming, including repairs, bootstraps, and decommissions. These operations are mostly sequential I/O which can saturate a node’s network bandwidth and degrade client (application) performance. To fix this, you must throttle streaming throughput.

inter_dc_stream_throughput_outbound_megabits_per_sec

Maximum network bandwidth for streaming file transfers between datacenters. Set to a value less than or equal to stream_throughput_outbound_megabits_per_sec.

Default: 200 Mbps (25 MB/s)

stream_entire_sstables

Enables the Zero Copy Streaming feature where eligible SSTables are streamed in their entirety between nodes instead of individual partitions, transferring data at a significantly faster rate.

This feature is bound to the streaming throughput limits and disabled when internode encryption is enabled.

Default: true

stream_throughput_outbound_megabits_per_sec

Maximum network bandwidth permitted for all outbound file transfers streaming on a node.

Default: 200 Mbps (25 MB/s)

streaming_keep_alive_period_in_secs

Interval to send keep-alive messages to prevent reset connections during streaming. The streaming session fails when the system does not receive a keep-alive message for two keep-alive cycles, which is 10 minutes by default (2 x 300 seconds).

Default: 300
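A throttled streaming configuration might look like the sketch below. The inter-datacenter limit of 100 Mbps is an illustrative choice that respects the rule that it must not exceed the overall outbound limit.

```yaml
stream_throughput_outbound_megabits_per_sec: 200            # 200 Mbps = 25 MB/s
inter_dc_stream_throughput_outbound_megabits_per_sec: 100   # 100 Mbps = 12.5 MB/s, at most the value above
streaming_keep_alive_period_in_secs: 300                    # session fails after 2 x 300 s without a keep-alive
```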

Advanced properties

Less commonly used settings normally reserved for experienced operators.

max_value_size_in_mb

The maximum size of any value in SSTables up to a maximum of 2048 MB. If any value exceeds this threshold, HCD marks the SSTables as corrupted.

Default size is the same as the default native protocol frame limit native_transport_max_frame_size_in_mb.

Default: 256

trickle_fsync

Enables flushing portions of SSTables written using sequential writers when trickle_fsync_interval_in_kb is reached. This minimizes sudden flushing of dirty buffers, which can impact read latencies.

Recommended for use with SSDs which can handle more frequent calls to fsync(), but may be detrimental to slow HDDs.

Default: true

trickle_fsync_interval_in_kb

Threshold to trigger a flush when trickle_fsync is enabled.

Default: 10240 (10 MB)

Security properties

Configure authentication, authorization, and role management.

The security properties in cassandra.yaml control how HCD handles user authentication, authorization, and data encryption. These settings are crucial for securing your cluster in production environments.

Authentication properties

authenticator

The authentication backend that implements IAuthenticator to identify users.

HCD provides several authentication options:

  • AllowAllAuthenticator: Performs no authentication checks. Use this to disable authentication. DataStax does not recommend this for production environments.

  • PasswordAuthenticator: Relies on username/password pairs stored in the system_auth.roles table.

  • AdvancedAuthenticator: Allows multiple authentication schemes including internal, OIDC, and LDAP.

    Default: AllowAllAuthenticator

    If using PasswordAuthenticator, you must also use CassandraRoleManager for role management. Increase the system_auth keyspace replication factor when using authentication.

Authorization properties

authorizer

The authorization backend that implements IAuthorizer to limit access and provide permissions.

HCD provides several authorization options:

  • AllowAllAuthorizer: Allows any action to any user. Use this to disable authorization. DataStax does not recommend this for production environments.

  • CassandraAuthorizer: Stores permissions in the system_auth.role_permissions table.

  • AdvancedAuthorizer: Checks if roles have authorization permissions to access resources.

    Default: AllowAllAuthorizer

    If using CassandraAuthorizer, increase the system_auth keyspace replication factor.

Role management properties

role_manager

The role management backend that implements IRoleManager to maintain grants and memberships between roles.

HCD provides several role management options:

  • CassandraRoleManager: Stores role data in the system_auth keyspace.

  • AdvancedRoleManager: Fetches roles from internal Cassandra tables and/or external servers.

    Default: CassandraRoleManager

    Most IRoleManager functions require an authenticated login. If the configured IAuthenticator doesn’t implement authentication, most functionality will be unavailable.

Network authorization properties

network_authorizer

The network authorization backend that implements INetworkAuthorizer to restrict user access to certain datacenters.

HCD provides several network authorization options:

  • AllowAllNetworkAuthorizer: Allows access to any datacenter to any user.

  • CassandraNetworkAuthorizer: Stores permissions in the system_auth.network_permissions table.

    Default: AllowAllNetworkAuthorizer
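As a sketch of replacing the permissive defaults, the four backends described above could be switched to the internal Cassandra implementations as follows. This assumes that the system_auth keyspace replication factor has already been increased, as noted for PasswordAuthenticator and CassandraAuthorizer.

```yaml
authenticator: PasswordAuthenticator            # requires CassandraRoleManager
authorizer: CassandraAuthorizer
role_manager: CassandraRoleManager
network_authorizer: CassandraNetworkAuthorizer
```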

Client encryption properties

client_encryption_options

Configure client-to-server encryption settings.

client_encryption_options:
    enabled: false                    # Enable client-to-server encryption
    optional: true                    # Allow encrypted and unencrypted connections
    keystore: conf/.keystore         # Path to keystore file
    keystore_password: cassandra     # Keystore password
    require_client_auth: false        # Verify client certificates
    truststore: conf/.truststore     # Path to truststore file
    truststore_password: cassandra   # Truststore password
    protocol: TLS                     # SSL/TLS protocol
    store_type: JKS                   # Keystore type
    cipher_suites: [                 # Supported cipher suites
        TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
        TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
        TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    ]

The default configuration is insecure. Generate proper keystores and truststores before enabling encryption in production.

Server encryption properties

server_encryption_options

Configure server-to-server internode encryption settings.

server_encryption_options:
    internode_encryption: none        # Encryption scope: none, dc, rack, or all
    optional: true                    # Allow encrypted and unencrypted connections
    keystore: conf/.keystore         # Path to keystore file
    keystore_password: cassandra     # Keystore password
    require_client_auth: false        # Verify peer server certificates
    truststore: conf/.truststore     # Path to truststore file
    truststore_password: cassandra   # Truststore password
    require_endpoint_verification: false  # Verify hostname in certificate
    enable_legacy_ssl_storage_port: false # Enable legacy SSL storage port

Encryption scope options:

  • none: Do not encrypt outgoing connections

  • dc: Encrypt connections to peers in other datacenters, but not within datacenters

  • rack: Encrypt connections to peers in other racks, but not within racks

  • all: Always use encrypted connections

    The default configuration is insecure. Generate proper keystores and truststores before enabling encryption in production.

Transparent data encryption properties

transparent_data_encryption_options

Configure transparent data encryption (TDE) for data at rest.

transparent_data_encryption_options:
    enabled: false                    # Enable transparent data encryption
    chunk_length_kb: 64              # Encryption chunk size
    cipher: AES/CBC/PKCS5Padding     # Encryption cipher
    key_alias: testing:1             # Key alias for encryption
    iv_length: 16                    # CBC IV length for AES
    key_provider:
      - class_name: org.apache.cassandra.security.JKSKeyProvider
        parameters:
          - keystore: conf/.keystore
            keystore_password: cassandra
            store_type: JCEKS
            key_password: cassandra

Install the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files for your JDK version before enabling TDE. TDE currently supports encryption of commit log and hints files.

Audit logging properties

audit_logging_options

Configure audit logging to track CQL commands and authentication events.

audit_logging_options:
    enabled: false                    # Enable audit logging
    logger:
      - class_name: BinAuditLogger   # Audit logger implementation
    audit_logs_dir:                  # Directory for audit logs
    included_keyspaces:              # Keyspaces to audit
    excluded_keyspaces: system, system_schema, system_virtual_schema
    included_categories:             # Categories to audit
    excluded_categories:             # Categories to exclude
    included_users:                  # Users to audit
    excluded_users:                  # Users to exclude
    roll_cycle: HOURLY              # Log roll cycle
    block: true                      # Block on log write
    max_queue_weight: 268435456     # Max queue weight (256 MiB)
    max_log_size: 17179869184       # Max log size (16 GiB)
    archive_command:                 # Archive command
    max_archive_retries: 10         # Max archive retries
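
The following fragment is a minimal sketch of enabling audit logging for a subset of events. The directory is a hypothetical path, and the category names (AUTH, DDL, DCL) are assumed to match those used by Apache Cassandra audit logging; verify them against your HCD release before relying on them.

```yaml
audit_logging_options:
    enabled: true
    logger:
      - class_name: BinAuditLogger
    audit_logs_dir: /var/log/hcd/audit          # Hypothetical directory
    excluded_keyspaces: system, system_schema, system_virtual_schema
    included_categories: AUTH, DDL, DCL         # Assumed category names
    roll_cycle: HOURLY                          # Roll to a new log file every hour
    block: true                                 # Block on log write
```

Properties omitted from the fragment, such as included_users or archive_command, keep the values shown in the listing above.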

Security cache properties

roles_validity_in_ms

Validity period for roles cache in milliseconds.

HCD caches granted roles for authenticated sessions. After this period, they become eligible for async reload.

Default: 2000

Set to 0 to disable caching entirely.

permissions_validity_in_ms

Validity period for permissions cache in milliseconds.

Default: 2000

Set to 0 to disable caching.

credentials_validity_in_ms

Validity period for credentials cache in milliseconds.

The system tightly couples this cache to the PasswordAuthenticator implementation.

Default: 2000

Set to 0 to disable caching.

roles_update_interval_in_ms

Refresh interval for roles cache in milliseconds.

After this interval, cache entries become eligible for refresh.

Default: Same value as roles_validity_in_ms

permissions_update_interval_in_ms

Refresh interval for permissions cache in milliseconds.

After this interval, cache entries become eligible for refresh.

Default: Same value as permissions_validity_in_ms

credentials_update_interval_in_ms

Refresh interval for credentials cache in milliseconds.

After this interval, cache entries become eligible for refresh.

Default: Same value as credentials_validity_in_ms
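
As an illustration of how the validity and update interval properties pair up, the fragment below sets a longer validity period with a shorter refresh interval for each cache. The numbers are illustrative only, not recommendations.

```yaml
# Roles cache: entries are valid for 10 s and eligible for refresh after 2 s
roles_validity_in_ms: 10000
roles_update_interval_in_ms: 2000

# Permissions cache
permissions_validity_in_ms: 10000
permissions_update_interval_in_ms: 2000

# Credentials cache (coupled to the PasswordAuthenticator implementation)
credentials_validity_in_ms: 10000
credentials_update_interval_in_ms: 2000
```

If an update interval is left unset, it takes the same value as the matching validity property, as described above.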

User-defined functions (UDF)

Configure user-defined functions (UDFs) that allow custom logic to be executed within the database.

enable_user_defined_functions

Enables user-defined functions (UDFs) on this node.

As of Cassandra 3.0, the system has a sandbox in place that should prevent execution of malicious code.

Default: false

enable_scripted_user_defined_functions

Enables scripted UDFs (JavaScript UDFs).

Java UDFs are always enabled if enable_user_defined_functions is true. Enable this option to use UDFs with "language javascript" or any custom JSR-223 provider. This option has no effect if enable_user_defined_functions is false.

Default: false
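
A minimal sketch of the two settings together, assuming you want Java UDFs but not scripted ones:

```yaml
enable_user_defined_functions: true             # Enables UDFs; Java UDFs become available
enable_scripted_user_defined_functions: false   # Keep JavaScript and other JSR-223 UDFs disabled
```

Setting the second property to true has an effect only while enable_user_defined_functions is also true.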

Memory leak detection settings

Configure garbage collection monitoring and memory leak detection thresholds.

The following properties are commented out in the default cassandra.yaml configuration and are not active by default. Uncomment and configure them as needed for your environment.

gc_log_threshold_in_ms

HCD logs GC pauses greater than this threshold at INFO level.

This threshold can be adjusted to minimize logging if necessary.

Default: 200 ms (commented out in default configuration)

gc_warn_threshold_in_ms

The system logs GC pauses greater than this threshold at WARN level.

Adjust the threshold based on your application throughput requirements. Setting to 0 deactivates the feature.

Default: 1000 ms (commented out in default configuration)

max_value_size_in_mb

Maximum size of any value in SSTables. This setting is a safety measure for detecting SSTable corruption early.

HCD marks an SSTable as corrupted if it contains a value larger than this threshold. The value must be positive and less than 2048.

Default: 256 MB (commented out in default configuration)
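
Because these properties are commented out by default, activating them means adding uncommented entries. The sketch below uses the documented default values; adjust them for your environment.

```yaml
gc_log_threshold_in_ms: 200     # Log GC pauses longer than 200 ms at INFO level
gc_warn_threshold_in_ms: 1000   # Log GC pauses longer than 1000 ms at WARN level; 0 deactivates
max_value_size_in_mb: 256       # Must be positive and less than 2048
```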

Guardrails

Guardrails are system limits that ensure high availability and optimal performance of the database. They help prevent operations that could cause performance issues or system instability. For more information, see HCD guardrails.

emulate_dbaas_defaults

When enabled, modifies defaults to match those used by DataStax Constellation (DataStax cloud data platform), including stricter guardrails defaults.

This can be used as a convenience to develop and test applications meant to run on DataStax Constellation.

When enabled, the updated defaults reflect those of DataStax Constellation at the time of the currently used HCD release. This is a best-effort emulation of said defaults. All nodes must use the same config value.

Default: false

guardrails

Configure the HCD system limits that ensure high availability and optimal performance of the database.

guardrails:
  # Tombstone thresholds
  tombstone_warn_threshold: 1000
  tombstone_failure_threshold: 100000

  # Partition size threshold
  partition_size_warn_threshold_in_mb: 100

  # Batch size thresholds
  batch_size_warn_threshold_in_kb: 64
  batch_size_fail_threshold_in_kb: 640
  unlogged_batch_across_partitions_warn_threshold: 10

  # Column and collection thresholds
  column_value_size_failure_threshold_in_kb: -1
  columns_per_table_failure_threshold: -1
  fields_per_udt_failure_threshold: -1
  collection_size_warn_threshold_in_kb: -1
  items_per_collection_warn_threshold: -1

  # Index thresholds
  secondary_index_per_table_failure_threshold: -1
  sai_indexes_per_table_failure_threshold: 10
  sai_indexes_total_failure_threshold: 100

  # View and table thresholds
  materialized_view_per_table_failure_threshold: -1
  tables_warn_threshold: -1
  tables_failure_threshold: -1

  # Query thresholds
  page_size_failure_threshold_in_kb: -1
  in_select_cartesian_product_failure_threshold: -1
  partition_keys_in_select_failure_threshold: -1

  # Disk usage thresholds
  disk_usage_percentage_warn_threshold: -1
  disk_usage_percentage_failure_threshold: -1
  disk_usage_max_disk_size_in_gb: -1

  # Operation controls
  read_before_write_list_operations_enabled: true
  user_timestamps_enabled: true

  # Table properties and consistency levels
  table_properties_disallowed:
  write_consistency_levels_disallowed:

Tombstone thresholds

tombstone_warn_threshold

Log a warning when scanning more tombstones than this threshold.

When executing a scan, within or across a partition, Cassandra keeps tombstones in memory to return them to the coordinator. With workloads that generate many tombstones, this can cause performance problems and even exhaust the server heap.

Default: 1000 (may differ if emulate_dbaas_defaults is enabled)

tombstone_failure_threshold

Fail queries that scan more tombstones than this threshold.

Default: 100000 (may differ if emulate_dbaas_defaults is enabled)

Partition size threshold

partition_size_warn_threshold_in_mb

Log a warning when compacting partitions larger than this value.

Default: 100 MB (may differ if emulate_dbaas_defaults is enabled)

Batch size thresholds

batch_size_warn_threshold_in_kb

Log WARN on any multiple-partition batch size that exceeds this value.

Use caution when increasing this threshold as it can lead to node instability.

Default: 64 KB (may differ if emulate_dbaas_defaults is enabled)

batch_size_fail_threshold_in_kb

Fail any multiple-partition batch that exceeds this value.

The calculated default is 640 KB (10x warn threshold).

Default: 640 KB (may differ if emulate_dbaas_defaults is enabled)

unlogged_batch_across_partitions_warn_threshold

Log WARN on any batches not of type LOGGED that span across more partitions than this limit.

Default: 10 (may differ if emulate_dbaas_defaults is enabled)
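
As a sketch, the fragment below tightens the batch guardrails while keeping the 10x relationship between the warn and fail thresholds described above. The values are illustrative assumptions, not recommendations.

```yaml
guardrails:
  batch_size_warn_threshold_in_kb: 32      # Log WARN for multiple-partition batches over 32 KB
  batch_size_fail_threshold_in_kb: 320     # Fail multiple-partition batches over 320 KB (10x warn)
  unlogged_batch_across_partitions_warn_threshold: 5   # Log WARN when a non-LOGGED batch spans more than 5 partitions
```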

Column and collection thresholds

column_value_size_failure_threshold_in_kb

Failure threshold to prevent writing large column values into HCD.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

columns_per_table_failure_threshold

Failure threshold that prevents creating more columns per table than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

fields_per_udt_failure_threshold

Failure threshold that prevents creating more fields in a user-defined type (UDT) than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

collection_size_warn_threshold_in_kb

Log a warning when the size of a collection's data exceeds this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

items_per_collection_warn_threshold

Log a warning when a collection contains more elements than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

Index thresholds

secondary_index_per_table_failure_threshold

Failure threshold that prevents creating more secondary indexes per table than this value. Does not apply to CUSTOM INDEX StorageAttachedIndex.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

sai_indexes_per_table_failure_threshold

Failure threshold for the number of StorageAttachedIndex indexes per table. Applies only to CUSTOM INDEX StorageAttachedIndex.

Default: 10 (same when emulate_dbaas_defaults is enabled)

sai_indexes_total_failure_threshold

Failure threshold for the total number of StorageAttachedIndex indexes across all keyspaces. Applies only to CUSTOM INDEX StorageAttachedIndex.

Default: 100 (same when emulate_dbaas_defaults is enabled)

View and table thresholds

materialized_view_per_table_failure_threshold

Failure threshold that prevents creating more materialized views per table than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

tables_warn_threshold

Log a warning when creating more tables than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

tables_failure_threshold

Failure threshold that prevents creating more tables than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

Query thresholds

page_size_failure_threshold_in_kb

Failure threshold that prevents requesting a page size in bytes larger than this value. It also serves as a hard paging limit when paging by rows is used.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

in_select_cartesian_product_failure_threshold

Failure threshold that prevents an IN query from producing a cartesian product larger than this value.

Example: "a IN (1, 2, ..., 10) AND b IN (1, 2, ..., 10)" results in a cartesian product of 100.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

partition_keys_in_select_failure_threshold

Failure threshold that prevents an IN query from containing more partition keys than this value.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)
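
To make the IN query guardrails concrete, the sketch below sets both thresholds to illustrative values chosen for this example. With these settings, a query restricted by two IN lists of 10 values each would expand to 10 x 10 = 100 combinations and exceed the first threshold.

```yaml
guardrails:
  # Fail SELECT statements whose IN restrictions expand to more than 25 combinations
  in_select_cartesian_product_failure_threshold: 25
  # Fail SELECT statements whose IN restriction names more than 20 partition keys
  partition_keys_in_select_failure_threshold: 20
```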

Disk usage thresholds

disk_usage_percentage_warn_threshold

Log a warning when local disk usage exceeds this percentage. Valid values: 1 to 100.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

disk_usage_percentage_failure_threshold

Failure threshold that rejects write requests when replica disk usage exceeds this percentage. Valid values: 1 to 100.

Default: -1 (disabled, may differ if emulate_dbaas_defaults is enabled)

disk_usage_max_disk_size_in_gb

Sets the maximum disk size of the data directories that is used when calculating disk_usage_percentage_warn_threshold and disk_usage_percentage_failure_threshold.

Valid values: (1, max available disk size of all data directories).

Default: -1 (disabled and use the physically available disk size of data directories during calculations, may differ if emulate_dbaas_defaults is enabled)
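
The three disk usage properties are meant to be read together. The fragment below is a sketch with illustrative values, assuming that the percentages are evaluated against the configured maximum disk size: with a 1000 GB maximum, the warning would apply above 700 GB of usage and write rejection above 800 GB.

```yaml
guardrails:
  disk_usage_percentage_warn_threshold: 70      # Warn above 70% usage
  disk_usage_percentage_failure_threshold: 80   # Reject writes above 80% usage
  disk_usage_max_disk_size_in_gb: 1000          # Hypothetical size; must not exceed the available disk size
```

Leaving disk_usage_max_disk_size_in_gb at -1 makes the calculation use the physically available disk size of the data directories instead.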

Operation controls

read_before_write_list_operations_enabled

Whether to allow read-before-write operations, such as setting list element by index or removing list element by index.

Note: Lightweight transactions (LWTs) are always allowed.

Default: true (may differ if emulate_dbaas_defaults is enabled)

user_timestamps_enabled

Whether to allow user-provided timestamps in write requests.

Default: true (may differ if emulate_dbaas_defaults is enabled)

Table properties and consistency levels

table_properties_disallowed

Prevents creating tables that use any of the listed table properties.

Default: All properties are allowed (may differ if emulate_dbaas_defaults is enabled)

write_consistency_levels_disallowed

Prevents write requests that use any of the listed consistency levels.

Default: All consistency levels are allowed.

