DataStax Enterprise configuration file (dse.yaml)

The dse.yaml file is the primary configuration file for DataStax Enterprise. The location of the dse.yaml file depends on the type of installation:
  • Installer-Services: /etc/dse/dse.yaml
  • Package installations: /etc/dse/dse.yaml
  • Installer-No Services: install_location/resources/dse/conf/dse.yaml
  • Tarball installations: install_location/resources/dse/conf/dse.yaml
For cassandra.yaml configuration, see Node and cluster configuration (cassandra.yaml).

DSE In-Memory options 

max_memory_to_lock_fraction: 0.20
# max_memory_to_lock_mb: 10240

To use DSE In-Memory, choose one of these options to specify how much system memory to use for all in-memory tables:

  • max_memory_to_lock_fraction - A fraction of the system memory. The default value of 0.20 uses up to 20% of system memory.
  • max_memory_to_lock_mb - A maximum amount of memory in MB.
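
As a rough illustration of how the fraction relates to an absolute amount, the following Python sketch converts a fraction of system memory into a budget in MB. The function name and the 64 GB host are made up for the example. How DSE itself measures system memory is not covered here.

```python
def lock_budget_mb(total_system_mb, fraction=0.20):
    """Return the in-memory budget in MB for a given fraction of system memory."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    return total_system_mb * fraction


# Hypothetical host with 64 GB of RAM and the default fraction of 0.20.
budget = lock_budget_mb(64 * 1024)
print(f"Up to {budget:.0f} MB can be used for in-memory tables")
```

The same arithmetic can be used in reverse to decide whether a fixed MB value is a better fit than a fraction on hosts of different sizes.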

Hive meta store 

hive_meta_store_enabled: true
Enables or disables the Hive meta store via Cassandra. Default: true.

Kerberos options 

Use these options for configuring security for a DataStax Enterprise cluster using Kerberos. For instructions, see Authenticating a cluster with Kerberos.

   keytab: path_to_keytab/dse.keytab
   service_principal: dse_user/_HOST@REALM
   http_principal: HTTP/_HOST@REALM
   qop: auth
  • keytab: resources/dse/conf/dse.keytab

    The keytab file must contain the credentials for both of the fully resolved principal names, which replace _HOST with the fully qualified domain name (FQDN) of the host in the service_principal and http_principal settings. The UNIX user running DSE must also have read permissions on the keytab.

  • service_principal: dse_user/_HOST@REALM

    The service_principal that the Cassandra and Hadoop processes run under must use the form dse_user/_HOST@REALM, where:

    • dse_user is:
      • Installer-Services and Package installations: cassandra
      • Installer-No Services and Tarball installations: the name of the UNIX user that starts the service
    • _HOST is converted to a reverse DNS lookup of the broadcast address.
    • REALM is the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.

    The service_principal must be consistent everywhere: in the dse.yaml file, in the keytab, and in the cqlshrc file (where service_principal is separated into service/hostname).
  • http_principal: HTTP/_HOST@REALM

    The http_principal is used by the Tomcat application container to run DSE Search. The Tomcat web server uses the GSS-API mechanism (SPNEGO) to negotiate the GSSAPI security mechanism (Kerberos). Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.

  • qop: auth

    A comma-delimited list of Quality of Protection (QOP) values that clients and servers can use for each connection. The client can have multiple QOP values, while the server can have only a single QOP value. The valid values are:
    • auth - Default: Authentication only.
    • auth-int - Authentication plus integrity protection for all transmitted data.
    • auth-conf - Authentication plus integrity protection and encryption of all transmitted data.

      Encryption using auth-conf is separate and independent of whether encryption is done using SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax recommends choosing only one method and using it for both encryption and authentication.
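
To see which fully resolved principal names the keytab needs to contain, the substitution of _HOST can be sketched in Python. This is only an illustration: the helper name, the host node1.example.com, and the realm EXAMPLE.COM are assumptions, and the FQDN is passed in directly instead of being obtained by a reverse DNS lookup.

```python
def expand_principal(template, fqdn):
    """Replace _HOST with the host's FQDN and check that the realm is uppercase."""
    principal = template.replace("_HOST", fqdn)
    realm = principal.rsplit("@", 1)[-1]
    if "@" not in principal or realm != realm.upper():
        raise ValueError("REALM must be present and uppercase: " + principal)
    return principal


# Hypothetical host and realm, for illustration only.
for template in ("cassandra/_HOST@EXAMPLE.COM", "HTTP/_HOST@EXAMPLE.COM"):
    print(expand_principal(template, "node1.example.com"))
```

Both expanded names would need matching entries in the keytab for the settings shown above to be consistent.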

LDAP options 

To use these LDAP options, you must set com.datastax.bdp.cassandra.auth.LdapAuthenticator as the authenticator in the cassandra.yaml file. For instructions, see Authenticating a cluster with LDAP.
    server_host: localhost
    server_port: 389
    search_dn: cn=Admin
    search_password: secret
    use_ssl: false
    use_tls: false
    truststore_type: jks
    user_search_base: ou=users,dc=example,dc=com
    user_search_filter: (uid={0})
    credentials_validity_in_ms: 0
        max_active: 8
        max_idle: 8
  • server_host

    The host name of the LDAP server. Default: localhost

  • server_port

    The port on which the LDAP server listens. Default: 389

  • search_dn

    The username of the user that is used to search for other users on the LDAP server.

  • search_password

    The password of the search_dn user.

  • use_ssl

    Set to true to enable SSL connections to the LDAP server. If set to true, you might need to change server_port to the SSL port of the LDAP server. Default: false

  • use_tls

    Set to true to enable TLS connections to the LDAP server. If set to true, you might need to change server_port to the TLS port of the LDAP server. Default: false

  • truststore_path

    The path to the trust store for SSL certificates.

  • truststore_password

    The password to access the trust store.

  • truststore_type

    The type of trust store. Default: jks

  • user_search_base

    The search base for your domain, used to look up users. Set the ou and dc elements for your LDAP domain. Typically this is set to ou=users,dc=domain,dc=top_level_domain. For example, ou=users,dc=example,dc=com.

  • user_search_filter

    The search filter for looking up user names. Default: (uid={0})

  • credentials_validity_in_ms

    The duration period in milliseconds for the credential cache. Default: 0

  • search_validity_in_ms

    The duration period in milliseconds for the search cache. Default: 0

  • max_active

    The maximum number of active connections to the LDAP server. Default: 8

  • max_idle

    The maximum number of idle connections in the pool awaiting requests. Default: 8
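
The {0} placeholder in the search filter stands for the user name being looked up. The following Python sketch shows one way such a filter could be filled in, with the special characters escaped as described in RFC 4515. The helper names are hypothetical, and whether DSE escapes user names in exactly this way is an assumption.

```python
def ldap_escape(value):
    """Escape characters that are special in LDAP search filters (RFC 4515)."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def build_user_filter(user_search_filter, username):
    """Substitute the escaped user name for the {0} placeholder."""
    return user_search_filter.replace("{0}", ldap_escape(username))


# With the filter (uid={0}), the user jdoe is searched for as (uid=jdoe).
print(build_user_filter("(uid={0})", "jdoe"))
```

The resulting filter is applied under the configured user_search_base, for example ou=users,dc=example,dc=com.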

Scheduler settings for Solr indexes 

The ttl_index_rebuild_options settings control the schedulers in charge of querying for and removing expired data.
    fixed_rate_period: 300
    initial_delay: 20
    max_docs_per_batch: 4096
    thread_pool_size: 1
  • fixed_rate_period - How often, in seconds, to check for expired data. Default: 300
  • initial_delay - The delay, in seconds, before the first TTL check, which speeds up start-up. Default: 20
  • max_docs_per_batch - The maximum number of documents that the TTL rebuild thread checks and deletes per batch. Default: 4096
  • thread_pool_size - The maximum number of Solr cores that can execute TTL cleanup concurrently. Limiting this value manages system resource consumption and prevents many Solr cores from executing simultaneous TTL deletes. Default: 1

Solr resource upload limit 

You can configure the maximum resource file size or disable resource upload.
solr_resource_upload_limit_mb: 10
Sets the maximum Solr resource upload size limit in MB. Set to 0 to disable resource uploading. Default: 10

Solr shard transport options 

The shard_transport_options use netty or http for inter-node communication between DSE Search nodes. Also see Shard transport options for DSE Search communications.

    type: netty
    netty_server_port: 8984
    message_server_port: 8985
# Options specific to the "http" transport type.
#   http_shard_client_conn_timeout: 0
#   http_shard_client_socket_timeout: 0
  • netty TCP-based communication provides lower latency, improved throughput, and reduced resource consumption.
  • http uses a standard HTTP-based interface.

When type: netty, define the following netty settings to configure inter-node communication between DSE Search nodes:

netty_server_port (deprecated)
The TCP listen port. This setting is mandatory to use the netty transport now or migrate to it later. To use http transport, comment out this setting or change it to -1. During upgrade to 5.0, netty_server_port is used. After all nodes are running 5.0, message_server_port is used. Default: 8984
message_server_port
The TCP listen port for the inter-node message server. Requests that are coordinated by this node use this port to communicate with other nodes. Default: 8985
netty_server_acceptor_threads
The number of server acceptor threads. Default: number_of_available_processors
netty_server_worker_threads
The number of server worker threads. Default: number_of_available_processors * 8
netty_client_worker_threads
The number of client worker threads. Default: number_of_available_processors * 8
netty_client_max_connections
The maximum number of client connections. Default: 100
netty_client_request_timeout
The client request timeout, in milliseconds, for the maximum cumulative time that a distributed Solr request will wait idly for shard responses. Default: 60000
netty_max_frame_length_in_mb
The maximum length of a message frame, in MB. Default: 256
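
Several of the netty defaults are derived from the processor count. The following Python sketch works out those defaults for a given machine. The function and the dictionary keys are illustrative names, not DSE settings, and os.cpu_count() is only a stand-in for however DSE counts available processors.

```python
import os


def netty_thread_defaults(processors=None):
    """Return the processor-derived default thread counts described above."""
    n = processors or os.cpu_count() or 1
    return {
        "server_acceptor_threads": n,    # number_of_available_processors
        "server_worker_threads": n * 8,  # number_of_available_processors * 8
        "client_worker_threads": n * 8,  # number_of_available_processors * 8
    }


# Defaults for a hypothetical 4-processor node.
print(netty_thread_defaults(4))
```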

When type: http, define the following http transport settings to configure http inter-node communication between DSE Search nodes. To avoid blocking operations, DataStax strongly recommends changing these settings to a finite value. These settings are valid across Solr cores:

http_shard_client_conn_timeout
The HTTP shard client connection timeout in milliseconds. 0 = no timeout. Default: 0
http_shard_client_socket_timeout
The HTTP shard client socket timeout in milliseconds. 0 = no timeout. Default: 0

Solr indexing 

DSE Search provides a multi-threaded indexing implementation to improve performance on multi-core machines. All index updates are internally dispatched to a per-core indexing thread pool and executed asynchronously, which allows for greater concurrency and parallelism. However, index requests can return a response before the indexing operation is executed.
max_solr_concurrency_per_core: 2
# enable_back_pressure_adaptive_nrt_commit: true
# back_pressure_threshold_per_core: 2000
# flush_max_time_per_core: 5
# load_max_time_per_core: 5
max_solr_concurrency_per_core
The maximum number of concurrent asynchronous indexing threads per Solr core. If set to 1, DSE Search uses synchronous indexing behavior in a single thread. For optimal performance, set this value to the number of available CPU cores divided by the number of Solr cores. For example, with 12 CPU cores and 3 Solr cores, the suggested value is 4. To prevent writes from overwhelming reads, reduce this value and parallelDeleteTasks in solrconfig.xml. See Configuring and tuning indexing. Default: number_of_available_CPU_cores
Note: Dynamically switching the Solr concurrency level to 1 is not allowed.
enable_back_pressure_adaptive_nrt_commit
Allows the back pressure system to adapt the maximum auto soft commit time (defined per core in the solrconfig.xml file) to the actual load. This setting is respected only for NRT (near real time) cores. When a Solr core uses live indexing with RT (real time) enabled, adaptive commits are disabled regardless of this property value. Default: true
back_pressure_threshold_per_core
The total number of queued asynchronous indexing requests per Solr core. When this number is exceeded, back pressure prevents excessive resource consumption by throttling new incoming requests. DataStax recommends 1000 * max_solr_concurrency_per_core. Default: 2000

flush_max_time_per_core
The maximum time, in minutes, to wait for the flushing of asynchronous index updates, which occurs at Solr commit time or at Cassandra flush time. Expert level knowledge is required to change this value. Always set the value reasonably high to ensure that flushing completes successfully. If the configured value is exceeded, index updates are only partially committed, and the Cassandra commit log is not truncated, which ensures data durability.

Note: When a timeout occurs, it usually means that this node is overloaded and cannot flush in a timely manner. Live indexing increases the time to flush asynchronous index updates.
Default: 5
load_max_time_per_core
The maximum time, in minutes, to wait for each Solr core to load on startup or during create/reload operations. Change this advanced option only if exceptions happen during core loading. Default: 1 (if not specified)
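
The sizing guidance for max_solr_concurrency_per_core and back_pressure_threshold_per_core can be worked through with a short Python sketch. The function name is hypothetical, and rounding down when the CPU cores do not divide evenly is an assumption, not documented behavior.

```python
def suggested_indexing_settings(cpu_cores, solr_cores):
    """Suggest indexing settings from the CPU core and Solr core counts."""
    # CPU cores divided by Solr cores; rounding down is an assumption.
    # A result of 1 means synchronous, single-threaded indexing.
    concurrency = max(1, cpu_cores // solr_cores)
    return {
        "max_solr_concurrency_per_core": concurrency,
        # Recommended threshold: 1000 * max_solr_concurrency_per_core.
        "back_pressure_threshold_per_core": 1000 * concurrency,
    }


# The example from the text: 12 CPU cores and 3 Solr cores.
print(suggested_indexing_settings(12, 3))
```

For the 12-core, 3-Solr-core example this gives a concurrency of 4 and a back pressure threshold of 4000.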

Cassandra disk failure policy 

# enable_index_disk_failure_policy: false
When set to true, DSE Search activates the configured Cassandra disk failure policy if IOExceptions occur during index update operations. Default: false

Solr CQL query options 

Available options for CQL Solr queries.
solr_data_dir: /MyDir
cql_solr_query_executor_threads: 2
cql_solr_query_row_timeout: 10000
solr_data_dir
The directory to store index data. By default, the Solr data is saved in cassandra_data_dir/, or as specified by the system property.
cql_solr_query_executor_threads
The maximum number of threads for retrieving rows during CQL Solr queries. This value is cross-request and cross-core. Default: number of available processors * 10
cql_solr_query_row_timeout
The maximum time in milliseconds to wait for each row to be read from Cassandra during CQL Solr queries. Default: 10000 milliseconds (10 seconds)

CQL Performance Service options 

These settings are used by the Performance Service to configure collection of performance metrics on Cassandra nodes. Performance metrics are stored in the dse_perf keyspace and can be queried with CQL using any CQL-based utility, such as cqlsh, DataStax DevCenter, or any application using a Cassandra CQL driver.
    enabled: true
    threshold_ms: 2000
    ttl_seconds: 259200
    enabled: false
    refresh_rate_ms: 10000
    enabled: false
    refresh_rate_ms: 10000
    enabled: false
    refresh_rate_ms: 10000
    enabled: false
    refresh_rate_ms: 10000
  enabled: false
  refresh_rate_ms: 10000
  retention_count: 3
   enabled: false
   refresh_rate_ms: 10000
   top_stats_limit: 100
   quantiles: false
Report distributed sub-queries for Solr (query executions on individual shards) that take longer than a specified period of time. See Collecting slow queries.
  • enabled - Enables (true) or disables (false) log entries for slow queries. Default: true
  • threshold_ms - The threshold time in milliseconds. Default: 2000
  • ttl_seconds - The time, in seconds, to keep the slow query log entries. Default: 259200
CQL system information tables settings. See Collecting system level diagnostics.
  • enabled - Default: false
  • refresh_rate_ms - Default: 10000
Data resource latency tracking settings. See Collecting system level diagnostics.
  • enabled - Default: false
  • refresh_rate_ms - Default: 10000
Database summary statistics settings. See Collecting database summary diagnostics.
  • enabled - Default: false
  • refresh_rate_ms - Default: 10000
Cluster summary statistics settings. See Collecting cluster summary diagnostics.
  • enabled - Default: false
  • refresh_rate_ms - Default: 10000
Column Family Histogram data tables settings. See Collecting table histogram diagnostics.
  • enabled - Default: false
  • refresh_rate_ms - Default: 10000
  • retention_count - Default: 3
User-resource latency tracking settings. See Collecting user activity diagnostics.
  • enabled - Default: false
  • refresh_rate_ms - Default: 10000
  • top_stats_limit - Default: 100
  • quantiles - Default: false

Spark Performance Service options 

These settings are used by the Performance Service. See Monitoring Spark with Spark Performance Objects.
    enabled: false
    refresh_rate_ms: 10000
    enabled: false
    refresh_rate_ms: 10000
        sink: false
        connectorSource: false
        jvmSource: false
        stateSource: false
        sink: false
        connectorSource: false
        jvmSource: false
  • enabled

    Default: false

  • refresh_rate_ms

    Default: 10000 milliseconds

Statistics options.
  • enabled - Default: false
  • refresh_rate_ms - Default: 10000 milliseconds
The driver option controls the metrics collected by the Spark Driver.
  • sink - Enables or disables writing of the metrics that are collected at the Spark Driver to the Cassandra database. Default: false
  • connectorSource - Enables or disables Spark Cassandra Connector metrics at the Spark Driver. Default: false
  • jvmSource - Enables or disables JVM heap and GC metrics at the Spark Driver. Default: false
  • stateSource - Enables or disables application state metrics. Default: false

Solr Performance Service options 

These settings are used by the Performance Service. See Collecting Solr performance statistics.
    enabled: false
    ttl_seconds: 604800
    async_writers: 1
    enabled: false
    ttl_seconds: 604800
    async_writers: 1
    threshold_ms: 3000
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
Enable to record errors that occur during Solr document indexing. See Collecting indexing errors.
  • enabled - Default: false
  • ttl_seconds - Default: 604800
  • async_writers - The number of server threads dedicated to writing in the log. More than one server thread might degrade performance. Default: 1
See Collecting slow Solr queries.
  • enabled - Default: false
  • ttl_seconds - Default: 604800
  • async_writers - The number of server threads dedicated to writing in the log. More than one server thread might degrade performance. Default: 1
  • threshold_ms - For the slow log, the level, in milliseconds, at which a sub-query is slow enough to be reported. Default: 3000 (3 seconds)
See Collecting handler statistics.
  • enabled - Default: false
  • ttl_seconds - Default: 604800
  • refresh_rate_ms - Default: 60000
See Collecting index statistics.
  • enabled - Default: false
  • ttl_seconds - Default: 604800
  • refresh_rate_ms - Default: 60000
See Collecting cache statistics.
  • enabled - Default: false
  • ttl_seconds - Default: 604800
  • refresh_rate_ms - Default: 60000
See Collecting Solr performance statistics.
  • enabled - Default: false
  • ttl_seconds - Default: 604800
  • refresh_rate_ms - Default: 60000

Node health options 

    refresh_rate_ms: 60000
    uptime_ramp_up_period_seconds: 86400
    dropped_mutation_window_minutes: 30
Node health options are always enabled.
  • refresh_rate_ms - Default: 60000
  • uptime_ramp_up_period_seconds - The amount of continuous uptime required for the node's uptime score to advance the node health score from 0 to 1 (full health), assuming there are no recent dropped mutations. The health score is a composite score based on dropped mutations and uptime. Tip: If a node is repairing after a period of downtime, you might want to increase the uptime period to the expected repair time. Default: 86400 (1 day)
  • dropped_mutation_window_minutes - The historic time window, in minutes, over which the rate of dropped mutations affects the node health score. Default: 30

Health-based routing 

enable_health_based_routing: true
Enables replica selection for distributed Solr queries to consider node health when multiple candidates exist for a particular token range. Health-based routing enables a trade-off between index consistency and query throughput. When the primary concern is performance, do not enable health-based routing. Default: true

Reindexing of bootstrapped data 

async_bootstrap_reindex: false
For DSE Search, configure whether to asynchronously re-index bootstrapped data. Default: false
  • If enabled, the node joins the ring immediately after bootstrap and re-indexing occurs asynchronously. The node does not wait for post-bootstrap re-indexing, so it is not marked down.
  • If disabled, the node joins the ring after re-indexing the bootstrapped data.

Encryption and system key settings 

Settings for encrypting passwords and sensitive system tables.
system_key_directory: /etc/dse/conf
config_encryption_active: false
config_encryption_key_name: system_key
system_key_directory
The directory where global encryption keys, called system keys, are stored. Keys that are used for SSTable encryption must be distributed to all nodes. DataStax Enterprise must be able to read and write to this directory. This directory must have 700 permissions and belong to the dse user. Default: /etc/dse/conf
See Configuring encryption using off-server encryption keys and Configuring encryption using local encryption keys.
config_encryption_active
When set to true (default: false), the following configuration values must be encrypted:


config_encryption_key_name
The name of the system key for encrypting and decrypting stored passwords in the configuration files. To encrypt keyfiles, use dsetool createsystemkey. When config_encryption_active is true, you must provide a valid key with this name for the system_key_directory option. Default: system_key

System encryption settings 

  enabled: false
  cipher_algorithm: AES
  secret_key_strength: 128
  chunk_length_kb: 64
  key_name: system_table_keytab
DataStax recommends using remote encryption keys from a KMIP server when using Transparent Data Encryption (TDE) features. Local key support is provided when a KMIP server is not available.
  • enabled - Enables local encryption of system tables that might contain sensitive information, including system.batchlog, system.paxos, hint files, and the Cassandra commit log. When you enable system table encryption on a node with existing data, run nodetool upgradesstables -a on the listed tables. When tracing is enabled, sensitive information is written to the tables in the system_traces keyspace. Configure those tables to encrypt their data by using an encrypting compressor. Default: false
  • cipher_algorithm - Default: AES
  • secret_key_strength - Default: 128
  • chunk_length_kb - Default: 64
  • key_name - The name of the keys file that is created to encrypt system tables. This file is created in system_key_directory/system/key_name. Comment out when using key_provider: KmipKeyProviderFactory. Default: system_table_keytab
  • key_provider - An alternate key provider for local encryption. Useful for using a KMIP host as a key provider. Default: KmipKeyProviderFactory
  • kmip_host - When key_provider: KmipKeyProviderFactory, the kmip_groupname that is defined for the kmip_hosts entry in dse.yaml that describes the KMIP key server or group of KMIP key servers.

DSE Analytics Hive options 

    insert_max_retries: 6
    insert_retry_sleep_period: 50
Retry settings for when Hive inserts data into a Cassandra table.
  • insert_max_retries - The maximum number of retries. Default: 6
  • insert_retry_sleep_period - The period of time, in milliseconds, between retries. Default: 50

Spark memory 

initial_spark_worker_resources: 0.7
DataStax Enterprise can control the memory and cores offered by particular Spark Workers in semi-automatic fashion.
Use the initial_spark_worker_resources parameter to specify the fraction of system resources that are made available to the Spark Worker. The available resources are calculated in the following way:
  • Spark Worker memory = initial_spark_worker_resources * (total system memory - memory assigned to Cassandra)
  • Spark Worker cores = initial_spark_worker_resources * total system cores
The lowest value that you can assign to Spark Worker memory is 64 MB. The lowest value that you can assign to Spark Worker cores is 1 core. If the calculated results are lower, no exception is thrown and the values are automatically raised to these minimums. The range of the initial_spark_worker_resources value is 0.01 to 1. If the value is not specified, the default value 0.7 is used.

This mechanism is used by default to set the Spark Worker memory and cores. To override the default, uncomment and edit one or both SPARK_WORKER_MEMORY and SPARK_WORKER_CORES options in the file.
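
The two formulas and the lower limits can be checked with a small Python sketch. The function name and the sample machine are invented for the example, and truncating fractional results to whole numbers is an assumption about rounding.

```python
def spark_worker_resources(total_memory_mb, cassandra_memory_mb, total_cores, fraction=0.7):
    """Apply the Spark Worker memory and cores formulas with their minimums."""
    if not 0.01 <= fraction <= 1:
        raise ValueError("initial_spark_worker_resources must be between 0.01 and 1")
    memory_mb = fraction * (total_memory_mb - cassandra_memory_mb)
    cores = fraction * total_cores
    # Results below the minimums are limited to 64 MB and 1 core.
    # Truncation to whole numbers is an assumption.
    return max(64, int(memory_mb)), max(1, int(cores))


# Hypothetical node: 32 GB of RAM, 8 GB assigned to Cassandra, 8 cores.
memory_mb, cores = spark_worker_resources(32768, 8192, 8)
print(memory_mb, "MB,", cores, "cores")
```

With the default of 0.7, this hypothetical node would offer roughly 17 GB and 5 cores to the Spark Worker.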

Audit logging options 

  enabled: false
  logger: SLF4JAuditWriter
  retention_time: 0
To get the maximum information from data auditing, turn on data auditing on every node. See Configuring and using data auditing and Configuring audit logging to a logback log file.
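enabled: false
Enables or disables audit logging. Default: false
logger: SLF4JAuditWriter
The logger that records audit events. Default: SLF4JAuditWriter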
  • SLF4JAuditWriter - Logs audit information to the SLF4JAuditWriter logger. Audit logging configuration settings are in the logback.xml file.
    The location of the logback.xml file depends on the type of installation:
    Installer-Services and Package installations /etc/dse/cassandra/logback.xml
    Installer-No Services and Tarball installations install_location/resources/cassandra/logback.xml
  • CassandraAuditWriter - Logs audit information to a Cassandra table. This logger can be run synchronously or asynchronously. Audit logs are stored in the dse_audit.audit_log table. See related cassandra_audit_writer_options configuration entries and Configuring audit logging to a Cassandra table.
included_categories or excluded_categories 
Specify either included or excluded categories. Specifying both is an error.

A comma-separated list of audit event categories to include in or exclude from the audit log. Categories are: QUERY, DML, DDL, DCL, AUTH.

included_categories: comma_separated_list
excluded_categories: comma_separated_list
included_keyspaces or excluded_keyspaces
Specify either included or excluded keyspaces. Specifying both is an error.

Use a regular expression to filter keyspaces, or use a comma separated list of keyspaces to be included or excluded from the audit log.

included_keyspaces: comma_separated_list
excluded_keyspaces: comma_separated_list
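
The "either included or excluded, never both" rule for categories can be expressed as a short Python sketch. The function names and the dictionary shape are hypothetical, and treating an unset filter as "audit everything" is an assumption.

```python
CATEGORIES = {"QUERY", "DML", "DDL", "DCL", "AUTH"}


def parse_categories(options):
    """Return (mode, names) from included_categories or excluded_categories."""
    included = options.get("included_categories")
    excluded = options.get("excluded_categories")
    if included and excluded:
        raise ValueError("Specify either included_categories or excluded_categories, not both")
    raw = included or excluded or ""
    names = {name.strip() for name in raw.split(",") if name.strip()}
    unknown = names - CATEGORIES
    if unknown:
        raise ValueError("Unknown audit categories: " + ", ".join(sorted(unknown)))
    mode = "include" if included else "exclude" if excluded else None
    return mode, names


def is_audited(category, mode, names):
    """Decide whether an event category passes the filter."""
    if mode == "include":
        return category in names
    if mode == "exclude":
        return category not in names
    return True  # assumption: no filter means every category is audited


mode, names = parse_categories({"included_categories": "DDL, DCL"})
print(is_audited("DDL", mode, names), is_audited("QUERY", mode, names))
```

The same include-or-exclude shape applies to the keyspace filters, which additionally accept regular expressions.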
retention_time: 0
The amount of time, in hours, that audit events are retained by supporting loggers. Only the CassandraAuditWriter supports retention time. Values of 0 or less retain events forever. Default: 0
cassandra_audit_writer_options
Configuration options for the CassandraAuditWriter.
    mode: sync
    batch_size: 50
    flush_time: 500
    num_writers: 10
    queue_size: 10000
    write_consistency: QUORUM
mode: sync
Sets the mode the writer runs in. Default: sync
  • sync - A query is not executed until the audit event is successfully written.
  • async - Audit events are queued for writing to the audit table, but are not necessarily logged before the query executes. A pool of writer threads consumes the audit events from the queue, and writes them to the audit table in batch queries. While this substantially improves performance under load, if there is a failure between when a query is executed, and its audit event is written to the table, the audit table might be missing entries for queries that were executed.
batch_size: 50
Available only when mode: async.

Must be greater than 0. The maximum number of events the writer will dequeue before writing them out to the table. If warnings in your logs reveal that batches are too large, decrease this value or increase the value of batch_size_warn_threshold_in_kb in cassandra.yaml. Default: 50

flush_time: 500
Available only when mode: async.

The maximum amount of time, in milliseconds, before a writer removes an event from the queue to write it out. This flush time prevents events from waiting too long before being written to the table when few queries are happening. Default: 500

num_writers: 10
Available only when mode: async.

The number of worker threads asynchronously logging events to the CassandraAuditWriter. Default: 10

queue_size: 10000
The size of the queue feeding the asynchronous audit log writer threads. When there are more events being produced than the writers can write out, the queue fills up, and newer queries block until there is space on the queue. If a value of 0 is used, the queue size is unbounded, which can lead to resource exhaustion under heavy query load. Default: 10000
write_consistency: QUORUM
The consistency level that is used to write audit events. Default: QUORUM
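
The interplay of queue_size, batch_size, and flush_time in async mode can be pictured with a small Python sketch. This is one interpretation of the descriptions above, not the CassandraAuditWriter implementation: the function name is hypothetical, and the sketch assumes a writer collects events until it has batch_size of them or flush_time has elapsed, whichever comes first.

```python
import queue
import time


def drain_batch(events, batch_size=50, flush_time_ms=500):
    """Collect up to batch_size events, waiting at most flush_time_ms in total."""
    if batch_size <= 0:
        raise ValueError("batch_size must be greater than 0")
    deadline = time.monotonic() + flush_time_ms / 1000.0
    batch = []
    while len(batch) < batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(events.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


# A bounded queue blocks producers when full, like queue_size: 10000.
events = queue.Queue(maxsize=10000)
for i in range(120):
    events.put("event-%d" % i)

print(len(drain_batch(events)))  # a full batch is returned without waiting
```

A full queue makes producers wait, which corresponds to newer queries blocking until there is space on the queue.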

KMIP encryption options 

Options for KMIP encryption keys and communication between the DataStax Enterprise node and the KMIP key server or key servers.

    keystore_path: pathto/kmip/keystore.jks
    keystore_type: jks
    keystore_password: password
    truststore_path: pathto/kmip/truststore.jks
    truststore_type: jks
    truststore_password: password
    key_cache_millis: 300000
    timeout: 1000
Configure options for a kmip_groupname section for each KMIP key server or group of KMIP key servers. Using separate key server configuration settings allows use of different key servers to encrypt table data, and eliminates the need to enter key server configuration information in DDL statements and other configurations.

A user-defined name for a group of options to configure a KMIP server or servers, key settings, and certificates.

  • hosts

    A comma-separated list of hosts[:port] for the KMIP key server. There is no load balancing. In failover scenarios, failover occurs in the same order that servers are listed. For example: hosts:,

  • keystore_path

    The path to a Java keystore that identifies the DSE node to the KMIP key server. For example: /path/to/keystore.jks

  • keystore_type

    The type of key store. The default value is jks.

  • keystore_password

    The password to access the key store.

  • truststore_path

    The path to a Java truststore that identifies the KMIP key server to the DataStax Enterprise node. For example: /path/to/truststore.jks

  • truststore_type

    The type of trust store.

  • truststore_password

    The password to access the trust store.

  • key_cache_millis

    Milliseconds to locally cache the encryption keys that are read from the KMIP hosts. The longer the encryption keys are cached, the fewer requests are made to the KMIP key server, but the longer it takes for changes, like revocation, to propagate to the DSE node. Default: 300000.

  • timeout

    Socket timeout in milliseconds. Default: 1000.
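
Because there is no load balancing and failover follows the listed order, the hosts value behaves like an ordered list. The following Python sketch parses such a value and picks the first server that responds. The function names and host names are hypothetical, the availability check is supplied by the caller, and IPv6 literals are not handled.

```python
def parse_kmip_hosts(hosts):
    """Return (host, port) tuples in the order listed; port is None when omitted."""
    servers = []
    for entry in hosts.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, sep, port = entry.partition(":")
        servers.append((host, int(port) if sep else None))
    return servers


def first_reachable(servers, is_up):
    """Try servers in listed order and return the first one that is up."""
    for server in servers:
        if is_up(server):
            return server
    return None


# Hypothetical key servers; the second entry omits the optional port.
servers = parse_kmip_hosts("kmip1.example.com:5696, kmip2.example.com")
print(first_reachable(servers, lambda server: server[0] == "kmip2.example.com"))
```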

CQL Solr paging 

Option to specify the search pagination (cursors) behavior.
cql_solr_query_paging: off
Default: off.
  • off - Paging is off. Do not respect driver paging settings for CQL Solr queries. To dynamically enable paging in the query, use the paging:driver parameter in JSON queries.
  • driver - Respects driver paging settings. Specifies to use Solr pagination (cursors) only when the driver uses pagination. Required for DSE SearchAnalytics workloads when analytics nodes leverage search.
The location of the cassandra.yaml file depends on the type of installation:
  • Package installations: /etc/dse/cassandra/cassandra.yaml
  • Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml