DataStax Enterprise configuration file (dse.yaml)

The dse.yaml file is the primary configuration file for DataStax Enterprise.

For cassandra.yaml configuration, see Node and cluster configuration (cassandra.yaml).

DSE In-Memory option  

max_memory_to_lock_fraction
max_memory_to_lock_mb
To use DSE In-Memory, choose one of these options to specify how much system memory to use for all in-memory tables.
  • max_memory_to_lock_fraction

    Specify a fraction of the system memory. The default value of 0.20 specifies to use up to 20% of system memory.

  • max_memory_to_lock_mb

    Specify a maximum amount of memory in MB.
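As a minimal sketch, the fragment below keeps the default fraction and shows the MB-based alternative commented out. The value 10240 is a hypothetical example, not a recommendation; set only one of the two options.

```yaml
# Reserve up to 20% of system memory for all in-memory tables (the default).
max_memory_to_lock_fraction: 0.20

# Alternative: cap the memory in MB instead (hypothetical value).
# Set only one of the two options.
# max_memory_to_lock_mb: 10240
```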

Hive meta store 

hive_meta_store_enabled 
Enables or disables the Hive meta store via Cassandra. Default: true.

Kerberos support 

Use these options for configuring security for a DataStax Enterprise cluster using Kerberos. For instructions, see Authenticating a cluster with Kerberos.

kerberos_options:
   keytab: path_to_keytab/dse.keytab
   service_principal: dse_user/_HOST@REALM
   http_principal: HTTP/_HOST@REALM
   qop: auth
  • keytab: resources/dse/conf/dse.keytab

    The keytab file must contain the credentials for both of the fully resolved principal names, which replace _HOST with the FQDN of the host in the service_principal and http_principal settings. The UNIX user running DSE must also have read permissions on the keytab.

  • service_principal: dse_user/_HOST@REALM

    The service_principal that the Cassandra and Hadoop processes run under must use the form dse_user/_HOST@REALM, where dse_user is:

    • Installer-Services and Package installations: cassandra
    • Installer-No Services and Tarball installations: the name of the UNIX user that starts the service
    and where:
    • _HOST is replaced with the result of a reverse DNS lookup of the broadcast address.
    • REALM is the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
    The service_principal must be consistent everywhere: in dse.yaml, in the keytab, and in the cqlshrc file (where service_principal is split into service/hostname).
  • http_principal: HTTP/_HOST@REALM

    The http_principal is used by the Tomcat application container to run DSE Search/Solr. The web server uses the SPNEGO GSS-API mechanism to negotiate Kerberos as the security mechanism. Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.

  • qop: auth

    A comma-delimited list of Quality of Protection values that clients and servers can use for each connection. The client can have multiple QOP values, while the server can have only a single QOP value. The valid values are:
    • auth - Authentication only (default).
    • auth-int - Authentication plus integrity protection for all transmitted data.
    • auth-conf - Authentication plus integrity protection and encryption of all transmitted data.

      Encryption using auth-conf is completely independent of encryption using SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax recommends choosing only one method and using it for both encryption and authentication.

LDAP options 

To use these options, you must set com.datastax.bdp.cassandra.auth.LdapAuthenticator as the authenticator in the cassandra.yaml file. For instructions, see Authenticating a cluster with LDAP.

server_host 
The host name of the LDAP server. Default: localhost
server_port 
The port on which the LDAP server listens. Default: 389
search_dn 
The user name of the account that is used to search for other users on the LDAP server.
search_password 
The password of the search_dn user.
use_ssl 
Set to true to enable SSL connections to the LDAP server. If set to true, you may need to change server_port to the SSL port of the LDAP server. Default: false
use_tls 
Set to true to enable TLS connections to the LDAP server. If set to true, you may need to change the server_port to the TLS port of the LDAP server. Default: false
truststore_path 
The path to the trust store for SSL certificates.
truststore_password 
The password to access the trust store.
truststore_type 
The type of trust store. Default: jks
user_search_base 
The search base for your domain, used to look up users. Set the ou and dc elements for your LDAP domain. Typically this is set to ou=users,dc=domain,dc=top level domain. For example, ou=users,dc=example,dc=com.
user_search_filter 
The search filter for looking up usernames. Default: uid={0}
credentials_validity_in_ms 
The validity period, in milliseconds, of the credential cache. Default: 0
search_validity_in_seconds
The validity period, in seconds, of the search cache. Default: 0
connection_pool 
  • max_active

    The maximum number of active connections to the LDAP server. Default: 8

  • max_idle

    The maximum number of idle connections in the pool awaiting requests. Default: 8
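A sketch of the LDAP options as they might appear together, nested under ldap_options (the parent name used by ldap_options.search_password elsewhere on this page). The host, port, DN, paths, and passwords are hypothetical placeholders; the remaining values are the defaults listed above. Because use_ssl is true here, server_port is changed to the server's SSL port.

```yaml
ldap_options:
    server_host: ldap.example.com          # hypothetical LDAP host
    server_port: 636                       # hypothetical SSL port (default: 389)
    search_dn: cn=admin,dc=example,dc=com  # hypothetical search user
    search_password: changeme             # placeholder
    use_ssl: true
    use_tls: false
    truststore_path: /path/to/truststore.jks
    truststore_password: changeme          # placeholder
    truststore_type: jks
    user_search_base: ou=users,dc=example,dc=com
    user_search_filter: uid={0}
    credentials_validity_in_ms: 0          # credential cache validity (ms)
    search_validity_in_seconds: 0          # search cache validity (s)
    connection_pool:
        max_active: 8
        max_idle: 8
```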

Scheduler settings for Solr indexes 

These settings control the schedulers in charge of querying for and removing expired data.

ttl_index_rebuild_options 
  • fix_rate_period

    How often, in seconds, to check for expired data. Default: 300

  • initial_delay

    The delay, in seconds, before the first TTL check. Delaying the first check speeds up startup. Default: 20

  • max_docs_per_batch

    The maximum number of documents that the TTL rebuild thread deletes per batch. Default: 200

Solr resource upload limit 

solr_resource_upload_limit_mb 
Sets the maximum Solr resource upload size limit in MB. Set to 0 to disable resource uploading. Default: 10

Solr shard transport options 

These options configure inter-node communication between DSE Search nodes. Also see Shard transport options for DSE Search communications.

shard_transport_options 
These options are specific to netty.
  • type

    netty is used for TCP-based communication. It provides lower latency, higher throughput, and lower resource consumption than the http transport, which uses a standard HTTP-based interface for communication. Default: netty

  • netty_server_port

    The TCP listen port. This setting is required to use the netty transport now or to migrate to it later. To use the http transport, comment out this setting or set it to -1. Default: 8984

  • netty_server_acceptor_threads

    The number of server acceptor threads. Default: number of available processors

  • netty_server_worker_threads

    The number of server worker threads. Default: number of available processors * 8

  • netty_client_worker_threads

    The number of client worker threads. Default: number of available processors * 8

  • netty_client_max_connections

    The maximum number of client connections. Default: 100

  • netty_client_request_timeout

    The client request timeout, in milliseconds: the maximum cumulative time that a distributed Solr request waits idly for shard responses. Default: 60000

HTTP transport settings 
The defaults for these settings are the same as in Solr, that is, 0, meaning no timeout at all. To avoid blocking operations, DataStax strongly recommends changing these settings to a finite value. These settings are valid across Solr cores:
  • http_shard_client_conn_timeout

    HTTP shard client connection timeout, in milliseconds. Default: 0

  • http_shard_client_socket_timeout

    HTTP shard client socket timeout, in milliseconds. Default: 0
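A sketch of a shard transport configuration, assuming (as the layout above suggests) that the HTTP timeout settings are nested under shard_transport_options. The 20000 ms timeouts are hypothetical finite values chosen only to illustrate the recommendation to avoid 0; the other values are the documented defaults.

```yaml
shard_transport_options:
    type: netty
    netty_server_port: 8984               # required for netty; -1 selects http
    netty_client_max_connections: 100
    netty_client_request_timeout: 60000   # milliseconds
    # Used by the http transport. 0 means no timeout; the finite
    # values below are hypothetical examples.
    http_shard_client_conn_timeout: 20000
    http_shard_client_socket_timeout: 20000
```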

Solr indexing 

DSE Search provides a multi-threaded indexing implementation to improve performance on multi-core machines. All index updates are internally dispatched to a per-core indexing thread pool and executed asynchronously, which allows for greater concurrency and parallelism. However, index requests can return a response before the indexing operation is executed.

max_solr_concurrency_per_core 
Configures the maximum number of concurrent asynchronous indexing threads per Solr core. If set to 1, DSE Search uses synchronous indexing behavior in a single thread. To achieve optimal performance when using live indexing, set this value to the number of CPU cores. Also see Configuring multi-threaded indexing threads. Default: number of available CPU cores
Note: Dynamically switching the Solr concurrency level to 1 is not allowed.
back_pressure_threshold_per_core 
The total number of queued asynchronous indexing requests per Solr core, computed at Solr commit time. When this number is exceeded, back pressure prevents excessive resource consumption by throttling new incoming requests. Default: 500
flush_max_time_per_core 
The maximum time, in minutes, to wait for the flushing of asynchronous index updates, which occurs at Solr commit time or at Cassandra flush time. Expert level knowledge is required to change this value. Always set the value reasonably high to ensure that flushing completes successfully. If the configured value is exceeded, index updates are only partially committed, and the Cassandra commit log is not truncated to ensure data durability.
Note: When a timeout occurs, it usually means this node is being overloaded and cannot flush in a timely manner. Live indexing increases the time to flush asynchronous index updates.
Default: 5
load_max_time_per_core 
The maximum time, in minutes, to wait for each Solr core to load on startup or during create/reload operations. This advanced option should be changed only if exceptions occur during core loading. Default: 1 (if not specified)
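A sketch of the indexing options for a hypothetical 8-core machine using live indexing, following the guidance above to match max_solr_concurrency_per_core to the CPU core count. The other values are the documented defaults.

```yaml
# Hypothetical 8-core machine: one async indexing thread per core.
max_solr_concurrency_per_core: 8
# Throttle new indexing requests once 500 are queued per core.
back_pressure_threshold_per_core: 500
# Minutes to wait for asynchronous index updates to flush.
flush_max_time_per_core: 5
# Minutes to wait for each core to load on startup or create/reload.
load_max_time_per_core: 1
```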

Cassandra disk failure policy 

enable_index_disk_failure_policy 
When set to true, DSE Search activates the configured Cassandra disk failure policy if IOExceptions occur during index update operations. Default: false

Solr CQL query options 

Available options for CQL Solr queries.

solr_data_dir 
The directory to store index data. By default, the Solr data is saved in cassandra_data_dir/solr.data, or as specified by the dse.solr.data.dir system property.
cql_solr_query_executor_threads 
The maximum number of threads for retrieving rows during CQL Solr queries. This value is cross-request and cross-core. Default: number of available processors * 10
cql_solr_query_row_timeout 
The maximum time in milliseconds to wait for each row to be read from Cassandra during CQL Solr queries. Default: 10000 milliseconds (10 seconds)
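A sketch of the CQL Solr query options. The data directory path is hypothetical, and the thread count assumes a hypothetical 8-processor machine (8 * 10, matching the documented default formula). The row timeout is the default.

```yaml
solr_data_dir: /var/lib/cassandra/solr.data  # hypothetical path
cql_solr_query_executor_threads: 80          # hypothetical: 8 processors * 10
cql_solr_query_row_timeout: 10000            # milliseconds (10 seconds)
```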

CQL Performance Service options 

These settings are used by the Performance Service to configure how it collects performance metrics on Cassandra nodes. The collected metrics are stored in the dse_perf keyspace and can be queried with any CQL-based utility, such as cqlsh, DataStax DevCenter, or any application using a Cassandra CQL driver.

cql_slow_log_options 
Reports CQL queries that take longer than a specified period of time.
  • enabled

    Enables (true) or disables (false) log entries for slow queries. Default: false

  • cql_slow_log_threshold_ms

    The threshold time, in milliseconds. Queries that take longer are logged. Default: 100 milliseconds

  • cql_slow_log_ttl

    Defines how long to keep the slow query log entries. Default: 86400 seconds (1 day)

  • async_writers

    Defines the number of server threads to dedicate to writing in the log. More than one server thread might degrade performance. Default: 1

See Collecting slow queries.

cql_system_info_options 
CQL system information tables settings
  • enabled

    Default: false

  • refresh_rate_ms

    Default: 10000 milliseconds

See Collecting system level diagnostics.

resource_level_latency_tracking_options 
Data resource latency tracking settings:
  • enabled

    Default: false

  • refresh_rate_ms

    Default: 10000 milliseconds

See Collecting system level diagnostics.

db_summary_stats_options 
Database summary statistics settings
  • enabled

    Default: false

  • refresh_rate_ms

    Default: 10000 milliseconds

See Collecting database summary diagnostics.

cluster_summary_stats_options 
Cluster summary statistics settings
  • enabled

    Default: false

  • refresh_rate_ms

    Default: 10000 milliseconds

See Collecting cluster summary diagnostics.

histogram_data_options 
Column Family Histogram data tables settings
  • enabled

    Default: false

  • refresh_rate_ms

    Default: 10000 milliseconds

  • retention_count

    Default: 3

See Collecting table histogram diagnostics.

user_level_latency_tracking_options 
User-resource latency tracking settings
  • enabled

    Default: false

  • refresh_rate_ms

    Default: 10000 milliseconds

  • top_stats_limit

    Default: 100

See Collecting user activity diagnostics.
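As a minimal sketch, the fragment below enables three of the diagnostic collectors described above, keeping their documented default refresh rates and limits. Every other collector stays disabled, which is the default.

```yaml
cql_system_info_options:
    enabled: true
    refresh_rate_ms: 10000
histogram_data_options:
    enabled: true
    refresh_rate_ms: 10000
    retention_count: 3
user_level_latency_tracking_options:
    enabled: true
    refresh_rate_ms: 10000
    top_stats_limit: 100
```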

Spark Performance Service options 

These settings are used by the Performance Service. See Monitoring Spark with Spark Performance Objects.

spark_cluster_info_options
  • enabled

    Default: false

  • refresh_rate_ms

    Default: 10000 milliseconds

spark_application_info_options
  • enabled

    Default: false

  • refresh_rate_ms

    Default: 10000 milliseconds

  • driver

    The driver option controls the metrics collected by the Spark Driver.

Solr Performance Service options 

These settings are used by the Performance Service. See Collecting Solr performance statistics.

solr_indexing_error_log_options 
  • enabled

    Default: false

  • ttl_seconds

    Default: 604800 seconds

  • async_writers

    Default: 1

See Collecting indexing errors.

solr_slow_sub_query_log_options 
  • enabled

    Default: false

  • ttl_seconds

    Default: 604800 seconds

  • async_writers

    Default: 1

  • threshold_ms

    The threshold, in milliseconds, above which a sub-query is reported as slow. Default: 3000 (3 seconds)

See Collecting slow Solr queries.

solr_update_handler_metrics_options 
  • enabled

    Default: false

  • ttl_seconds

    Default: 604800 seconds

  • refresh_rate_ms

    Default: 60000 milliseconds

See Collecting handler statistics.

solr_index_stats_options 
  • enabled

    Default: false

  • ttl_seconds

    Default: 604800 seconds

  • refresh_rate_ms

    Default: 60000 milliseconds

See Collecting index statistics.

solr_cache_stats_options 
  • enabled

    Default: false

  • ttl_seconds

    Default: 604800 seconds

  • refresh_rate_ms

    Default: 60000 milliseconds

See Collecting cache statistics.

solr_latency_snapshot_options 
  • enabled

    Default: false

  • ttl_seconds

    Default: 604800 seconds

  • refresh_rate_ms

    Default: 60000 milliseconds

See Collecting Solr performance statistics.
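A sketch that enables two of the Solr Performance Service collectors using the documented defaults. 604800 seconds is 7 days, and the 3000 ms threshold reports sub-queries slower than 3 seconds.

```yaml
solr_slow_sub_query_log_options:
    enabled: true
    ttl_seconds: 604800     # keep entries for 7 days
    async_writers: 1
    threshold_ms: 3000      # report sub-queries slower than 3 seconds
solr_index_stats_options:
    enabled: true
    ttl_seconds: 604800
    refresh_rate_ms: 60000
```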

node_health_options 
node_health_options:
    enabled: false
    refresh_rate_ms: 60000
    uptime_ramp_up_period_seconds: 86400
    dropped_mutation_window_minutes: 30
  • enabled

    Enable node health data collection. Default: false

  • refresh_rate_ms

    Default: 60000 milliseconds

  • uptime_ramp_up_period_seconds

    The amount of continuous uptime required for the node's uptime score to advance the node health score from 0 to 1 (full health), assuming there are no recent dropped mutations. The health score is a composite score based on dropped mutations and uptime. Tip: If a node is repairing after a period of downtime, you might want to increase the uptime period to the expected repair time. Default: 86400 (1 day)

  • dropped_mutation_window_minutes

    The historic time window, in minutes, over which the rate of dropped mutations affects the node health score. Default: 30

Health-based routing 
enable_health_based_routing: true
  • enable_health_based_routing

    Enables replica selection for distributed Solr queries to consider node health when multiple candidates exist for a particular token range. Health-based routing enables a trade-off between index consistency and query throughput. When the primary concern is performance, do not enable health-based routing. Default: true

Reindexing of bootstrapped data 

async_bootstrap_reindex: false
async_bootstrap_reindex
For DSE Search, configure whether to asynchronously re-index bootstrapped data. Default: false
  • If enabled, the node joins the ring immediately after bootstrap and re-indexing occurs asynchronously. Because the node does not wait for post-bootstrap reindexing, it is not marked down.
  • If disabled, the node joins the ring after re-indexing the bootstrapped data.

Encryption settings 

Settings for encrypting passwords and sensitive system tables.

system_key_directory 
The directory where global encryption keys, called system keys, are kept. Keys used for SSTable encryption must be distributed to all nodes. DataStax Enterprise must be able to read from and write to this directory, which must have 700 permissions and belong to the dse user. Default: /etc/dse/conf

For details, see Configuring encryption using off-server encryption keys and Configuring encryption using local encryption keys.

config_encryption_active 
When set to true (default: false), the following configuration values must be encrypted:
dse.yaml
  • ldap_options.search_password
  • ldap_options.truststore_password

cassandra.yaml

  • server_encryption_options.keystore_password
  • server_encryption_options.truststore_password
  • client_encryption_options.keystore_password
  • client_encryption_options.truststore_password
config_encryption_key_name 
The name of the system key used to encrypt and decrypt stored passwords in the configuration files. To create the system key, use dsetool createsystemkey. When config_encryption_active is true, a valid key with this name must exist in the directory specified by system_key_directory. Default: system_key
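A sketch of the configuration-encryption settings together, assuming a system key named system_key already exists in the key directory. With config_encryption_active set to true, the password options listed above must hold encrypted values rather than plain text.

```yaml
# Must be readable and writable by DSE, have 700 permissions,
# and belong to the dse user.
system_key_directory: /etc/dse/conf
# When true, ldap_options.search_password and
# ldap_options.truststore_password must be encrypted values.
config_encryption_active: true
# System key in system_key_directory used to encrypt/decrypt them.
config_encryption_key_name: system_key
```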
system_info_encryption 
If enabled, system tables that contain sensitive information, such as system.hints, system.batchlog, and system.paxos, are encrypted. If enabling system table encryption on a node with existing data, run nodetool upgradesstables -a on the listed tables. When tracing is enabled, sensitive information is written into the tables in the system_traces keyspace. Configure those tables to encrypt their data by using an encrypting compressor.
  • enabled

    Default: false

  • cipher_algorithm

    Default: AES

  • secret_key_strength

    Default: 128

  • chunk_length_kb

    Default: 64

  • key_name

    The name of the keys file that is created to encrypt system tables. This file is created in system_key_directory/system/key_name. Comment out when using key_provider: KmipKeyProviderFactory. Default: system_table_keytab

  • key_provider

    The key provider for KMIP off-server encryption. Set it only when using a KMIP host as the key provider; omit this field if you are not using KmipKeyProviderFactory, for example when using a local encryption key. Default: KmipKeyProviderFactory

  • kmip_host

    The name of a kmip_groupname section, defined under kmip_hosts in dse.yaml, that describes the KMIP key server or group of KMIP key servers to use.
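A sketch of system table encryption with a local key file, using the documented defaults. The commented-out lines show the KMIP alternative; kmip_group1 is a hypothetical kmip_groupname that would have to be defined under kmip_hosts.

```yaml
system_info_encryption:
    enabled: true
    cipher_algorithm: AES
    secret_key_strength: 128
    chunk_length_kb: 64
    # Local key file: system_key_directory/system/system_table_keytab
    key_name: system_table_keytab
    # For KMIP off-server keys, comment out key_name and use instead:
    # key_provider: KmipKeyProviderFactory
    # kmip_host: kmip_group1   # hypothetical kmip_groupname
```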

Hive options

hive_options 
Retry settings for when Hive inserts data into a Cassandra table.
  • insert_max_retries

    Maximum number of retries. Default: 6

  • insert_retry_sleep_period

    Period of time in milliseconds between retries. Default: 50

Audit logging settings 

To get the maximum information from data auditing, turn on data auditing on every node. See Configuring and using data auditing and Configuring audit logging to a logback log file.

audit_logging_options 
  • enabled

    Default: false

  • Available loggers:
    • CassandraAuditWriter

      Logs audit information to a Cassandra table. This logger can run either synchronously or asynchronously. Audit logs are stored in the dse_audit.audit_log table. When run synchronously, a query does not execute until its audit event has been written to the audit log table successfully. If there is a failure between when an audit event is written and when its query is executed, the audit logs may contain queries that were never executed. Also see Configuring audit logging to a Cassandra table.

    • SLF4JAuditWriter

      Logs audit information to the SLF4JAuditWriter logger. Audit logging configuration settings are in the logback.xml file.

      The location of the logback.xml file depends on the type of installation:
      Installer-Services and Package installations: /etc/dse/cassandra/conf/logback.xml
      Installer-No Services and Tarball installations: install_location/resources/cassandra/conf/logback.xml
  • included_categories or excluded_categories

    Comma-separated list of audit event categories to be included in or excluded from the audit log. Categories are: QUERY, DML, DDL, DCL, AUTH, ADMIN. Specify either included or excluded categories. Specifying both is an error.

  • included_keyspaces or excluded_keyspaces

    Comma-separated list of keyspaces to be included in or excluded from the audit log. Specify either included or excluded keyspaces. Specifying both is an error.

  • retention_time

    The amount of time, in hours, that audit events are retained by supporting loggers. Currently, only the CassandraAuditWriter supports retention time. Values of 0 or less retain events forever. Default: 0

  • cassandra_audit_writer_options

    Sets the mode the writer runs in. When run synchronously, a query is not executed until its audit event is successfully written. When run asynchronously, audit events are queued for writing to the audit table but are not necessarily logged before the query executes. A pool of writer threads consumes the audit events from the queue and writes them to the audit table in batch queries. While this substantially improves performance under load, if there is a failure between when a query is executed and when its audit event is written to the table, the audit table may be missing entries for queries that were executed.

    • mode

      Default: sync

    • batch_size (async mode only)

      Must be greater than 0. The maximum number of events the writer will dequeue before writing them out to the table. If you're seeing warnings in your logs about batches being too large, decrease this value. Increasing batch_size_warn_threshold_in_kb in cassandra.yaml is also an option. Make sure you understand the implications before doing so. Default: 50

    • flush_time (async mode only)

      The maximum time, in milliseconds, that an event waits in the queue before a writer removes it and writes it out. This prevents events from waiting too long before being written to the table when query volume is low. Default: 500

    • num_writers (async mode only)

      The number of worker threads asynchronously logging events to the CassandraAuditWriter. Default: 10

    • queue_size

      The size of the queue feeding the asynchronous audit log writer threads. When there are more events being produced than the writers can write out, the queue will fill up, and newer queries will block until there is space on the queue. If a value of 0 is used, the queue size will be unbounded, which can lead to resource exhaustion under heavy query load. Default: 10000

    • write_consistency

      The consistency level used to write audit events. Default: QUORUM
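A sketch of asynchronous audit logging to a Cassandra table. Two names here are assumptions not spelled out on this page: the logger key that selects the writer, and the included_categories spelling (chosen to match included_keyspaces). The keyspace name app_keyspace is hypothetical; the writer options are the documented defaults except mode.

```yaml
audit_logging_options:
    enabled: true
    logger: CassandraAuditWriter         # assumed key name
    included_categories: DDL, DCL, AUTH  # do not also set excluded_categories
    included_keyspaces: app_keyspace     # hypothetical keyspace
    retention_time: 168                  # hours: keep events for 7 days
    cassandra_audit_writer_options:
        mode: async
        batch_size: 50
        flush_time: 500                  # milliseconds
        num_writers: 10
        queue_size: 10000                # 0 means unbounded
        write_consistency: QUORUM
```

In async mode, a larger queue_size absorbs bursts, but any events still queued at a failure are lost.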

The location of the cassandra.yaml file depends on the type of installation:
Package installations: /etc/dse/cassandra/cassandra.yaml
Tarball installations: install_location/resources/cassandra/conf/cassandra.yaml

KMIP encryption options 

Options for KMIP encryption keys and communication between the DataStax Enterprise node and the KMIP key server or key servers.
kmip_hosts
Configure options for a kmip_groupname section for each KMIP key server or group of KMIP key servers. Using separate key server configuration settings allows use of different key servers to encrypt table data, and eliminates the need to enter key server configuration information in DDL statements and other configurations.
kmip_groupname

A user-defined name for a group of options to configure a KMIP server or servers, key settings, and certificates.

  • hosts

    A comma-separated list of hosts[:port] for the KMIP key server. There is no load balancing. In failover scenarios, failover occurs in the same order that servers are listed. For example: hosts: kmip1.yourdomain.com, kmip2.yourdomain.com

  • keystore_path

    The path to a java keystore that identifies the DSE node to the KMIP key server. For example: /path/to/keystore.jks

  • keystore_type

    The type of key store. Default: jks

  • keystore_password

    The password to access the key store.

  • truststore_path

    The path to a java truststore that identifies the KMIP key server to the DSE node. For example: /path/to/truststore.jks

  • truststore_type

    The type of trust store.

  • truststore_password

    The password to access the trust store.

  • key_cache_milli

    Milliseconds to locally cache the encryption keys that are read from the KMIP hosts. The longer the encryption keys are cached, the fewer requests are made to the KMIP key server, but the longer it takes for changes, like revocation, to propagate to the DSE node. Default: 300000.

  • timeout

    Socket timeout in milliseconds. Default: 1000.
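A sketch of one KMIP key server group. The group name kmip_group1 and the passwords are hypothetical placeholders; the host names and paths reuse the examples above. Servers are tried in the listed order on failover, and 300000 ms caches keys locally for 5 minutes.

```yaml
kmip_hosts:
    kmip_group1:                     # hypothetical kmip_groupname
        hosts: kmip1.yourdomain.com, kmip2.yourdomain.com
        keystore_path: /path/to/keystore.jks
        keystore_type: jks
        keystore_password: changeme  # placeholder
        truststore_path: /path/to/truststore.jks
        truststore_type: jks
        truststore_password: changeme  # placeholder
        key_cache_milli: 300000      # cache keys for 5 minutes
        timeout: 1000                # socket timeout (ms)
```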