DataStax Enterprise configuration file (dse.yaml)
The configuration file for DataStax Enterprise.
Installer-Services | /etc/dse/dse.yaml |
Package installations | /etc/dse/dse.yaml |
Installer-No Services | install_location/resources/dse/conf/dse.yaml |
Tarball installations | install_location/resources/dse/conf/dse.yaml |
Kerberos support
Use these options for configuring security for a DataStax Enterprise cluster using Kerberos. For instructions, see Authenticating a cluster with Kerberos.
kerberos_options:
    keytab: path_to_keytab/dse.keytab
    service_principal: dse_user/_HOST@REALM
    http_principal: HTTP/_HOST@REALM
    qop: auth
- keytab: resources/dse/conf/dse.keytab
The keytab file must contain the credentials for both of the fully resolved principal names, which replace _HOST with the FQDN of the host in the service_principal and http_principal settings. The UNIX user running DSE must also have read permissions on the keytab.
- service_principal: dse_user/_HOST@REALM
The service_principal that the Cassandra and Hadoop processes run under must use the form dse_user/_HOST@REALM, where:
  - dse_user is:
    - Installer-Services and Package installations: cassandra
    - Installer-No Services and Tarball installations: the name of the UNIX user that starts the service
  - _HOST is the broadcast IP address.
  - REALM is the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
The service_principal must be consistent everywhere: in the dse.yaml, present in the keytab, and in the cqlshrc file (where service_principal is separated into service/hostname).
- http_principal: HTTP/_HOST@REALM
The http_principal is used by the Tomcat application container to run DSE Search/Solr. The web server uses the GSS-API mechanism (SPNEGO) to negotiate the GSSAPI security mechanism (Kerberos). Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
- qop: auth
A comma-delimited list of Quality of Protection values that clients and servers can use for each connection. The client can have multiple QOP values, while the server can have only a single QOP value. The valid values are:
  - auth - Authentication only. This is the default.
  - auth-int - Authentication plus integrity protection for all transmitted data.
  - auth-conf - Authentication plus integrity protection and encryption of all transmitted data.
Encryption using auth-conf is separate and completely independent of whether encryption is done using SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax recommends choosing only one method and using it for both encryption and authentication.
LDAP options
To use these options, you must set com.datastax.bdp.cassandra.auth.LdapAuthenticator as the authenticator in the cassandra.yaml file. For instructions, see Authenticating a cluster with LDAP.
- server_host
- The hostname of the LDAP server. Default: localhost
- server_port
- The port on which the LDAP server listens. Default: 389
- search_dn
- The username of the user that is used to search for other users on the LDAP server.
- search_password
- The password of the search_dn user.
- use_ssl
- Set to true to enable SSL connections to the LDAP server. If set to true, you may need to change server_port to the SSL port of the LDAP server. Default: false
- use_tls
- Set to true to enable TLS connections to the LDAP server. If set to true, you may need to change server_port to the TLS port of the LDAP server. Default: false
- truststore_path
- The path to the trust store for SSL certificates.
- truststore_password
- The password to access the trust store.
- truststore_type
- The type of trust store. Default: jks
- user_search_base
- The search base for your domain, used to look up users. Set the ou and dc elements for your LDAP domain. Typically this is set to ou=users,dc=domain,dc=top_level_domain. For example, ou=users,dc=example,dc=com.
- user_search_filter
- The search filter for looking up usernames. Default: uid={0}
- credentials_validity_in_ms
- The duration period in milliseconds for the credential cache.
- search_validity_in_seconds
- The duration period in seconds for the search cache. Default: 0
- connection_pool
-
- max_active
The maximum number of active connections to the LDAP server. Default: 8
- max_idle
The maximum number of idle connections in the pool awaiting requests. Default: 8
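As a rough illustration, the LDAP options above could be laid out in dse.yaml as follows. This is a sketch, not a verified configuration: the top-level ldap_options key is taken from the ldap_options.search_password references later on this page, and the search_dn, search_password, truststore, and credentials_validity_in_ms values are hypothetical placeholders.

```yaml
# Sketch only: values marked "hypothetical" are placeholders, not defaults.
ldap_options:
    server_host: localhost            # documented default
    server_port: 389                  # documented default
    search_dn: cn=lookup,dc=example,dc=com   # hypothetical search user
    search_password: change_me        # hypothetical
    use_ssl: false
    use_tls: false
    truststore_path: path_to_truststore      # hypothetical
    truststore_password: change_me    # hypothetical
    truststore_type: jks              # documented default
    user_search_base: ou=users,dc=example,dc=com
    user_search_filter: "uid={0}"     # documented default, quoted for YAML safety
    credentials_validity_in_ms: 0     # hypothetical value
    search_validity_in_seconds: 0     # documented default
    connection_pool:
        max_active: 8                 # documented default
        max_idle: 8                   # documented default
```

Remember that these options take effect only when LdapAuthenticator is set as the authenticator in cassandra.yaml.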
Scheduler settings for Solr indexes
These settings control the schedulers in charge of querying for and removing expired data.
- ttl_index_rebuild_options
-
- fix_rate_period
Schedules how often to check for expired data in seconds. Default: 300
- initial_delay
Speeds up start-up by delaying the first TTL checks in seconds. Default: 20
- max_docs_per_batch
The maximum number of documents deleted per batch by the TTL rebuild thread. Default: 200
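A minimal sketch of how the scheduler settings might appear in dse.yaml, using the documented defaults. The nesting shown is inferred from the list above.

```yaml
# TTL scheduler settings with the documented default values.
ttl_index_rebuild_options:
    fix_rate_period: 300      # seconds between checks for expired data
    initial_delay: 20         # seconds to wait before the first TTL check
    max_docs_per_batch: 200   # documents deleted per batch
```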
Solr shard transport options
For inter-node communication between Solr nodes. Also see Shard transport options for DSE Search/Solr communications.
- shard_transport_options
-
These options are specific to netty.
- type
Starting in 4.5.0, netty is used for TCP-based communication. It provides lower latency, improved throughput, and reduced resource consumption compared to http transport, which uses a standard HTTP-based interface for communication. Default: netty
- netty_server_port
The TCP listen port. This setting is mandatory if you either want to use the netty transport now or later migrate to it. To use http transport, either comment out this setting or change it to -1. Default: 8984
- netty_server_acceptor_threads
The number of server acceptor threads. Default: number of available processors
- netty_server_worker_threads
The number of server worker threads. Default: number of available processors * 8
- netty_client_worker_threads
The number of client worker threads. Default: number of available processors * 8
- netty_client_max_connections
The maximum number of client connections. Default: 100
- netty_client_request_timeout
The client request timeout, in milliseconds. Default: 60000
- HTTP transport settings
- The defaults are the same as in Solr, that is, 0, meaning no timeout at all. To avoid blocking operations, DataStax strongly recommends changing these settings to a finite value. These settings are valid across Solr cores:
- http_shard_client_conn_timeout
HTTP shard client timeouts in milliseconds. Default: 0
- http_shard_client_socket_timeout
HTTP shard client socket timeouts in milliseconds. Default: 0
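The transport settings above might be combined as in the following sketch. It keeps the documented netty defaults and sets the two HTTP timeouts to a finite value, as the text recommends. The 10000 ms timeout values are illustrative, and placing the HTTP timeouts under shard_transport_options is an assumption; check the dse.yaml shipped with your installation.

```yaml
shard_transport_options:
    type: netty                           # documented default
    netty_server_port: 8984               # documented default
    netty_client_max_connections: 100     # documented default
    netty_client_request_timeout: 60000   # milliseconds, documented default
    # Assumption: the HTTP timeouts live in the same block.
    # 10000 is an illustrative finite value, not a recommendation from the source.
    http_shard_client_conn_timeout: 10000
    http_shard_client_socket_timeout: 10000
```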
Solr indexing
DSE Search provides a multi-threaded indexing implementation to improve performance on multi-core machines. All index updates are internally dispatched to a per-core indexing thread pool and executed asynchronously, which allows for greater concurrency and parallelism. However, index requests can return a response before the indexing operation is executed.
- max_solr_concurrency_per_core
- Configures the maximum number of concurrent asynchronous indexing threads per Solr core. If set to 1, DSE Search returns to the synchronous indexing behavior. Also see Configuring the available indexing threads. Default: number of available Solr cores * 2
Note: Dynamic switching to Solr concurrency level at 1 is disallowed.
- back_pressure_threshold_per_core
- The total number of queued asynchronous indexing requests per Solr core, computed at Solr commit time. When exceeded, back pressure prevents excessive resource consumption by throttling new incoming requests. Default: 500
- flush_max_time_per_core
- The maximum time, in minutes, to wait before flushing asynchronous index updates, which occurs either at Solr commit time or at Cassandra flush time. To fully synchronize Solr indexes with Cassandra data, ensure that flushing completes successfully by setting this value to a reasonably high value. Default: 5
- load_max_time_per_core
- The maximum time, in minutes, to wait for each Solr core to load on startup or during create/reload operations. Change this advanced option only if exceptions occur during core loading. Default: 1 (if not specified)
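For illustration, the indexing settings could be written as top-level dse.yaml entries like this. The concurrency value of 2 is a hypothetical choice; the other values are the documented defaults.

```yaml
# Solr indexing settings (sketch).
max_solr_concurrency_per_core: 2         # hypothetical; 1 restores synchronous indexing
back_pressure_threshold_per_core: 500    # documented default
flush_max_time_per_core: 5               # minutes, documented default
load_max_time_per_core: 1                # minutes, documented default
```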
Cassandra disk failure policy
- enable_index_disk_failure_policy
- DSE Search activates the configured Cassandra disk failure policy if IOExceptions occur during index update operations. Default: false
Solr CQL query options
Available options for CQL Solr queries.
- cql_solr_query_executor_threads
- The maximum number of threads for retrieving rows during CQL Solr queries. This value is cross-request and cross-core. Default: number of available processors * 10
- cql_solr_query_row_timeout
- The maximum time in milliseconds to wait for each row to be read from Cassandra during CQL Solr queries. Default: 10000 milliseconds (10 seconds)
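A short sketch of these two options in dse.yaml. The thread count of 20 is illustrative only, since the default depends on the number of available processors.

```yaml
# CQL Solr query options (sketch).
cql_solr_query_executor_threads: 20   # illustrative value
cql_solr_query_row_timeout: 10000     # milliseconds, documented default
```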
CQL Performance Service options
These settings configure how the Performance Service collects performance metrics on Cassandra nodes. The metrics are stored in the dse_perf keyspace and can be queried with CQL using any CQL-based utility, such as cqlsh, DataStax DevCenter, or any application using a Cassandra CQL driver.
- cql_slow_log_options
-
Report distributed sub-queries (query executions on individual
shards) that take longer than a specified period of time.
- enabled
Default: false
- cql_slow_log_threshold_ms
Default: 100 milliseconds
- cql_slow_log_ttl
Default: 86400 seconds
- async_writers
Default: 1
For detailed information, see Collecting slow queries.
- cql_system_info_options
-
CQL system information tables settings
- enabled
Default: false
- refresh_rate_ms
Default: 10000 milliseconds
For detailed information, see Collecting system level diagnostics.
- resource_level_latency_tracking_options
- Data resource latency tracking settings:
- enabled
Default: false
- refresh_rate_ms
Default: 10000 milliseconds
For detailed information, see Collecting system level diagnostics.
- db_summary_stats_options
-
Database summary statistics settings
- enabled
Default: false
- refresh_rate_ms
Default: 10000 milliseconds
For detailed information, see Collecting database summary diagnostics.
- cluster_summary_stats_options
- Cluster summary statistics settings
- enabled
Default: false
- refresh_rate_ms
Default: 10000 milliseconds
For detailed information, see Collecting cluster summary diagnostics.
- histogram_data_options
- Column Family Histogram data tables settings
- enabled
Default: false
- refresh_rate_ms
Default: 10000 milliseconds
- retention_count
Default: 3
For detailed information, see Collecting table histogram diagnostics.
- user_level_latency_tracking_options
- User-resource latency tracking settings
- enabled
Default: false
- refresh_rate_ms
Default: 10000 milliseconds
- top_stats_limit
Default: 100
For detailed information, see Collecting user activity diagnostics.
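As a sketch, a few of the CQL Performance Service blocks above might be enabled as follows. Option names are copied from this page and the numeric values are the documented defaults; only the enabled flags are changed from their default of false.

```yaml
# Enable slow query logging and selected diagnostics (sketch).
cql_slow_log_options:
    enabled: true
    cql_slow_log_threshold_ms: 100
    cql_slow_log_ttl: 86400
    async_writers: 1

cql_system_info_options:
    enabled: true
    refresh_rate_ms: 10000

histogram_data_options:
    enabled: true
    refresh_rate_ms: 10000
    retention_count: 3

user_level_latency_tracking_options:
    enabled: true
    refresh_rate_ms: 10000
    top_stats_limit: 100
```

The remaining blocks (resource_level_latency_tracking_options, db_summary_stats_options, and cluster_summary_stats_options) follow the same enabled and refresh_rate_ms pattern.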
Solr Performance Service options
These settings are used by the Performance Service.
- solr_indexing_error_log_options
-
- enabled
Default: false
- ttl_seconds
Default: 604800 seconds
- async_writers
Default: 1
For detailed information, see Collecting indexing errors.
- solr_slow_sub_query_log_options
-
- enabled
Default: false
- ttl_seconds
Default: 604800 seconds
- async_writers
Default: 1
- threshold_ms
Default: 100
For detailed information, see Collecting slow Solr queries.
- solr_update_handler_metrics_options
-
- enabled
Default: false
- ttl_seconds
Default: 604800 seconds
- refresh_rate_ms
Default: 60000 milliseconds
For detailed information, see Collecting handler statistics.
- solr_index_stats_options
-
- enabled
Default: false
- ttl_seconds
Default: 604800
- refresh_rate_ms
Default: 60000
For detailed information, see Collecting index statistics.
- solr_cache_stats_options
-
- enabled
Default: false
- ttl_seconds
Default: 604800
- refresh_rate_ms
Default: 60000
For detailed information, see Collecting cache statistics.
- solr_latency_snapshot_options
-
- enabled
Default: false
- ttl_seconds
Default: 604800 seconds
- refresh_rate_ms
Default: 60000 milliseconds
For detailed information, see Collecting performance statistics.
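The Solr Performance Service blocks share a common shape. The following sketch enables two of them with the documented default values; the others can be written the same way.

```yaml
# Log slow Solr sub-queries and collect latency snapshots (sketch).
solr_slow_sub_query_log_options:
    enabled: true
    ttl_seconds: 604800      # 7 days
    async_writers: 1
    threshold_ms: 100

solr_latency_snapshot_options:
    enabled: true
    ttl_seconds: 604800      # 7 days
    refresh_rate_ms: 60000
```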
Other encryption settings
Settings for encrypting passwords and sensitive system tables.
- system_key_directory
- The directory where system keys are kept. Keys used for SSTable encryption must be distributed to all nodes. DataStax Enterprise must be able to read and write to this directory, which must have 700 permissions and belong to the dse user. Default: /etc/dse/conf
For detailed information, see Encrypting data.
- config_encryption_active
- When set to true, the following configuration values must be encrypted. Default: false
dse.yaml
- ldap_options.search_password
- ldap_options.truststore_password
cassandra.yaml
- server_encryption_options.keystore_password
- server_encryption_options.truststore_password
- client_encryption_options.keystore_password
- client_encryption_options.truststore_password
- config_encryption_key_name
- The name of the system key for encrypting and decrypting stored passwords in the configuration files. To encrypt keyfiles, use the dsetool createsystemkey command. When config_encryption_active is true, you must provide a valid key with this name in the system_key_directory. Default: system_key
- system_info_encryption
- If enabled, system tables that contain sensitive information, such as system.hints,
system.batchlog, and system.paxos, are encrypted. If enabling system table encryption on
a node with existing data, run nodetool upgradesstables -a on the listed tables. When tracing
is enabled, sensitive information is written into the tables in the system_traces
keyspace. Configure those tables to encrypt their data by using an encrypting compressor.
- enabled
Default: false
- cipher_algorithm
Default: AES
- secret_key_strength
Default: 128
- chunk_length_kb
Default: 64
- key_name
The name of the keys file created to encrypt system tables. This file is created in system_key_directory/system/key_name. Default: system_table_keytab
- hive_options
- Retry settings used when Hive inserts data into a Cassandra table.
- insert_max_retries
Maximum number of retries. Default: 6
- insert_retry_sleep_period
Period of time in milliseconds between retries. Default: 50
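Put together, the settings in this section might look like the sketch below, which uses the documented defaults throughout. The nesting is inferred from the lists above.

```yaml
# Encryption-related settings and Hive retry options (sketch, default values).
system_key_directory: /etc/dse/conf
config_encryption_active: false
config_encryption_key_name: system_key

system_info_encryption:
    enabled: false
    cipher_algorithm: AES
    secret_key_strength: 128
    chunk_length_kb: 64
    key_name: system_table_keytab

hive_options:
    insert_max_retries: 6
    insert_retry_sleep_period: 50   # milliseconds between retries
```

If config_encryption_active is switched to true, the password values listed above must be stored in encrypted form, and a key named by config_encryption_key_name must exist in system_key_directory.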
Audit logging settings
Options for the audit logger. To get the maximum information from data auditing, turn on data auditing on every node. For detailed information, see Configuring and using data auditing and Configuring audit logging to a log4j log file.
- audit_logging_options
-
- enabled
Default: false
- logger
Default: Log4JAuditWriter
Available loggers:
  - CassandraAuditWriter
  Logs audit information to a Cassandra table. This logger can be run either synchronously or asynchronously. Audit logs are stored in the dse_audit.audit_log table. When run synchronously, a query does not execute until it has been written to the audit log table successfully. If there is a failure between when an audit event is written and its query is executed, the audit logs may contain queries that were never executed. Also see Configuring audit logging to a Cassandra table.
  - Log4JAuditWriter
  Logs audit information to a log4j logger. The logger name is DataAudit, and it can be configured in the log4j-server.properties file.
- included_categories or excluded_categories
Comma-separated list of audit event categories to be included or excluded from the audit log. Categories are: QUERY, DML, DDL, DCL, AUTH, ADMIN. Specify either included or excluded categories. Specifying both is an error.
- included_keyspaces or excluded_keyspaces
Comma-separated list of keyspaces to be included or excluded from the audit log. Specify either included or excluded keyspaces. Specifying both is an error.
- retention_time
The amount of time, in hours, that audit events are retained by supporting loggers. Currently, only the CassandraAuditWriter supports retention time. Values of 0 or less retain events forever. Default: 0
- cassandra_audit_writer_options
Sets the mode the writer runs in. When run synchronously, a query is not executed until the audit event is successfully written. When run asynchronously, audit events are queued for writing to the audit table, but are not necessarily logged before the query executes. A pool of writer threads consumes the audit events from the queue and writes them to the audit table in batch queries. While this substantially improves performance under load, if there is a failure between when a query is executed and when its audit event is written to the table, the audit table may be missing entries for queries that were executed.
- mode
Default: sync
- batch_size (async mode only)
Must be greater than 0. The maximum number of events the writer will dequeue before writing them out to the table. If you're seeing warnings in your logs about batches being too large, decrease this value. Increasing batch_size_warn_threshold_in_kb in cassandra.yaml is also an option. Make sure you understand the implications before doing so. Default: 50
- flush_time (async mode only)
The maximum amount of time, in milliseconds, that an event can wait in the queue before a writer removes it and writes it out. This prevents events from waiting too long before being written to the table when there are not many queries happening. Default: 500
- num_writers (async mode only)
The number of worker threads asynchronously logging events to the CassandraAuditWriter. Default: 10
- queue_size
The size of the queue feeding the asynchronous audit log writer threads. When there are more events being produced than the writers can write out, the queue will fill up, and newer queries will block until there is space on the queue. If a value of 0 is used, the queue size will be unbounded, which can lead to resource exhaustion under heavy query load. Default: 10000
- write_consistency
The consistency level used to write audit events. Default: QUORUM
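The following sketch shows one way the audit logging options could be combined to write audit events asynchronously to a Cassandra table. The nesting of cassandra_audit_writer_options under audit_logging_options follows the list above, the excluded keyspace names are hypothetical examples, and the writer values are the documented defaults except for mode.

```yaml
# Asynchronous audit logging to the dse_audit.audit_log table (sketch).
audit_logging_options:
    enabled: true
    logger: CassandraAuditWriter
    # Hypothetical example: skip auditing of system keyspaces.
    excluded_keyspaces: system,system_traces
    retention_time: 0            # hours; 0 or less retains events forever
    cassandra_audit_writer_options:
        mode: async              # default is sync
        batch_size: 50           # async mode only, must be greater than 0
        flush_time: 500          # milliseconds, async mode only
        num_writers: 10          # async mode only
        queue_size: 10000        # 0 means unbounded
        write_consistency: QUORUM
```

Asynchronous mode trades completeness for throughput: as described above, a failure between query execution and the audit write can leave executed queries missing from the audit table.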