dse.yaml configuration file
The DataStax Enterprise configuration file for security, DSE Search, DSE Graph, and DSE Analytics.
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations | /etc/dse/dse.yaml
Tarball installations | installation_location/resources/dse/conf/dse.yaml
logback.xml
The location of the logback.xml file depends on the type of installation:
Package installations | /etc/dse/cassandra/logback.xml
Tarball installations | installation_location/resources/cassandra/conf/logback.xml
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:
Package installations | /etc/dse/cassandra/cassandra.yaml
Tarball installations | installation_location/resources/cassandra/conf/cassandra.yaml
The cassandra.yaml file is the primary configuration file for the DataStax Enterprise database.
Syntax
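In YAML, each child setting is indented under its parent setting, and the spacing must be retained. For example, node_health_options has no leading spaces, and each of its child settings is indented by two spaces:
node_health_options:
  refresh_rate_ms: 60000
  uptime_ramp_up_period_seconds: 10800
  dropped_mutation_window_minutes: 30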
Security and authentication
Authentication options
DSE Authenticator supports multiple schemes for authentication at the same time in a DataStax Enterprise cluster. Additional authenticator configuration is required in cassandra.yaml.
# authentication_options:
#   enabled: false
#   default_scheme: internal
#   other_schemes:
#   scheme_permissions: false
#   allow_digest_with_kerberos: true
#   plain_text_without_ssl: warn
#   transitional_mode: disabled
- authentication_options
- Configures DseAuthenticator to authenticate users when the authenticator option in cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseAuthenticator. Authenticators other than DseAuthenticator are not supported.
- enabled
- Enables user authentication.
- true - The DseAuthenticator authenticates users.
- false - The DseAuthenticator does not authenticate users and allows all connections.
Default: false
- default_scheme
- The first scheme to validate a user against when the driver does not request a
specific scheme.
- internal - Plain text authentication using the internal password authentication.
- ldap - Plain text authentication using pass-through LDAP authentication.
- kerberos - GSSAPI authentication using the Kerberos authenticator.
Default: internal
- other_schemes
- List of schemes that are checked if validation against
the first scheme fails and no scheme was specified by the driver.
- ldap - Plain text authentication using pass-through LDAP authentication.
- kerberos - GSSAPI authentication using the Kerberos authenticator.
Default: none
- scheme_permissions
- Determines if roles need to have permission granted to them to use specific
authentication schemes. These permissions can be granted only when the DseAuthorizer
is used.
- true - Use multiple schemes for authentication. To be assigned, every role requires permissions to a scheme.
- false - Do not use multiple schemes for authentication. Prevents unintentional role assignment that might occur if user or group names overlap in the authentication service.
Tip: See .
Default: false
- allow_digest_with_kerberos
- Controls whether DIGEST-MD5 authentication is allowed with Kerberos. Kerberos uses
DIGEST-MD5 to pass credentials between nodes and jobs. The DIGEST-MD5 mechanism is
not associated directly with an authentication scheme.
- true - Allow DIGEST-MD5 authentication with Kerberos. In analytics clusters, set to true to use Hadoop internode authentication with Hadoop and Spark jobs.
- false - Do not allow DIGEST-MD5 authentication with Kerberos.
Default: true
- plain_text_without_ssl
- Controls how the DseAuthenticator responds to plain text authentication requests
over unencrypted client connections.
- block - Block the request with an authentication error.
- warn - Log a warning but allow the request.
- allow - Allow the request without any warning.
Default: warn
- transitional_mode
- Sets transitional mode for temporary use during authentication setup in an established environment. Transitional mode allows access to the database using the anonymous role, which has all permissions except AUTHORIZE.
- disabled - Disable transitional mode. All connections must provide valid credentials and map to a login-enabled role.
- permissive - Only superusers are authenticated and logged in. All other authentication attempts are logged in as the anonymous user.
- normal - Allow all connections that provide credentials. Maps all authenticated users to their role, and maps all other connections to anonymous.
- strict - Allow only authenticated connections that map to a login-enabled role OR connections that provide a blank username and password as anonymous.
Important: Credentials are required for all connections after authentication is enabled; in transitional mode, use a blank username and password to log in with the anonymous role.
Default: disabled
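As a sketch rather than a recommended setup, the following fragment enables DseAuthenticator with internal authentication as the default scheme and LDAP as a fallback scheme. It assumes ldap_options is configured separately and that cassandra.yaml sets the authenticator as shown in the comment. Choosing block for plain_text_without_ssl is an illustrative assumption for a cluster that encrypts client connections.

```yaml
# cassandra.yaml must point at DseAuthenticator:
#   authenticator: com.datastax.bdp.cassandra.auth.DseAuthenticator

# dse.yaml
authentication_options:
  enabled: true
  default_scheme: internal
  other_schemes:
    - ldap                        # tried only if internal validation fails and the driver named no scheme
  scheme_permissions: false
  allow_digest_with_kerberos: true
  plain_text_without_ssl: block   # reject plain text credentials over unencrypted connections
  transitional_mode: disabled     # every connection must map to a login-enabled role
```

When enabling authentication on an established cluster, transitional_mode can be set temporarily to permissive, normal, or strict, then switched back to disabled after all clients send valid credentials.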
Role management options
#role_management_options:
#  mode: internal
- role_management_options
- Configures the DSE Role Manager. To enable role manager, set:
- authorization_options enabled to true
- role_manager in cassandra.yaml to com.datastax.bdp.cassandra.auth.DseRoleManager
Tip: See . When scheme_permissions is enabled, all roles must have permission to execute on the authentication scheme. See .
- mode
- Manages granting and revoking of roles.
- internal - Manage granting and revoking of roles internally using the GRANT ROLE and REVOKE ROLE CQL statements. See . Internal role management allows nesting roles for permission management.
- ldap - Manage granting and revoking of roles using an external LDAP server configured using the ldap_options. To configure an LDAP scheme, complete the steps in . Nesting roles for permission management is disabled.
Default: internal
- stats
- Set to true to enable logging of DSE role creation and modification events in the dse_security.role_stats system table. All nodes must have the stats option enabled and must be restarted for the functionality to take effect.
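A minimal sketch of the settings listed above for LDAP-based role management. It assumes ldap_options already defines group_search_type and the related group settings. Enabling stats is optional and is shown only for illustration. If it is enabled, it must be enabled on every node, and each node must be restarted.

```yaml
# cassandra.yaml:
#   role_manager: com.datastax.bdp.cassandra.auth.DseRoleManager

# dse.yaml
authorization_options:
  enabled: true
role_management_options:
  mode: ldap     # roles come from LDAP group membership; role nesting is disabled
  stats: true    # log role events to dse_security.role_stats (all nodes, restart required)
```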
Authorization options
#authorization_options:
#  enabled: false
#  transitional_mode: disabled
#  allow_row_level_security: false
- authorization_options
- Configures the DSE Authorizer to authorize users when the authorizer option in cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseAuthorizer.
- enabled
- Enables the DSE Authorizer for role-based access control (RBAC).
- true - Enable the DSE Authorizer for RBAC.
- false - Do not use the DSE Authorizer.
Default: false
- transitional_mode
- Allows the DSE Authorizer to operate in a temporary mode during
authorization setup in a cluster.
- disabled - Transitional mode is disabled.
- normal - Permissions can be passed to resources, but are not enforced.
- strict - Permissions can be passed to resources, and are enforced on authenticated users. Permissions are not enforced against anonymous users.
Default: disabled
- allow_row_level_security
- Enables row-level access control (RLAC) permissions. Use the same setting on
all nodes. See .
- true - Use row-level security.
- false - Do not use row-level security.
Default: false
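The fragment below sketches a staged RBAC rollout on an existing cluster, under the assumption that permissions are granted while transitional_mode is normal (passed to resources but not enforced). Once the grants are in place, transitional_mode is set to disabled, which ends transitional mode. Enabling row-level security is only an example. Whatever value you choose, use it on all nodes.

```yaml
# cassandra.yaml:
#   authorizer: com.datastax.bdp.cassandra.auth.DseAuthorizer

# dse.yaml, first stage: permissions can be granted but are not enforced
authorization_options:
  enabled: true
  transitional_mode: normal
  allow_row_level_security: true   # use the same value on every node

# Second stage, after all grants are in place: set transitional_mode to disabled.
```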
Kerberos options
kerberos_options:
  keytab: resources/dse/conf/dse.keytab
  service_principal: dse/_HOST@REALM
  http_principal: HTTP/_HOST@REALM
  qop: auth
- kerberos_options
- Configures security for a DataStax Enterprise cluster using Kerberos.
- keytab
- The filepath of dse.keytab.
Default: resources/dse/conf/dse.keytab
- service_principal
- The service_principal that the DataStax Enterprise process runs under must use
the form dse_user/_HOST@REALM, where:
- dse_user is the username of the user that starts the DataStax Enterprise process.
- _HOST is converted to a reverse DNS lookup of the broadcast address.
- REALM is the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
Default: dse/_HOST@REALM
- http_principal
- Used by the Tomcat application container to run DSE Search. The Tomcat
web server uses the GSSAPI mechanism (SPNEGO) to negotiate the GSSAPI
security mechanism (Kerberos). REALM is the name of
your Kerberos realm. In the Kerberos principal, REALM
must be uppercase.
Default: HTTP/_HOST@REALM
- qop
- A comma-delimited list of Quality of Protection (QOP) values that
clients and servers can use for each connection. The client can have multiple
QOP values, while the server can have only a single QOP value.
- auth - Authentication only.
- auth-int - Authentication plus integrity protection for all transmitted data.
- auth-conf - Authentication plus integrity protection and encryption of all
transmitted data.
Encryption using auth-conf is separate from and independent of encryption using SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax recommends choosing only one method and using it for encryption and authentication.
Default: auth
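A sketch for a hypothetical realm named EXAMPLE.COM. Substitute your own realm, in uppercase. The keytab path is also an assumption. auth-conf is chosen here for a deployment that does not also use SSL, following the recommendation above to use only one encryption method.

```yaml
kerberos_options:
  keytab: /etc/dse/dse.keytab                # hypothetical path
  service_principal: dse/_HOST@EXAMPLE.COM   # _HOST is resolved by reverse DNS of the broadcast address
  http_principal: HTTP/_HOST@EXAMPLE.COM     # used by Tomcat (SPNEGO) for DSE Search
  qop: auth-conf                             # authentication, integrity protection, and encryption

authentication_options:
  enabled: true
  default_scheme: kerberos
```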
LDAP options
ldap_options.server_port parameter is used by default. This way, there is no change in configuration for existing users who have LDAP configured.
- A connection pool is created for each server separately. When a connection is attempted, the best pool is chosen using a heuristic. DSE uses a circuit breaker to temporarily disable servers that frequently fail to connect. DSE also tries to choose the pool that has the greatest number of idle connections.
- Failover parameters are configured through system properties.
- A new method was added in DSE 6.7.9 to the LDAP MBean to reset LDAP connectors - that is, close all connection pools and recreate them.
# ldap_options:
#   server_host:
#   server_port: 389
#   hostname_verification: false
#   search_dn:
#   search_password:
#   use_ssl: false
#   use_tls: false
#   truststore_path:
#   truststore_password:
#   truststore_type: jks
#   user_search_base:
#   user_search_filter: (uid={0})
#   user_memberof_attribute: memberof
#   extra_user_search_bases:
#   group_search_type: directory_search
#   group_search_base:
#   group_search_filter: (uniquemember={0})
#   group_name_attribute: cn
#   extra_group_search_bases:
#   credentials_validity_in_ms: 0
#   search_validity_in_seconds: 0
#   connection_pool:
#     max_active: 8
#     max_idle: 8
For example, with Microsoft Active Directory:
ldap_options:
  server_host: win2012ad_server.mycompany.lan
  server_port: 389
  search_dn: cn=lookup_user,cn=users,dc=win2012domain,dc=mycompany,dc=lan
  search_password: lookup_user_password
  use_ssl: false
  use_tls: false
  truststore_path:
  truststore_password:
  truststore_type: jks
  #group_search_type: directory_search
  group_search_type: memberof_search
  #group_search_base:
  #group_search_filter:
  group_name_attribute: cn
  user_search_base: cn=users,dc=win2012domain,dc=mycompany,dc=lan
  user_search_filter: (sAMAccountName={0})
  user_memberof_attribute: memberOf
  connection_pool:
    max_active: 8
    max_idle: 8
- ldap_options
- Configures LDAP security when the authenticator option in cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseAuthenticator.
- server_host
- A comma-separated list of LDAP server hosts. Important: Do not use LDAP on the same host (localhost) in production environments. Using LDAP on the same host (localhost) is appropriate only in single-node test or development environments.
For information on parameters related to tuning failover performance for multiple LDAP servers, see Tune LDAP failover.
Default: none
- server_port
- The port on which the LDAP server listens.
- 389 - The default port for unencrypted connections.
- 636 - Used for encrypted connections. Default SSL or TLS port for LDAP.
Default: 389
- hostname_verification
- Enable hostname verification. The following conditions must be met:
- Either use_ssl or use_tls must be set to true.
- A valid truststore with the correct path specified in truststore_path must exist. The truststore must have a certificate entry, trustedCertEntry, including a SAN DNSName entry that matches the hostname of the LDAP server.
Default: false
- search_dn
- Distinguished name (DN) of an account with read access to the user_search_base and group_search_base. For example:
- OpenLDAP: uid=lookup,ou=users,dc=springsource,dc=com
- Microsoft Active Directory (AD): cn=lookup, cn=users, dc=springsource, dc=com
When not set, the LDAP server uses an anonymous bind for search.
Warning: Do not create or use an LDAP account or group called cassandra. The DSE database comes with a default cassandra login role that has access to all database objects and uses the consistency level QUORUM.
Default: commented out
- search_password
- The password of the search_dn account.
Default: commented out
- use_ssl
- Enables an SSL-encrypted connection to the LDAP server. Tip: See .
- true - Use an SSL-encrypted connection.
- false - Do not enable SSL connections to the LDAP server.
Default: false
- use_tls
- Enables TLS connections to the LDAP server.
- true - Enable TLS connections to the LDAP server.
- false - Do not enable TLS connections to the LDAP server.
Default: false
- truststore_path
- The filepath to the SSL certificates
truststore.
Default: commented out
- truststore_password
- The password to access the truststore.
Default: commented out
- truststore_type
- Valid types are JKS, JCEKS, or PKCS12.
Default: jks
- user_search_base
- Distinguished name (DN) of the object to start the
recursive search for user entries for authentication and role management
memberof searches.
- For your LDAP domain, set the ou and dc elements. Typically set to ou=users,dc=domain,dc=top_level_domain. For example, ou=users,dc=example,dc=com.
- For your Active Directory, set the dc element for a different search base. Typically set to CN=search,CN=Users,DC=ActDir_domname,DC=internal. For example, CN=search,CN=Users,DC=example-sales,DC=internal.
Default: none
- user_search_filter
- Identifies the user that the search filter uses for looking up usernames.
- uid={0} - When using LDAP.
- samAccountName={0} - When using AD (Microsoft Active Directory). For example, (sAMAccountName={0}).
Default: uid={0}
- user_memberof_attribute
- Contains a list of group names. Role manager assigns DSE roles that exactly match any group name in the list. Required when managing roles using group_search_type: memberof_search with LDAP (role_management_options mode: ldap). The directory server must have memberof support, which is a default user attribute in Microsoft Active Directory (AD).
Default: memberof
- extra_user_search_bases
- Option to define additional search bases for users. If the user is not found in one search base, DSE attempts to find the user in another search base, until all search bases have been tried. See also user_search_base, group_search_base, and extra_group_search_bases.
Default: [] (empty list)
- group_search_type
- Defines how group membership is determined for a user. Required when managing roles with LDAP (role_management_options mode: ldap).
- directory_search - Filters the results with a subtree search of group_search_base to find groups that contain the username in the attribute defined in the group_search_filter.
- memberof_search - Recursively searches for user entries using the user_search_base and user_search_filter. Gets groups from the user attribute defined in user_memberof_attribute. The directory server must have memberof support.
Default: directory_search
- group_search_base
- The unique distinguished name (DN) of the group
record from which to start the group membership search.
Default: commented out
- group_search_filter
- Set to any valid LDAP filter.
Default: uniquemember={0}
- group_name_attribute
- The attribute in the group record that contains the LDAP group name. Role names
are case-sensitive and must match exactly on DSE for assignment. Unmatched groups
are ignored.
Default: cn
- extra_group_search_bases
- Option to define additional search bases for groups. DSE merges all groups found in all the defined search bases. See also group_search_base, user_search_base, and extra_user_search_bases.
Default: [] (empty list)
- credentials_validity_in_ms
- A credentials cache improves performance by reducing the number of requests that
are sent to the internal or LDAP server. See .
- 0 - Disable credentials cache.
- duration period - The duration period in milliseconds of the credentials cache.
Note: Starting in DSE 6.7.9, the upper limit for ldap_options.credentials_validity_in_ms increased to 864,000,000 ms, which is 10 days.
Default: 0
- search_validity_in_seconds
- Configures a search cache to improve performance by
reducing the number of requests that are sent to the internal or LDAP
server.
- 0 - Disables the search cache.
- positive number - The duration period in seconds for the search cache.
Note: Starting in DSE 6.7.9, the upper limit for ldap_options.search_validity_in_seconds increased to 864,000 seconds, which is 10 days.
Default: 0
- connection_pool
- Configures the connection pool for making LDAP requests.
- max_active
- The maximum number of active connections to the LDAP
server.
Default: 8
- max_idle
- The maximum number of idle connections in the pool
awaiting requests.
Default: 8
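To contrast with the Active Directory example earlier in this section, the following sketch targets an OpenLDAP-style directory and uses directory_search to find group membership. All hostnames, DNs, paths, and passwords are hypothetical. The cache durations are illustrative, not recommendations.

```yaml
ldap_options:
  server_host: ldap1.example.com,ldap2.example.com   # hypothetical hosts
  server_port: 636                     # default SSL/TLS port for LDAP
  hostname_verification: true          # requires use_ssl or use_tls and a truststore with a matching SAN DNSName
  search_dn: uid=lookup,ou=users,dc=example,dc=com
  search_password: lookup_password
  use_ssl: true
  use_tls: false
  truststore_path: /etc/dse/conf/ldap_truststore.jks
  truststore_password: truststore_password
  truststore_type: jks
  user_search_base: ou=users,dc=example,dc=com
  user_search_filter: (uid={0})
  extra_user_search_bases:
    - ou=contractors,dc=example,dc=com # searched if the user is not found in user_search_base
  group_search_type: directory_search
  group_search_base: ou=groups,dc=example,dc=com
  group_search_filter: (uniquemember={0})
  group_name_attribute: cn             # must exactly match DSE role names
  credentials_validity_in_ms: 60000    # cache credentials for 1 minute
  search_validity_in_seconds: 600      # cache searches for 10 minutes
  connection_pool:
    max_active: 8
    max_idle: 8
```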
Encrypt sensitive system resources
Options to encrypt sensitive system resources using a local encryption key or a remote KMIP key.
system_info_encryption:
  enabled: false
  cipher_algorithm: AES
  secret_key_strength: 128
  chunk_length_kb: 64
  key_provider: KmipKeyProviderFactory
  kmip_host: kmip_host_name
- system_info_encryption
- Sets the encryption settings for system resources that might contain sensitive information, including the system.batchlog and system.paxos tables, hint files, and the database commit log.
- enabled
- Enables encryption of system resources. See .
- true - Enable encryption of system resources.
- false - Do not encrypt system resources.
Default: false
Note: The system_trace keyspace is not encrypted by enabling the system_info_encryption section. In environments that also have tracing enabled, manually configure encryption with compression on the system_trace keyspace. See .
- cipher_algorithm
- The name of the JCE cipher algorithm used to encrypt system resources.
Default: AES
Table 1. Supported cipher algorithm names
cipher_algorithm | secret_key_strength
AES | 128, 192, or 256
DES | 56
DESede | 112 or 168
Blowfish | 32-448
RC2 | 40-128
- secret_key_strength
- Length of key to use for the system resources. See Supported cipher algorithm names.
Note: DSE uses a matching local key or requests the key type from the KMIP server. For KMIP, if an existing key does not match, the KMIP server automatically generates a new key.
Default: 128
- chunk_length_kb
- Optional. Size of SSTable chunks when data from the system.batchlog or system.paxos tables is written to disk.
Default: 64
Note: To encrypt existing data, run nodetool upgradesstables -a system batchlog paxos on all nodes in the cluster.
- key_provider
- KMIP key provider to enable encrypting sensitive system data with a KMIP key.
Comment out if using a local encryption key.
Default: KmipKeyProviderFactory
- kmip_host
- The KMIP key server host. Set to the kmip_groupname that defines the KMIP host in the kmip_hosts section. DSE requests a key from the KMIP host and uses the key generated by the KMIP provider.
Default: kmip_host_name
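If you use a local encryption key instead of KMIP, comment out key_provider and kmip_host, as described above. The sketch assumes AES with a 256-bit key, one of the supported strengths in Table 1, and keeps the default chunk size.

```yaml
system_info_encryption:
  enabled: true
  cipher_algorithm: AES
  secret_key_strength: 256     # AES supports 128, 192, or 256
  chunk_length_kb: 64
  # key_provider: KmipKeyProviderFactory   # commented out: use a local key
  # kmip_host: kmip_host_name
```

Existing batchlog and paxos data is encrypted only after nodetool upgradesstables -a system batchlog paxos runs on all nodes.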
Encrypted configuration properties
system_key_directory: /etc/dse/conf
config_encryption_active: false
config_encryption_key_name: (key_filename | KMIP_key_URL )
- system_key_directory
- Path to the directory where local encryption key files, also called system keys, are stored. Distribute the system keys to all nodes in the cluster. Ensure that the DSE account is the folder owner and has read/write/execute (700) permissions. See .
Note: This directory is not used for KMIP keys.
Default: /etc/dse/conf
- config_encryption_active
- Enables encryption on sensitive data stored in tables and in configuration files.
- true - Enable encryption of configuration property values using the specified config_encryption_key_name. When set to true, the configuration values must be encrypted or commented out. See .
Restriction: Lifecycle Manager (LCM) is not compatible when config_encryption_active is true in DSE and OpsCenter. For LCM limitations, see .
- false - Do not enable encryption of configuration property values.
Default: false
- config_encryption_key_name
- The local encryption key filename or KMIP key URL to use for configuration file property value decryption.
Note: Use dsetool encryptconfigvalue to generate encrypted values for the configuration file properties.
Default: system_key
Note: The default name is not configurable.
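A sketch of encrypting configuration values with a local key. It assumes a key file named system_key already exists in system_key_directory on every node. It also assumes that each sensitive value has been replaced with the output of dsetool encryptconfigvalue. ldap_options.search_password is used only as an example of a sensitive value, and the placeholder string is hypothetical, not real ciphertext.

```yaml
system_key_directory: /etc/dse/conf     # owned by the DSE account, permissions 700
config_encryption_active: true          # not compatible with Lifecycle Manager
config_encryption_key_name: system_key

ldap_options:
  search_password: ENCRYPTED_VALUE_FROM_DSETOOL   # hypothetical placeholder for dsetool encryptconfigvalue output
```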
KMIP encryption options
kmip_hosts:
  your_kmip_groupname:
    hosts: kmip1.yourdomain.com, kmip2.yourdomain.com
    keystore_path: pathto/kmip/keystore.jks
    keystore_type: jks
    keystore_password: password
    truststore_path: pathto/kmip/truststore.jks
    truststore_type: jks
    truststore_password: password
    key_cache_millis: 300000
    timeout: 1000
- kmip_hosts
- Configures connections for key servers that support the KMIP protocol.
- kmip_groupname
- A user-defined name for a group of options to configure a KMIP server or servers, key settings, and certificates. For each KMIP key server or group of KMIP key servers, you must configure options for a kmip_groupname section. Using separate key server configuration settings allows use of different key servers to encrypt table data and eliminates the need to enter key server configuration information in Data Definition Language (DDL) statements and other configurations. DDL statements are database schema change commands like CREATE TABLE. Multiple KMIP hosts are supported.
- hosts
- A comma-separated list of KMIP hosts (host[:port]) using the FQDN (Fully Qualified Domain Name). Add KMIP hosts in the intended failover sequence because DSE queries the hosts in the listed order.
For example, if the host list contains kmip1.yourdomain.com, kmip2.yourdomain.com, DSE tries kmip1.yourdomain.com and then kmip2.yourdomain.com.
- keystore_path
- The path to a Java keystore created from the KMIP agent PEM files.
- keystore_type
- Valid types are JKS, JCEKS, PKCS11, and PKCS12. For file-based keystores, use PKCS12.
Attention: DataStax supports PKCS11 as a keystore_type on nodes with cassandra or advanced workloads. The cassandra workload support is specific to DSE 6.7.7 and later releases. The advanced workload support is specific to DSE 6.7.9 and later. If PKCS11 is needed, in server_encryption_options or client_encryption_options, specify the keystore_type as PKCS11 and the keystore as NONE. PKCS11 is not supported in DSE 6.0.x and 5.1.x releases. PKCS11 is not supported as a truststore_type.
Default: JKS
- keystore_password
- Password used to protect the private key of the key
pair.
Default: none
- truststore_path
- The path to a Java truststore that was created using the KMIP root certificate.
Default: /etc/dse/conf/KMIP_truststore.jks
- truststore_type
- Valid types are JKS, JCEKS, and PKCS12. For file-based truststores, use PKCS12.
Attention: Due to an OpenSSL issue, you cannot use a PKCS12 truststore that was generated via OpenSSL. For example, a truststore generated via the following command will not work with DSE:
openssl pkcs12 -export -nokeys -out truststore.pfx -in intermediate.chain.pem
However, truststores generated via Java's keytool and then converted to PKCS12 work with DSE. Example:
keytool -importcert -alias rootca -file rootca.pem -keystore truststore.jks
keytool -importcert -alias intermediate -file intermediate.pem -keystore truststore.jks
keytool -importkeystore -srckeystore truststore.jks -destkeystore truststore.pfx -deststoretype pkcs12
Default: JKS
- truststore_password
- Password required to access the truststore.
Default: none
- key_cache_millis
- Milliseconds to locally cache the encryption keys that are read from the KMIP
hosts. The longer the encryption keys are cached, the fewer requests to the KMIP
key server are made and the longer it takes for changes, like revocation, to
propagate to the DSE node. DataStax Enterprise uses concurrent encryption, so
multiple threads fetch the secret key from the KMIP key server at the same time.
DataStax recommends using the default value.
Default: 300000
- timeout
- Socket timeout in milliseconds.
Default: 1000
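To use a KMIP key for system resource encryption, kmip_host in system_info_encryption names a group defined under kmip_hosts. The sketch uses a hypothetical group name, kmip_groupone, along with hypothetical hosts, paths, and passwords. It also uses PKCS12 file-based stores, as recommended above. Per the OpenSSL caveat, the truststore is assumed to have been built with keytool.

```yaml
kmip_hosts:
  kmip_groupone:                                  # hypothetical group name
    hosts: kmip1.example.com, kmip2.example.com   # queried in this order
    keystore_path: /etc/dse/conf/kmip_keystore.p12
    keystore_type: PKCS12
    keystore_password: keystore_password
    truststore_path: /etc/dse/conf/kmip_truststore.p12
    truststore_type: PKCS12
    truststore_password: truststore_password
    key_cache_millis: 300000                      # default, 5 minutes
    timeout: 1000

system_info_encryption:
  enabled: true
  cipher_algorithm: AES
  secret_key_strength: 128
  chunk_length_kb: 64
  key_provider: KmipKeyProviderFactory
  kmip_host: kmip_groupone                        # must match the group name above
```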
DSE Search index encryption
# solr_encryption_options:
#   decryption_cache_offheap_allocation: true
#   decryption_cache_size_in_mb: 256
- solr_encryption_options
- Tunes encryption of search indexes.
- decryption_cache_offheap_allocation
- Allocates shared DSE Search decryption cache off JVM heap.
- true - Allocate shared DSE Search decryption cache off JVM heap.
- false - Do not allocate shared DSE Search decryption cache off JVM heap.
Default: true
- decryption_cache_size_in_mb
- The maximum size of the shared DSE Search decryption cache in megabytes (MB).
Default: 256
DSE In-Memory options
To use DSE In-Memory, specify how much system memory to use for all in-memory tables by fraction or size.
# max_memory_to_lock_fraction: 0.20
# max_memory_to_lock_mb: 10240
- max_memory_to_lock_fraction
- A fraction of the system memory. For example, 0.20 allows use of up to 20% of system memory. This setting is ignored if max_memory_to_lock_mb is set to a non-zero value.
Default: 0.20
- max_memory_to_lock_mb
- Maximum amount of memory in megabytes (MB) for DSE In-Memory tables.
- not set - Use the fraction specified with max_memory_to_lock_fraction.
- number greater than 0 - Maximum amount of memory in megabytes (MB).
Default: 10240
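A worked example for a hypothetical node with 64 GB (65,536 MB) of system memory. If max_memory_to_lock_mb is not set, a fraction of 0.20 lets in-memory tables use up to 0.20 of 65,536 MB, which is 13,107.2 MB (about 12.8 GB). If max_memory_to_lock_mb is set to a non-zero value, the fraction is ignored. The 8192 value below is only an illustration.

```yaml
# Option 1: size by fraction (max_memory_to_lock_mb stays commented out)
max_memory_to_lock_fraction: 0.20   # on a 64 GB node: up to about 13,107 MB
# max_memory_to_lock_mb: 10240

# Option 2 (alternative): a fixed size in MB; the fraction is then ignored
# max_memory_to_lock_mb: 8192
```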
Node health options
node_health_options:
  refresh_rate_ms: 60000
  uptime_ramp_up_period_seconds: 10800
  dropped_mutation_window_minutes: 30
- node_health_options
- Node health options are always enabled. Node health is a score-based representation of how healthy a node is to handle search queries. See Collecting node health and indexing status scores.
- refresh_rate_ms
- How frequently, in milliseconds, node health statistics are updated.
Default: 60000
- uptime_ramp_up_period_seconds
- The amount of continuous uptime required for the node's uptime score to advance the
node health score from 0 to 1 (full health),
assuming there are no recent dropped mutations. The health score is a composite score
based on dropped mutations and uptime. Tip: If a node is repairing after a period of downtime, increase the uptime period to the expected repair time.
Default: 10800 (3 hours)
- dropped_mutation_window_minutes
- The historic time window over which the rate of dropped mutations affects the node
health score.
Default: 30
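For example, suppose a node that rejoins after downtime is expected to need about six hours of repair (a hypothetical figure). Following the tip above, you would raise uptime_ramp_up_period_seconds to 6 x 3,600 = 21,600 seconds. The other values are the defaults.

```yaml
node_health_options:
  refresh_rate_ms: 60000                 # update statistics every 60 seconds
  uptime_ramp_up_period_seconds: 21600   # 6 hours of continuous uptime to reach full health
  dropped_mutation_window_minutes: 30
```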
Health-based routing
enable_health_based_routing: true
- enable_health_based_routing
- Enables node health as a consideration for replica selection for distributed DSE Search queries. Health-based routing enables a trade-off between index consistency and query throughput.
- true - Consider node health when multiple candidates exist for a particular token range.
- false - Ignore node health for replication selection. When the primary concern is performance, do not enable health-based routing.
Default: true
Lease metrics
lease_metrics_options: enabled: false ttl_seconds: 604800
- lease_metrics_options
- Lease holder statistics help monitor the lease subsystem for automatic management of Job Tracker and Spark Master nodes.
- enabled
- Enables log entries related to lease holders.
- true - Enable log entries related to lease holders to help monitor performance of the lease subsystem.
- false - Do not enable log entries.
Default: false
- ttl_seconds
- Time interval in seconds to persist the log of lease holder changes.
Default: 604800 (7 days)
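A sketch that enables lease holder logging and keeps the log for three days instead of the default seven. Three days is 3 x 86,400 = 259,200 seconds. The three-day retention is an arbitrary illustration.

```yaml
lease_metrics_options:
  enabled: true
  ttl_seconds: 259200   # 3 days
```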
DSE Search
Scheduler settings for DSE Search indexes
To ensure that records with time-to-live (TTL) are purged from search indexes when they expire, the search indexes are periodically checked for expired documents.
ttl_index_rebuild_options:
  fixed_rate_period: 300
  initial_delay: 20
  max_docs_per_batch: 4096
  thread_pool_size: 1
- ttl_index_rebuild_options
- Configures the schedulers in charge of querying for expired records, removing expired records, and the execution of the checks.
- fixed_rate_period
- Time interval, in seconds, between checks for expired data.
Default: 300
- initial_delay
- The number of seconds to delay the first TTL check to speed up start-up time.
Default: 20
- max_docs_per_batch
- The maximum number of documents to check and delete per batch by the TTL rebuild
thread. All expired documents are deleted from the index during each check. To
avoid memory pressure, their unique keys are retrieved and then deletes are issued
in batches.
Default: 4096
- thread_pool_size
- The maximum number of search indexes (cores) that can execute TTL cleanup
concurrently. Manages system resource consumption and prevents many search cores
from executing simultaneous TTL deletes.
Default: 1
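As an illustration, a cluster with many search cores and spare resources might check for expired documents half as often but clean up two cores at a time. These values are assumptions, not tuning recommendations.

```yaml
ttl_index_rebuild_options:
  fixed_rate_period: 600     # check for expired documents every 10 minutes
  initial_delay: 20          # wait 20 seconds after startup before the first check
  max_docs_per_batch: 4096   # expired documents are deleted in batches of this size
  thread_pool_size: 2        # at most two search cores run TTL cleanup concurrently
```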
Reindexing of bootstrapped data
async_bootstrap_reindex: false
- async_bootstrap_reindex
- For DSE Search, configure whether to asynchronously reindex bootstrapped data.
- true - The node joins the ring immediately after bootstrap and reindexing occurs asynchronously. Do not wait for post-bootstrap reindexing so that the node is not marked down. The dsetool ring command can be used to check the status of the reindexing.
- false - The node joins the ring after reindexing the bootstrapped data.
Default: false
CQL Solr paging
cql_solr_query_paging: off
- cql_solr_query_paging
- Configures paging behavior for CQL Solr queries.
- driver - Respects driver paging settings. Uses Solr pagination (cursors) only when the driver uses pagination. Enabled automatically for DSE SearchAnalytics workloads.
- off - Paging is off. Ignore driver paging settings for CQL queries and use normal Solr paging unless:
- The current workload is an analytics workload, including SearchAnalytics. SearchAnalytics nodes always use driver paging settings.
- The cqlsh query parameter paging is set to driver.
Even when cql_solr_query_paging: off, paging is dynamically enabled with the "paging":"driver" parameter in JSON queries.
Default: off
Solr CQL query option
cql_solr_query_row_timeout: 10000
- cql_solr_query_row_timeout
- The maximum time in milliseconds to wait for all rows to be read from the database
during CQL Solr queries.
Default: 10000 (10 seconds)
DSE Search resource upload limit
solr_resource_upload_limit_mb: 10
- solr_resource_upload_limit_mb
- Configures the maximum file size of the search index config or schema. Resource
files can be uploaded, but the search index config and schema are stored
internally in the database after upload.
- 0 - Disable resource uploading.
- upload size - The maximum upload size limit in megabytes (MB) for a DSE Search resource file (search index config or schema).
Default: 10
Shard transport
shard_transport_options: netty_client_request_timeout: 60000
- shard_transport_options
- Fault tolerance option for internode communication between DSE Search nodes.
- netty_client_request_timeout
- Timeout behavior during distributed queries. The internal timeout for all search
queries to prevent long running queries. The client request timeout is the
maximum cumulative time (in milliseconds) that a distributed search request will
wait idly for shard responses.
Default: 60000 (1 minute)
DSE Search indexing
# back_pressure_threshold_per_core: 1024
# flush_max_time_per_core: 5
# load_max_time_per_core: 5
# enable_index_disk_failure_policy: false
# solr_data_dir: /MyDir
# solr_field_cache_enabled: false
# ram_buffer_heap_space_in_mb: 1024
# ram_buffer_offheap_space_in_mb: 1024
- back_pressure_threshold_per_core
- The maximum number of queued partitions during search index rebuilding and
reindexing. This maximum number safeguards against excessive heap use by the
indexing queue. If set lower than the number of threads per core (TPC), not all
TPC threads can be actively indexing.
Default: 1024
- flush_max_time_per_core
- The maximum time, in minutes, to wait for the flushing of asynchronous index updates that occurs at DSE Search commit time or at flush time.
CAUTION: Expert knowledge is required to change this value. Always set the wait time high enough to ensure flushing completes successfully to fully sync DSE Search indexes with the database data. If the wait time is exceeded, index updates are only partially committed and the commit log is not truncated, which can undermine data durability.
Note: When a timeout occurs, this node is typically overloaded and cannot flush in a timely manner. Live indexing increases the time to flush asynchronous index updates.
Default: 5
- load_max_time_per_core
- The maximum time, in minutes, to wait for each DSE Search index to load on startup
or create/reload operations. This advanced option should be changed only if
exceptions happen during search index loading.
Default: 5
- enable_index_disk_failure_policy
- Whether to apply the configured disk failure policy if IOExceptions occur during
index update operations.
- true - Apply the configured Cassandra disk failure policy to index write failures
- false - Do not apply the disk failure policy
Default: false
- solr_data_dir
- The directory to store index data. See Managing the location of DSE Search data. By
default, each DSE Search index is saved in solr_data_dir/keyspace_name.table_name,
or as specified by the dse.solr.data.dir system property.
Default: A solr.data directory in the cassandra data directory, like /var/lib/cassandra/solr.data
- solr_field_cache_enabled
- The Apache Lucene® field cache is deprecated. Instead, for fields
that are sorted, faceted, or grouped by, set docValues="true"
on the field in the search index schema. Then reload the search index and reindex.
Default: false
- ram_buffer_heap_space_in_mb
- Global Lucene RAM buffer usage threshold for heap to force segment flush. Setting
too low can cause a state of constant flushing during periods of ongoing write
activity. For near-real-time (NRT) indexing, forced segment flushes also
de-schedule pending auto-soft commits to avoid potentially flushing too many small
segments.
Default: 1024
- ram_buffer_offheap_space_in_mb
- Global Lucene RAM buffer usage threshold for off-heap memory to force a segment flush. Setting
too low can cause a state of constant flushing during periods of ongoing write
activity. For NRT, forced segment flushes also de-schedule pending auto-soft
commits to avoid potentially flushing too many small segments.
Default: 1024
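back_pressure_threshold_per_core works like the capacity of a bounded queue that sits between partition producers and indexing threads. The sketch below models that idea with Python's standard queue.Queue. The function name is hypothetical. This sketch only counts the overflow, whereas a real producer would wait until space frees up.

```python
import queue

BACK_PRESSURE_THRESHOLD_PER_CORE = 1024  # mirrors the dse.yaml default


def enqueue_partitions(partition_keys, threshold=BACK_PRESSURE_THRESHOLD_PER_CORE):
    """Queue partitions for indexing and count how many hit back pressure.

    Hypothetical model: nothing drains the queue here, so every partition
    beyond the threshold is one that a real producer would have to wait for.
    """
    pending = queue.Queue(maxsize=threshold)
    deferred = 0
    for key in partition_keys:
        try:
            pending.put_nowait(key)
        except queue.Full:
            deferred += 1  # back pressure: the producer would block here
    return pending.qsize(), deferred


print(enqueue_partitions(range(1500)))  # 1024 queued, 476 held back
```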
Performance Service
- Global Performance Service
- Performance Service
- DSE Search Performance Service
- Spark Performance Service
Global Performance Service
A task is dropped when the performance service requests more tasks than
performance_max_threads + performance_queue_capacity. When a task is dropped, collected
statistics might not be current.
# performance_core_threads: 4
# performance_max_threads: 32
# performance_queue_capacity: 32000
- performance_core_threads
- Number of background threads used by the performance service under normal
conditions.
Default: 4
- performance_max_threads
- Maximum number of background threads used by the performance service.
Default: 32
- performance_queue_capacity
- Allowed number of queued tasks in the backlog when all
performance_max_threads threads are busy.
Default: 32000
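The drop limit for the Performance Service thread pool can be checked with a short calculation. This is a simplified model that assumes all requested tasks arrive at once. In this model, performance_core_threads does not affect the drop limit. dropped_tasks is a hypothetical helper.

```python
def dropped_tasks(requested, max_threads=32, queue_capacity=32000):
    """Tasks dropped when `requested` tasks arrive at once.

    Simplified model: up to max_threads tasks run, up to queue_capacity
    tasks wait in the backlog, and anything beyond that is dropped.
    """
    capacity = max_threads + queue_capacity
    return max(0, requested - capacity)


print(dropped_tasks(32032))  # exactly fills threads plus backlog: nothing dropped
print(dropped_tasks(32040))  # 8 tasks over the limit
```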
Performance Service
Configures the collection of performance metrics on transactional nodes. Performance
metrics are stored in the dse_perf
keyspace and can be queried using
any CQL-based utility, such as cqlsh or any
application using a CQL driver. To temporarily make changes for diagnostics and
testing, use the dsetool perf subcommands.
graph_events:
  ttl_seconds: 600
- graph_events
- Graph event information.
- ttl_seconds
- Number of seconds a record survives before it is expired.
Default: 600
# cql_slow_log_options:
#   enabled: true
#   threshold: 200.0
#   minimum_samples: 100
#   ttl_seconds: 259200
#   skip_writing_to_db: true
#   num_slowest_queries: 5
- cql_slow_log_options
- Configures reporting of CQL queries that take longer than a specified threshold.
- enabled
- Enables logging of slow queries.
- true - Enables log entries for slow queries.
- false - Does not enable log entries.
Default: true
- threshold
- The threshold, in milliseconds or as a percentile.
- A value greater than 1 is interpreted as milliseconds: queries that take longer than the specified number of milliseconds are logged. For example, 200.0 sets the threshold at 0.2 seconds.
- A value from 0 to 1 is interpreted as a percentile: queries that exceed this percentile are logged. For example, 0.95 logs the slowest 5% of queries.
Default: 200.0
- minimum_samples
- The initial number of queries before activating the percentile filter.
Default: commented out (100)
- ttl_seconds
- Number of seconds a slow log record survives before it is expired.
Default: 259200
- skip_writing_to_db
- Keeps slow queries only in memory and does not write them to the database.
- true - Keep slow queries only in memory. Skip writing to the database.
- false - Write slow query information to the node_slow_log table. The threshold must be >= 2000 ms to prevent a high load on the database.
Default: commented out (true)
- num_slowest_queries
- The number of slow queries to keep in memory.
Default: commented out (5)
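The two interpretations of threshold, together with minimum_samples, can be sketched as a small filter. This is a hypothetical model, not the DSE implementation. It assumes a nearest-rank percentile computed over every latency seen so far. It also assumes that, in percentile mode, nothing is logged until minimum_samples queries have been observed.

```python
import math


class SlowQueryFilter:
    """Hypothetical model of the cql_slow_log_options threshold semantics."""

    def __init__(self, threshold=200.0, minimum_samples=100):
        self.threshold = threshold
        self.minimum_samples = minimum_samples
        self.latencies_ms = []  # every latency observed so far

    def is_percentile(self):
        # Values from 0 to 1 are percentiles; values greater than 1 are milliseconds.
        return 0 <= self.threshold <= 1

    def should_log(self, latency_ms):
        self.latencies_ms.append(latency_ms)
        if not self.is_percentile():
            return latency_ms > self.threshold
        # Assumption: the percentile filter stays inactive until minimum_samples.
        if len(self.latencies_ms) < self.minimum_samples:
            return False
        ordered = sorted(self.latencies_ms)
        # Nearest-rank percentile over all observed latencies.
        rank = max(1, math.ceil(self.threshold * len(ordered)))
        return latency_ms > ordered[rank - 1]


fixed = SlowQueryFilter(threshold=200.0)
print(fixed.should_log(250.0), fixed.should_log(150.0))
```

Keeping every latency is acceptable for a sketch. A production implementation would use a bounded histogram instead.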
cql_system_info_options:
  enabled: false
  refresh_rate_ms: 10000
- cql_system_info_options
- Configures collection of system-wide performance information about a cluster.
- enabled
- Enables collection of system-wide performance information about a cluster.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the performance statistics.
Default: 10000 (10 seconds)
resource_level_latency_tracking_options:
  enabled: false
  refresh_rate_ms: 10000
- resource_level_latency_tracking_options
- Configures collection of object I/O performance statistics. Tip: See Collecting system level diagnostics.
- enabled
- Enables collection of object I/O performance statistics.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the performance statistics.
Default: 10000 (10 seconds)
db_summary_stats_options:
  enabled: false
  refresh_rate_ms: 10000
- db_summary_stats_options
- Configures collection of summary statistics at the database level. Tip: See Collecting database summary diagnostics.
- enabled
- Enables collection of database summary performance information.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the performance statistics.
Default: 10000 (10 seconds)
cluster_summary_stats_options:
  enabled: false
  refresh_rate_ms: 10000
- cluster_summary_stats_options
- Configures collection of statistics at a cluster-wide level. Tip: See Collecting cluster summary diagnostics.
- enabled
- Enables collection of statistics at a cluster-wide level.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the performance statistics.
Default: 10000 (10 seconds)
spark_cluster_info_options:
  enabled: false
  refresh_rate_ms: 10000
- spark_cluster_info_options
- Configures collection of data associated with the Spark cluster and Spark
applications.
- enabled
- Enables collection of Spark performance statistics.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the performance statistics.
Default: 10000 (10 seconds)
histogram_data_options:
  enabled: false
  refresh_rate_ms: 10000
  retention_count: 3
- histogram_data_options
- Histogram data for the dropped mutation metrics is stored in the
dropped_messages table in the dse_perf keyspace. Tip: See Collecting histogram diagnostics.
- enabled
- Enables collection of histogram data.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the performance statistics.
Default: 10000 (10 seconds)
- retention_count
- Default: 3
user_level_latency_tracking_options:
  enabled: false
  refresh_rate_ms: 10000
  top_stats_limit: 100
  quantiles: false
- user_level_latency_tracking_options
- User-resource latency tracking settings. Tip: See Collecting user activity diagnostics.
- enabled
- Enables user-resource latency tracking.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the performance statistics.
Default: 10000 (10 seconds)
- top_stats_limit
- The maximum number of individual metrics.
Default: 100
- quantiles
-
Default: false
DSE Search Performance Service
solr_slow_sub_query_log_options:
  enabled: false
  ttl_seconds: 604800
  async_writers: 1
  threshold_ms: 3000
- solr_slow_sub_query_log_options
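- Configures reporting of distributed sub-queries for search (query executions on individual shards) that take longer than a specified period of time. See Collecting slow search queries.
- enabled
- Enables logging of slow search sub-queries.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- ttl_seconds
- The number of seconds a record survives before it is
expired.
Default: 604800 (7 days)
- async_writers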
- The number of server threads dedicated to writing in the log. More than one server
thread might degrade performance.
Default: 1
- threshold_ms
- The threshold, in milliseconds, above which a sub-query is logged as slow.
Default: 3000
solr_update_handler_metrics_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
- solr_update_handler_metrics_options
- Options to collect search index direct update handler statistics over time. Tip: See Collecting handler statistics.
solr_request_handler_metrics_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
- solr_request_handler_metrics_options
- Options to collect search index request handler statistics over time. Tip: See Collecting handler statistics.
solr_index_stats_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
- solr_index_stats_options
- Options to record search index statistics over time. Tip: See Collecting index statistics.
solr_cache_stats_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
- solr_cache_stats_options
- See Collecting cache statistics.
solr_latency_snapshot_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
- solr_latency_snapshot_options
- See Collecting Apache Solr performance statistics.
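The ttl_seconds defaults in the Performance Service sections are easier to compare when expressed as durations. The sketch below converts them with Python's datetime.timedelta. describe_ttl is an illustrative helper.

```python
from datetime import timedelta

# ttl_seconds defaults quoted in the Performance Service sections above.
TTL_DEFAULTS = {
    "graph_events": 600,
    "cql_slow_log_options": 259200,
    "solr_slow_sub_query_log_options": 604800,
}


def describe_ttl(seconds):
    """Render a TTL given in seconds as a human-readable duration."""
    return str(timedelta(seconds=seconds))


for option, seconds in TTL_DEFAULTS.items():
    print(f"{option}: {describe_ttl(seconds)}")
```

The output shows 10 minutes for graph events, 3 days for CQL slow log records, and 7 days for the search statistics tables.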
Spark Performance Service
spark_application_info_options:
  enabled: false
  refresh_rate_ms: 10000
  driver:
    sink: false
    connectorSource: false
    jvmSource: false
    stateSource: false
  executor:
    sink: false
    connectorSource: false
    jvmSource: false
- spark_application_info_options
- Collection of Spark application metrics.
- enabled
- Enables collection of Spark application metrics.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the performance statistics.
Default: 10000 (10 seconds)
- driver
- Configures collection of metrics at the Spark Driver.
- connectorSource
- Enables collecting Spark Cassandra Connector metrics at the Spark Driver.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- jvmSource
- Enables collection of JVM heap and garbage collection (GC) metrics from the Spark
Driver.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- stateSource
- Enables collection of application state metrics at the Spark Driver.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- executor
- Configures collection of metrics at Spark executors.
- sink
- Enables collection of metrics at Spark executors.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- connectorSource
- Enables collection of Spark Cassandra Connector metrics at Spark executors.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
- jvmSource
- Enables collection of JVM heap and GC metrics at Spark executors.
- true - Collect metrics.
- false - Do not collect metrics.
Default: false
DSE Analytics
Spark resource options
spark_shared_secret_bit_length: 256
spark_security_enabled: false
spark_security_encryption_enabled: false
spark_daemon_readiness_assertion_interval: 1000
resource_manager_options:
  worker_options:
    cores_total: 0.7
    memory_total: 0.6
    workpools:
      - name: alwayson_sql
        cores: 0.25
        memory: 0.25
- spark_shared_secret_bit_length
- The length of a shared secret used to authenticate Spark components and encrypt the
connections between them. This value is not the strength of the cipher for
encrypting connections.
Default: 256
- spark_security_enabled
- In DSE 6.7.4 and later, when DSE authentication is enabled with authentication_options, Spark security is enabled
regardless of this setting.
In DSE 6.7.0-6.7.3, enables or disables Spark security based on shared secret infrastructure for mutual authentication and optional encryption between DSE Spark Master and Workers, and communication channels, except the web UI.
Default: false
- spark_security_encryption_enabled
- In DSE 6.7.4 and later, when DSE authentication is enabled with authentication_options, Spark security encryption is
enabled regardless of this setting.
In DSE 6.7.0-6.7.3, enables or disables encryption between DSE Spark Master and Workers, and communication channels, except the web UI. Uses the DIGEST-MD5 SASL-based encryption mechanism. Requires spark_security_enabled: true.
Tip: Configure encryption between the Spark processes and DSE with client-to-node encryption in cassandra.yaml.
Default: false
- spark_daemon_readiness_assertion_interval
- Time interval in milliseconds between subsequent retries by the Spark plugin for
Spark Master and Worker readiness to start.
Default: 1000
- resource_manager_options
- Controls the physical resources used by Spark applications on this node. Optionally add named workpools with specific dedicated resources. See Core management.
- worker_options
- Configures the amount of system resources that are made available to the Spark Worker.
- cores_total
- The number of total system cores available to Spark. Note: For DSE 6.7.7 and later, the
SPARK_WORKER_TOTAL_CORES environment variable takes precedence over this setting. The lowest value that you can assign to Spark Worker cores is 1 core. If the results are lower, no exception is thrown and the values are automatically limited.
Note: Setting cores_total or a workpool's cores to 1.0 is a decimal value, meaning 100% of the available cores will be reserved. Setting cores_total or cores to 1 (no decimal point) is an explicit value, and one core will be reserved.
Default: 0.7
- memory_total
- The amount of total system memory available to Spark.
- absolute value - Use standard suffixes like M for megabyte and G for gigabyte. For example, 12G.
- decimal value - Maximum fraction of system memory to give all executors for
all applications running on a particular node. For example, 0.8. When the value is expressed as a decimal, the available resources are calculated in the following way:
Spark Worker memory = memory_total x (total system memory - memory assigned to DataStax Enterprise)
The lowest value that you can assign to Spark Worker memory is 64 MB. If the results are lower, no exception is thrown and the values are automatically limited.
Note: For DSE 6.7.7 and later, the SPARK_WORKER_TOTAL_MEMORY environment variable takes precedence over this setting.
Default: 0.6
- workpools
- A collection of named workpools that can use a portion of the total resources
defined under worker_options. A default workpool named default is used if no workpools are defined in this section. If workpools are defined, the resources allocated to the workpools are taken from the total amount, with the remaining resources available to the default workpool. The total amount of resources defined in the workpools section must not exceed the resources available to Spark in worker_options.
- name
- The name of the workpool. A workpool named alwayson_sql is created by default for AlwaysOn SQL. By default, the alwayson_sql workpool is configured to use 25% of the resources available to Spark.
- cores
- The number of system cores to use in this workpool, expressed as an absolute value or
a decimal value. This option follows the same rules as cores_total.
- memory
- The amount of memory to use in this workpool, expressed as either an absolute value
or a decimal value. This option follows the same rules as memory_total.
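The sizing rules for worker_options and workpools can be summarized in a sketch. It encodes the following documented rules:

- The formula memory_total x (total system memory - memory assigned to DataStax Enterprise).
- The 1-core and 64 MB floors.
- Unallocated resources go to the default workpool.

Several details are assumptions:

- Rounding down.
- Treating 1G as 1024M.
- Applying a decimal cores_total to the total number of system cores.
- Skipping absolute workpool values.
- The helper names.

The float-versus-int check mirrors how YAML parses 1.0 (a float) and 1 (an integer).

```python
import re

MIN_WORKER_CORES = 1       # documented floor for Spark Worker cores
MIN_WORKER_MEMORY_MB = 64  # documented floor for Spark Worker memory


def resolve_cores(cores_total, system_cores):
    """A float such as 0.7 or 1.0 is a fraction; an int such as 1 is a core count."""
    if isinstance(cores_total, float):
        cores = int(cores_total * system_cores)  # assumption: rounds down
    else:
        cores = cores_total
    return max(MIN_WORKER_CORES, cores)


def resolve_memory_mb(memory_total, system_mb, dse_mb):
    """Apply memory_total x (total system memory - memory assigned to DSE)."""
    if isinstance(memory_total, float):
        memory_mb = int(memory_total * (system_mb - dse_mb))  # assumption: rounds down
    else:
        match = re.fullmatch(r"(\d+)([MG])", memory_total)
        if match is None:
            raise ValueError(f"unsupported memory_total: {memory_total!r}")
        amount, unit = int(match.group(1)), match.group(2)
        memory_mb = amount * 1024 if unit == "G" else amount  # assumption: 1G = 1024M
    return max(MIN_WORKER_MEMORY_MB, memory_mb)


def default_workpool_share(workpool_fractions):
    """Fraction of one worker resource left for the implicit default workpool."""
    allocated = sum(workpool_fractions.values())
    if allocated > 1.0:
        raise ValueError("workpools exceed the resources defined in worker_options")
    return 1.0 - allocated


print(resolve_cores(0.7, 16))                      # 70% of 16 cores
print(resolve_memory_mb(0.6, 65536, 16384))        # 0.6 x (64 GB - 16 GB), in MB
print(default_workpool_share({"alwayson_sql": 0.25}))
```

Passing the value with its YAML type preserves the 1.0 versus 1 distinction described for cores_total.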
Spark encryption options
spark_ui_options:
  encryption: inherit
  encryption_options:
    enabled: false
    keystore: resources/dse/conf/.ui-keystore
    keystore_password: cassandra
    require_client_auth: false
    truststore: .truststore
    truststore_password: cassandra
    # Advanced settings
    # protocol: TLS
    # algorithm: SunX509
    # keystore_type: JKS
    # truststore_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
- spark_ui_options
- Configures encryption for Spark Master and Spark Worker UIs. These options apply
only to Spark daemon UIs, and do not apply to user applications even when the user
applications are run in cluster mode. Tip: To set permissions on roles to allow Spark applications to be started, stopped, managed, and viewed, see Using authorization with Spark.
- encryption
- The source for SSL settings.
- inherit - Inherit the SSL settings from the client_encryption_options in cassandra.yaml.
- custom - Use the following encryption_options in dse.yaml.
- encryption_options
- When encryption: custom, configures encryption for HTTPS of the Spark Master and Worker UIs.
- enabled
- Enables Spark encryption for Spark client-to-Spark cluster and Spark
internode communication.
Default: false
- keystore
- The keystore for Spark encryption keys.
A relative filepath is resolved against the base Spark configuration directory, which is defined by the
SPARK_CONF_DIR environment variable. The default Spark configuration directory is resources/spark/conf.
Default: resources/dse/conf/.ui-keystore
- keystore_password
- The password to access the keystore.
Default: cassandra
- require_client_auth
- Enables custom truststore for client authentication.
- true - Require custom truststore for client authentication.
- false - Do not require custom truststore.
Default: false
- truststore
- The filepath to the truststore for Spark encryption keys if
require_client_auth: true. A relative filepath is resolved against the base Spark configuration directory, which is defined by the
SPARK_CONF_DIR environment variable. The default Spark configuration directory is resources/spark/conf.
Default: resources/dse/conf/.ui-truststore
- truststore_password
- The password to access the truststore.
Default: cassandra
- protocol
- The Transport Layer Security (TLS) authentication protocol. The TLS protocol
must be supported by both the JVM and Spark. TLS 1.2 is the most common JVM default.
Default: JVM default
- algorithm
- The key manager algorithm.
Default: SunX509
- keystore_type
- Valid types are JKS, JCEKS, PKCS11, and PKCS12. For
file-based keystores, use PKCS12. Attention: DataStax supports PKCS11 as a keystore_type on nodes with cassandra or advanced workloads. The cassandra workload support is specific to DSE 6.7.7 and later releases. The advanced workload support is specific to DSE 6.7.9 and later. If PKCS11 is needed, in server_encryption_options or client_encryption_options, specify the keystore_type as PKCS11 and the keystore as NONE. PKCS11 is not supported in DSE 6.0.x and 5.1.x releases. PKCS11 is not supported as a truststore_type.
Default: JKS
- truststore_type
- Valid types are JKS, JCEKS, and PKCS12.
Default: commented out (JKS)
- cipher_suites
- A comma-separated list of cipher suites for Spark encryption. Enclose the list in
square brackets.
- TLS_RSA_WITH_AES_128_CBC_SHA
- TLS_RSA_WITH_AES_256_CBC_SHA
- TLS_DHE_RSA_WITH_AES_128_CBC_SHA
- TLS_DHE_RSA_WITH_AES_256_CBC_SHA
- TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
- TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
Starting Spark drivers and executors
spark_process_runner:
  runner_type: default
  run_as_runner_options:
    user_slots:
      - slot1
      - slot2
- spark_process_runner
- Configures how Spark driver and executor processes are created and managed. See Running Spark processes as separate users.
- runner_type
- The method used to create Spark driver and executor processes.
- default - Use the default runner type.
- run_as - Spark applications run as a different OS user than the DSE service user.
- run_as_runner_options
- When runner_type: run_as, Spark applications run as a different OS user than the DSE service user.
- user_slots
- The list of slot users that separate Spark process users from the DSE service user.
Default: slot1, slot2
AlwaysOn SQL
Properties to enable and configure AlwaysOn SQL on analytics nodes.
# AlwaysOn SQL options
# alwayson_sql_options:
#   enabled: false
#   thrift_port: 10000
#   web_ui_port: 9077
#   reserve_port_wait_time_ms: 100
#   alwayson_sql_status_check_wait_time_ms: 500
#   workpool: alwayson_sql
#   log_dsefs_dir: /spark/log/alwayson_sql
#   auth_user: alwayson_sql
#   runner_max_errors: 10
#   heartbeat_update_interval_seconds: 30
- alwayson_sql_options
- Configures the AlwaysOn SQL server.
- enabled
- Enables AlwaysOn SQL for this node.
- true - Enable AlwaysOn SQL for this node. The node must be an analytics node. Set workpools in Spark resource_manager_options.
- false - Do not enable AlwaysOn SQL for this node.
Default: false
- thrift_port
- The Thrift port on which AlwaysOn SQL listens.
Default: 10000
- web_ui_port
- The port on which the AlwaysOn SQL web UI is available.
Default: 9077
- reserve_port_wait_time_ms
- The wait time in milliseconds to reserve the thrift_port if it is not available.
Default: 100
- alwayson_sql_status_check_wait_time_ms
- The time in milliseconds to wait for a health check status of the AlwaysOn SQL
server.
Default: 500
- workpool
- The named workpool used by AlwaysOn SQL.
Default: alwayson_sql
- log_dsefs_dir
- Location in DSEFS of the AlwaysOn SQL log files.
Default: /spark/log/alwayson_sql
- auth_user
- The role to use for internal communication by AlwaysOn SQL if authentication is
enabled. Custom roles must be created with login=true.
Default: alwayson_sql
- runner_max_errors
- The maximum number of errors that can occur during AlwaysOn SQL service runner
thread runs before stopping the service. A service stop requires a manual
restart.
Default: 10
- heartbeat_update_interval_seconds
- The time interval, in seconds, to update the heartbeat of AlwaysOn SQL. If the heartbeat is not updated for more than three times the interval, AlwaysOn SQL automatically restarts.
Default: 30
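The heartbeat rule reduces to a single comparison. The sketch below assumes timestamps in seconds. The function name is hypothetical and does not correspond to a DSE API.

```python
def alwayson_sql_needs_restart(last_heartbeat_s, now_s, interval_s=30):
    """True when the heartbeat is older than three update intervals."""
    return now_s - last_heartbeat_s > 3 * interval_s


print(alwayson_sql_needs_restart(0, 90))  # exactly 3 intervals: still healthy
print(alwayson_sql_needs_restart(0, 91))  # more than 3 intervals: restart
```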
DSE File System (DSEFS)
# dsefs_options:
# enabled:
# keyspace_name: dsefs
# work_dir: /var/lib/dsefs
# public_port: 5598
# private_port: 5599
# data_directories:
# - dir: /var/lib/dsefs/data
# storage_weight: 1.0
# min_free_space: 5368709120
- dsefs_options
- Configures DSEFS. See Configuring DSEFS.
- enabled
- Enables DSEFS.
- true - Enables DSEFS on this node, regardless of the workload.
- false - Disables DSEFS on this node, regardless of the workload.
- blank or commented out (#) - DSEFS starts only if the node is configured to run analytics workloads.
Default: blank (commented out)
- keyspace_name
- The keyspace where the DSEFS metadata is stored. You can optionally configure
multiple DSEFS file systems within a single datacenter by specifying a different
keyspace name for each file system.
Default: dsefs
- work_dir
- The local directory for storing the local node metadata, including the node identifier. The volume of data stored in this directory is nominal and does not require configuration for throughput, latency, or capacity. This directory must not be shared by DSEFS nodes.
Default: /var/lib/dsefs
- public_port
- The public port on which DSEFS listens for clients. Note: DataStax recommends that all nodes in the cluster have the same value. Firewalls must open this port to trusted clients. The service on this port is bound to the native_transport_address.
Default: 5598
- private_port
- The private port for DSEFS internode communication. CAUTION: Do not open this port in firewalls; this private port must not be visible from outside of the cluster.
Default: 5599
- data_directories
- One or more data locations where the DSEFS data is stored.
- - dir
- Mandatory attribute to identify the set of directories. DataStax recommends
segregating these data directories on physical devices that are different from the
devices that are used for DataStax Enterprise. Using multiple directories on JBOD
improves performance and capacity.
Default: /var/lib/dsefs/data
- storage_weight
- Weighting factor for this location. Determines how much data to place in this
directory, relative to other directories in the cluster. This soft constraint
determines how DSEFS distributes the data. For example, a directory with a value of
3.0 receives about three times more data than a directory with a value of 1.0.
Default: 1.0
- min_free_space
- The reserved space, in bytes, to not use for storing file data blocks. You can use
a unit of measure suffix to specify other size units. For example: terabyte (1 TB),
gigabyte (10 GB), and megabyte (5000 MB).
Default: 5368709120
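Both storage_weight and min_free_space lend themselves to a quick calculation. The sketch below shows the proportional share implied by the weights. Because the constraint is soft, that share is only a target. The sketch also confirms that the min_free_space default of 5368709120 bytes equals 5 GiB. The helper names are illustrative.

```python
GIB = 1024 ** 3  # 5368709120 == 5 * GIB


def expected_shares(storage_weights):
    """Fraction of DSEFS data each directory tends toward, given its storage_weight."""
    total = sum(storage_weights.values())
    return {path: weight / total for path, weight in storage_weights.items()}


def usable_bytes(capacity_bytes, min_free_space=5368709120):
    """Bytes left for file data blocks after reserving min_free_space."""
    return max(0, capacity_bytes - min_free_space)


print(expected_shares({"/data1": 3.0, "/data2": 1.0}))  # roughly 75% / 25%
print(usable_bytes(100 * GIB) // GIB)                    # 95 GiB usable of 100 GiB
```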
# service_startup_timeout_ms: 60000
# service_close_timeout_ms: 600000
# server_close_timeout_ms: 2147483647 # Integer.MAX_VALUE
# compression_frame_max_size: 1048576
# query_cache_size: 2048
# query_cache_expire_after_ms: 2000
# gossip_options:
# round_delay_ms: 2000
# startup_delay_ms: 5000
# shutdown_delay_ms: 10000
# rest_options:
# request_timeout_ms: 330000
# connection_open_timeout_ms: 55000
# client_close_timeout_ms: 60000
# server_request_timeout_ms: 300000
# idle_connection_timeout_ms: 60000
# internode_idle_connection_timeout_ms: 120000
# core_max_concurrent_connections_per_host: 8
# transaction_options:
# transaction_timeout_ms: 3000
# conflict_retry_delay_ms: 200
# conflict_retry_count: 40
# execution_retry_delay_ms: 1000
# execution_retry_count: 3
# block_allocator_options:
# overflow_margin_mb: 1024
# overflow_factor: 1.05
- service_startup_timeout_ms
- Wait time in milliseconds before the DSEFS server times out while waiting for
services to bootstrap.
Default: 60000
- service_close_timeout_ms
- Wait time in milliseconds before the DSEFS server times out while waiting for
services to close.
Default: 60000
- server_close_timeout_ms
- Wait time in milliseconds that the DSEFS server waits during shutdown before
closing all pending connections.
Default: 2147483647
- compression_frame_max_size
- The maximum accepted size of a compression frame defined during file upload.
Default: 1048576
- query_cache_size
- Maximum number of elements in a single DSEFS Server query cache.
Default: 2048
- query_cache_expire_after_ms
- The time to retain the DSEFS Server query cache element in cache. The cache
element expires when this time is exceeded.
Default: 2000
- gossip_options
- Configures DSEFS gossip rounds.
- round_delay_ms
- The delay in milliseconds between gossip rounds.
Default: 2000
- startup_delay_ms
- The delay in milliseconds between registering the location and reading back all
other locations from the database.
Default: 5000
- shutdown_delay_ms
- The delay time in milliseconds between announcing shutdown and shutting down the
node.
Default: 30000
- rest_options
- Configures DSEFS REST timeouts and connections.
- request_timeout_ms
- The time in milliseconds that the client waits for a response that corresponds to
a given request.
Default: 330000
- connection_open_timeout_ms
- The time in milliseconds that the client waits to establish a new connection.
Default: 55000
- client_close_timeout_ms
- The time in milliseconds that the client waits for pending transfer to complete
before closing a connection.
Default: 60000
- server_request_timeout_ms
- The time in milliseconds to wait for the server REST call to complete.
Default: 300000
- idle_connection_timeout_ms
- The time in milliseconds for RestClient to wait before closing an idle connection.
If RestClient does not close the connection after the timeout, the connection is closed
after 2 x this wait time.
- time - Wait time to close idle connection.
- 0 - Disable closing idle connections.
Default: 60000
- internode_idle_connection_timeout_ms
- Wait time in milliseconds before closing idle internode connection. The internode
connections are primarily used to exchange data during replication. Do not set lower
than the default value for heavily utilized DSEFS
clusters.
Default: 0
- core_max_concurrent_connections_per_host
- Maximum number of connections to a given host per single CPU core. DSEFS keeps a
connection pool for each CPU core.
Default: 8
- transaction_options
- Configures DSEFS transaction timeouts and retries.
- transaction_timeout_ms
- Transaction run time in milliseconds before the transaction is considered for
timeout and rollback.
Default: 3000
- conflict_retry_delay_ms
- Wait time in milliseconds before retrying a transaction that was ended due to a
conflict.
Default: 200
- conflict_retry_count
- The number of times to retry a transaction before giving up.
Default: 40
- execution_retry_delay_ms
- Wait time in milliseconds before retrying a failed transaction payload execution.
Default: 1000
- execution_retry_count
- The number of payload execution retries before signaling the error to the
application.
Default: 3
- block_allocator_options
- Controls how much additional data can be placed on the local coordinator before
the local node overflows to the other nodes. The trade-off is between data locality
of writes and balancing the cluster. A local node is preferred for a new block
allocation if:
used_size_on_the_local_node < average_used_size_per_node x overflow_factor + overflow_margin
- overflow_margin_mb
-
- margin_size - Overflow margin size in megabytes.
- 0 - Disable block allocation overflow
Default: 1024
- overflow_factor
-
- factor - Overflow factor on an exponential scale.
- 1.0 - Disable block allocation overflow
Default: 1.05
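The block allocation condition from block_allocator_options can be evaluated directly. This sketch transcribes the documented inequality and assumes sizes in megabytes. prefer_local_node is a hypothetical name.

```python
def prefer_local_node(local_used_mb, average_used_mb,
                      overflow_factor=1.05, overflow_margin_mb=1024):
    """Documented condition for placing a new block on the local node."""
    return local_used_mb < average_used_mb * overflow_factor + overflow_margin_mb


print(prefer_local_node(10000, 10000))  # below the 11524 MB threshold
print(prefer_local_node(12000, 10000))  # above it: overflow to other nodes
```

With the defaults, a node whose usage equals the average (10000 MB) is still preferred, because the threshold is 10000 x 1.05 + 1024 = 11524 MB.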
DSE Metrics Collector
# insights_options:
#   data_dir: /var/lib/cassandra/insights_data
#   log_dir: /var/log/cassandra/
Uncomment these options only to change the default directories.
- insights_options
- Options for DSE Metrics Collector.
- data_dir
- Directory to store collected metrics.
When data_dir is not explicitly set, the insights_data directory is stored in the same parent directory as the commitlog_directory as defined in cassandra.yaml. If the commitlog_directory uses the package default of /var/lib/cassandra/commitlog, data_dir will default to /var/lib/cassandra/insights_data.
Default: /var/lib/cassandra/insights_data
- log_dir
- Directory to store logs for collected metrics. The log file is
dse-collectd.log. The file with the collectd PID is
dse-collectd.pid.
Default: /var/log/cassandra/
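The default data_dir rule can be expressed with pathlib: the insights_data directory is a sibling of the commit log directory. This sketch assumes POSIX paths and uses a hypothetical helper name.

```python
from pathlib import PurePosixPath


def default_insights_data_dir(commitlog_directory):
    """insights_data lives under the same parent as the commit log directory."""
    return PurePosixPath(commitlog_directory).parent / "insights_data"


print(default_insights_data_dir("/var/lib/cassandra/commitlog"))
```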
Audit logging for database activities
audit_logging_options:
  enabled: false
  logger: SLF4JAuditWriter
  # included_categories:
  # excluded_categories:
  #
  # included_keyspaces:
  # excluded_keyspaces:
  #
  # included_roles:
  # excluded_roles:
- audit_logging_options
- Configures database activity logging.
- enabled
- Enables database activity auditing.
- true - Enable database activity auditing.
- false - Disable database activity auditing.
Default: false
- logger
- The logger to use for recording events:
- SLF4JAuditWriter - Capture events in a log file.
- CassandraAuditWriter - Capture events in the
dse_audit.audit_log
table.
Tip: Configure logging level, sensitive data masking, and log file name/location in the logback.xml file.
Default: SLF4JAuditWriter
- included_categories
- Comma-separated list of event categories that are captured.
- QUERY - Data retrieval events.
- DML - (Data manipulation language) Data change events.
- DDL - (Data definition language) Database schema change events.
- DCL - (Data control language) Role and permission management events.
- AUTH - (Authentication) Login and authorization related events.
- ERROR - Failed requests.
- UNKNOWN - Events where the category and type are both
UNKNOWN
.
Warning: Use either included_categories or excluded_categories but not both. When specifying included categories, leave excluded_categories blank or commented out.
Default: none (include all categories)
- excluded_categories
- Comma-separated list of event categories to ignore:
- QUERY - Data retrieval events.
- DML - (Data manipulation language) Data change events.
- DDL - (Data definition language) Database schema change events.
- DCL - (Data control language) Role and permission management events.
- AUTH - (Authentication) Login and authorization related events.
- ERROR - Failed requests.
- UNKNOWN - Events where the category and type are both UNKNOWN.
Warning: Use either included_categories or excluded_categories, but not both.
Default: exclude no categories
- included_keyspaces
- Comma-separated list of keyspaces for which events are logged. You can also use a regular expression to filter on keyspace name.
Warning: DSE supports using either included_keyspaces or excluded_keyspaces, but not both.
Default: include all keyspaces
- excluded_keyspaces
- Comma-separated list of keyspaces to exclude. You can also use a regular
expression to filter on keyspace
name.
Default: exclude no keyspaces
- included_roles
- Comma-separated list of the roles for which events are logged.
Warning: DSE supports using either included_roles or excluded_roles, but not both.
Default: include all roles
- excluded_roles
- Comma-separated list of the roles for which events are not logged.
Default: exclude no roles
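The following sketch shows one way to combine these options. It enables auditing, writes events to the dse_audit.audit_log table, records only DDL, DCL, and AUTH events, and skips two keyspaces. The keyspace names are hypothetical. The comma-separated values follow the option descriptions above. Only one option from each included/excluded pair is set.

```yaml
audit_logging_options:
    enabled: true
    logger: CassandraAuditWriter
    # Capture only schema changes, role/permission changes, and logins.
    included_categories: DDL, DCL, AUTH
    # Left commented out because included_categories is set.
    # excluded_categories:
    # Hypothetical keyspace names to leave out of the audit log.
    excluded_keyspaces: test_ks, scratch_ks
```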
Cassandra audit writer options
retention_time: 0
cassandra_audit_writer_options:
    mode: sync
    batch_size: 50
    flush_time: 250
    queue_size: 30000
    write_consistency: QUORUM
    # dropped_event_log: /var/log/cassandra/dropped_audit_events.log
    # day_partition_millis: 3600000
- retention_time
- The number of hours to retain audit events. Only loggers that support retention, such as the CassandraAuditWriter, use this setting.
- hours - The number of hours to retain audit events.
- 0 - Retain events forever.
Default: 0
- cassandra_audit_writer_options
- Options for the CassandraAuditWriter.
- mode
- The mode the writer runs in.
- sync - A query is not executed until the audit event is successfully written.
- async - Audit events are queued for writing to the audit table, but are not necessarily logged before the query executes. A pool of writer threads consumes the audit events from the queue and writes them to the audit table in batch queries.
Important: async substantially improves performance under load. However, if a failure occurs after a query executes but before its audit event is written to the table, the audit table might be missing entries for queries that were executed.
Default: sync
- batch_size
- Available only when mode: async. Must be greater than 0. The maximum number of events the writer dequeues before writing them out to the table. If warnings in the logs reveal that batches are too large, decrease this value or increase the value of batch_size_warn_threshold_in_kb in cassandra.yaml.
Default: 50
- flush_time
- Available only when mode: async. The maximum time, in milliseconds, that an event waits in the queue before a writer removes it and writes it out. This limit keeps events from waiting too long to be written to the table when query volume is low.
Default: 500
- queue_size
- The size of the queue feeding the asynchronous audit log writer threads.
- Number of events - When the writers cannot keep up with the rate at which events are produced, the queue fills up, and newer queries are blocked until space is available on the queue.
- 0 - The queue size is unbounded, which can lead to resource exhaustion under heavy query load.
Default: 30000
- write_consistency
- The consistency level used to write audit events.
Default: QUORUM
- dropped_event_log
- The log file that reports dropped events.
Default: /var/log/cassandra/dropped_audit_events.log
- day_partition_millis
- The interval, in milliseconds, after which audit events are written to a different target node. Changing the target node spreads audit log information across multiple nodes. For example, to change the target node every 12 hours, specify 43200000 milliseconds.
Default: 3600000 (1 hour)
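The sketch below switches the CassandraAuditWriter to asynchronous mode. It keeps events for 7 days (7 x 24 = 168 hours) and changes the target node every 12 hours (12 x 60 x 60 x 1000 = 43200000 ms). It assumes that retention_time and cassandra_audit_writer_options are nested under audit_logging_options, as in the default dse.yaml shipped with DSE 6.x. Verify the nesting against your own file.

```yaml
audit_logging_options:
    enabled: true
    logger: CassandraAuditWriter        # retention and writer options apply to this logger
    retention_time: 168                 # hours; 0 would retain events forever
    cassandra_audit_writer_options:
        mode: async                     # batch_size and flush_time apply only in async mode
        batch_size: 50                  # must be greater than 0
        flush_time: 250                 # milliseconds
        queue_size: 30000               # 0 would make the queue unbounded
        write_consistency: QUORUM
        dropped_event_log: /var/log/cassandra/dropped_audit_events.log
        day_partition_millis: 43200000  # 12 hours
```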
DSE Tiered Storage
# tiered_storage_options:
#     strategy1:
#         tiers:
#             - paths:
#                 - /mnt1
#                 - /mnt2
#             - paths: [ /mnt3, /mnt4 ]
#             - paths: [ /mnt5, /mnt6 ]
#
#         local_options:
#             k1: v1
#             k2: v2
#
#     'another strategy':
#         tiers: [ paths: [ /mnt1 ] ]
- tiered_storage_options
- Configures the smart movement of data across different types of storage media so that data is matched to the most suitable drive type, according to the required performance and cost characteristics.
- strategy1
- The configurable name of a tiered storage configuration strategy. strategy1 is an example name. Define additional strategies, such as strategy2 and strategy3, as needed.
- tiers
- Each unnamed entry in this section configures one storage tier by its paths. The order of the entries defines the tier priority.
- local_options
- Configuration options in the local dse.yaml file that override the tiered storage settings in the table schema. See Testing DSE Tiered Storage configurations.
- - paths
- The section of filepaths that define the data directories for this tier of the disk configuration. List the fastest storage media first. These paths are used to store only data that is configured to use tiered storage and are independent of any settings in the cassandra.yaml file.
- - /filepath
- The filepaths that define the data directories for this tier of the disk configuration.
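The example below is an uncommented, two-tier strategy. The fast media are listed first, as the paths description above requires. The strategy name hot_cold and the mount points are hypothetical placeholders for your own drives.

```yaml
tiered_storage_options:
    hot_cold:                          # hypothetical strategy name
        tiers:
            # Tier 1: fastest media first (hypothetical SSD mounts)
            - paths:
                - /mnt/ssd1
                - /mnt/ssd2
            # Tier 2: slower, cheaper media (hypothetical HDD mounts)
            - paths: [ /mnt/hdd1, /mnt/hdd2 ]
```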
DSE Advanced Replication
# advanced_replication_options:
#     enabled: false
#     conf_driver_password_encryption_enabled: false
#     advanced_replication_directory: /var/lib/cassandra/advrep
#     security_base_path: /base/path/to/advrep/security/files/
- advanced_replication_options
- Configure DSE Advanced Replication.
- enabled
- Enables an edge node to collect data in the replication log.
Default: false
- conf_driver_password_encryption_enabled
- Enables encryption of driver passwords.
Default: false
- advanced_replication_directory
- The directory for storing advanced replication CDC logs. The
replication_logs directory will be created in the specified
directory.
Default: /var/lib/cassandra/advrep
- security_base_path
- The base path to prepend to paths in the Advanced Replication configuration locations,
including locations to SSL keystore, SSL truststore, and so on.
Default: /base/path/to/advrep/security/files/
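A minimal sketch of enabling Advanced Replication on an edge node follows. Only the option names come from the list above. The security_base_path value is a hypothetical location for the SSL keystore and truststore files.

```yaml
advanced_replication_options:
    enabled: true                                   # collect data in the replication log
    conf_driver_password_encryption_enabled: false
    advanced_replication_directory: /var/lib/cassandra/advrep   # replication_logs is created here
    security_base_path: /etc/dse/advrep/security/   # hypothetical base path
```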
Internode messaging
internode_messaging_options:
    port: 8609
    # frame_length_in_mb: 256
    # server_acceptor_threads: 8
    # server_worker_threads: 16
    # client_max_connections: 100
    # client_worker_threads: 16
    # handshake_timeout_seconds: 10
    # client_request_timeout_seconds: 60
- internode_messaging_options
- Configures the internal messaging service used by several components of DataStax Enterprise. All internode messaging requests use this service.
- port
- The mandatory port for the internode messaging service.
Default: 8609
- frame_length_in_mb
- Maximum message frame length, in MB.
Default: 256
- server_acceptor_threads
- The number of server acceptor threads.
Default: The number of available processors
- server_worker_threads
- The number of server worker threads.
Default: the number of available processors x 8
- client_max_connections
- The maximum number of client connections.
Default: 100
- client_worker_threads
- The number of client worker threads.
Default: the number of available processors x 8
- handshake_timeout_seconds
- Timeout, in seconds, for the communication handshake process.
Default: 10
- client_request_timeout_seconds
- Timeout, in seconds, for non-query search requests, such as core creation and distributed deletes.
Default: 60
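The sketch below doubles both timeouts and pins the worker thread counts explicitly. It assumes a hypothetical node with 8 available processors, where the documented default of processors x 8 works out to 8 x 8 = 64. The values are illustrations, not tuning recommendations.

```yaml
internode_messaging_options:
    port: 8609                           # mandatory
    handshake_timeout_seconds: 20        # default is 10
    client_request_timeout_seconds: 120  # default is 60
    # 8 available processors x 8 = 64 (assumed hardware)
    server_worker_threads: 64
    client_worker_threads: 64
```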
DSE Multi-Instance
- server_id
- Unique generated ID of the physical server in the DSE Multi-Instance /etc/dse-nodeId/dse.yaml files. You can change server_id when the MAC address is not unique, such as on a virtualized server where the host's physical MAC address is cloned.
Default: the media access control address (MAC address) of the physical server
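If cloned virtual machines share a MAC address, you can set server_id by hand. The instance directory node1 and the ID value below are hypothetical. Choose a value that is unique to the physical server.

```yaml
# In a DSE Multi-Instance node's dse.yaml, for example /etc/dse-node1/dse.yaml
# (node1 is a hypothetical instance name).
server_id: "host07-rack1"   # hypothetical ID unique to this physical server
```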
DSE Graph
- DSE Graph system-level
- DSE Graph Gremlin Server options
DSE Graph Gremlin Server
# gremlin_server:
#     port: 8182
#     threadPoolWorker: 2
#     gremlinPool: 0
#     scriptEngines:
#         gremlin-groovy:
#             config:
#                 sandbox_enabled: false
#                 sandbox_rules:
#                     whitelist_packages:
#                         - package.name
#                     whitelist_types:
#                         - fully.qualified.type.name
#                     whitelist_supers:
#                         - fully.qualified.class.name
#                     blacklist_packages:
#                         - package.name
#                     blacklist_supers:
#                         - fully.qualified.class.name
- gremlin_server
- The top-level configurations in Gremlin Server.
- port
- The available communications port for Gremlin Server.
Default: 8182
- threadPoolWorker
- The number of worker threads that handle non-blocking read and write (requests and
responses) on the Gremlin Server channel, including routing requests to the right
server operations, handling scheduled jobs on the server, and writing serialized
responses back to the client.
Default: 2
- gremlinPool
- This pool represents the workers available to handle blocking operations in
Gremlin Server.
- 0 - the value of the JVM property cassandra.available_processors, if that property is set
- positive number - The number of Gremlin threads available to execute actual scripts in a ScriptEngine.
Default: the value of Runtime.getRuntime().availableProcessors()
- scriptEngines
- Configures the Gremlin Server script engines.
- gremlin-groovy
- Configures the gremlin-groovy script engine.
- sandbox_enabled
- Configures the Gremlin Groovy sandbox.
- true - Enable the Gremlin Groovy sandbox.
- false - Disable the Gremlin Groovy sandbox entirely.
Default: true
- sandbox_rules
- Configures sandbox rules.
- whitelist_packages
- List of packages, one package per line, to whitelist.
- -package.name
- The fully qualified package name.
- whitelist_types
- List of types, one type per line, to whitelist.
- -fully.qualified.type.name
- The fully qualified type name.
- whitelist_supers
- List of super classes, one class per line, to whitelist.
- -fully.qualified.class.name
- The fully qualified class name.
- blacklist_packages
- List of packages, one package per line, to blacklist.
- -package.name
- The fully qualified package name.
- blacklist_supers
- List of super classes, one class per line, to blacklist. Retain the hyphen before the fully qualified class name.
- -fully.qualified.class.name
- The fully qualified class name.
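The following sketch enables the sandbox, whitelists one package, and blacklists one superclass. It also sets an explicit gremlinPool. The package and class names are hypothetical placeholders. List entries keep their leading hyphen, as the blacklist_supers description requires.

```yaml
gremlin_server:
    port: 8182
    threadPoolWorker: 2
    gremlinPool: 4                      # 4 threads for script execution
    scriptEngines:
        gremlin-groovy:
            config:
                sandbox_enabled: true
                sandbox_rules:
                    whitelist_packages:
                        - com.example.graphutils          # hypothetical package
                    blacklist_supers:
                        - com.example.graphutils.UnsafeBase   # hypothetical class
```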
DSE Graph system-level
# graph:
#     analytic_evaluation_timeout_in_minutes: 10080
#     realtime_evaluation_timeout_in_seconds: 30
#     schema_agreement_timeout_in_ms: 10000
#     system_evaluation_timeout_in_seconds: 180
#     index_cache_size_in_mb: 128
#     max_query_queue: 10000
#     max_query_threads (no explicit default)
#     max_query_params: 16
- graph
- System-level configuration options and options that are shared between graph
instances. Add an option if it is not present in the provided
dse.yaml file.
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
- analytic_evaluation_timeout_in_minutes
- Maximum time to wait for an OLAP analytic (Spark) traversal to
evaluate.
Default: 10080 (168 hours)
- realtime_evaluation_timeout_in_seconds
- Maximum time to wait for an OLTP real-time traversal to evaluate.
Default: 30
- schema_agreement_timeout_in_ms
- Maximum time to wait for the database to agree on schema versions before timing
out.
Default: 10000
- system_evaluation_timeout_in_seconds
- Maximum time to wait for a graph system-based request to execute, like creating a
new graph.
Default: 180 (3 minutes)
- index_cache_size_in_mb
- The amount of RAM, in MB, to allocate to the index cache.
Default: 128
- max_query_queue
- The maximum number of CQL queries that can be queued as a result of Gremlin
requests. Incoming queries are rejected if the queue size exceeds this setting.
Default: 10000
- max_query_threads
- The maximum number of threads to use for queries to the database. The default is calculated:
- If gremlinPool is present and nonzero: 10 x the gremlinPool setting
- If gremlinPool is not present in this file or is set to zero: the number of available CPU cores
Default: calculated
- max_query_params
- The maximum number of parameters that can be passed on a graph query request for
TinkerPop drivers and drivers using the Cassandra native protocol. Passing very
large numbers of parameters on requests is an anti-pattern, because the script
evaluation time increases proportionally. DataStax recommends reducing the number of
parameters to speed up script compilation times. Before you increase this value,
consider alternate methods for parameterizing scripts, like passing a single map. If
the graph query request requires many arguments, pass a list.
Default: 16
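The sketch below raises the OLTP timeout and writes out the calculated max_query_threads value. It assumes a hypothetical configuration where gremlin_server sets gremlinPool: 4, so the calculated value is 10 x 4 = 40. The timeout value is an illustration.

```yaml
graph:
    realtime_evaluation_timeout_in_seconds: 60   # default is 30
    max_query_queue: 10000
    # Assumes gremlinPool: 4 in gremlin_server: 10 x 4 = 40
    max_query_threads: 40
    max_query_params: 16                         # prefer passing a map or list over raising this
```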