dse.yaml configuration file

The dse.yaml file is the primary configuration file for security, advanced functionality, and DSE Search, DataStax Graph, and DSE Analytics workloads.

After changing properties in the dse.yaml file, you must restart the node for the changes to take effect.

The cassandra.yaml file is the primary configuration file for the DataStax Enterprise database.

Syntax

For the properties in each section, the parent setting has zero spaces. Each child entry requires at least two spaces. Adhere to the YAML syntax and retain the spacing. For example:

node_health_options:
    refresh_rate_ms: 60000
    uptime_ramp_up_period_seconds: 10800
    dropped_mutation_window_minutes: 30

Security and authentication

Authentication options

DSE Authenticator supports multiple schemes for authentication at the same time in a DataStax Enterprise cluster. Additional authenticator configuration is required in cassandra.yaml.

See role_management_options to use internal and LDAP schemes for role management.

# authentication_options:
#    enabled: false
#    default_scheme: internal
#    other_schemes:
#    scheme_permissions: false
#    allow_digest_with_kerberos: true
#    plain_text_without_ssl: warn
#    transitional_mode: disabled
authentication_options

Configures DseAuthenticator to authenticate users when the authenticator option in cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseAuthenticator. Authenticators other than DseAuthenticator are not supported.

enabled

Enables user authentication.

  • true - The DseAuthenticator authenticates users.

  • false - The DseAuthenticator does not authenticate users and allows all connections.

Default: false

default_scheme

The first scheme to validate a user against when the driver does not request a specific scheme.

  • internal - Plain text authentication using the internal password authentication.

  • ldap - Plain text authentication using pass-through LDAP authentication.

  • kerberos - GSSAPI authentication using the Kerberos authenticator.

Default: internal

other_schemes

List of schemes that are checked if validation against the first scheme fails and no scheme was specified by the driver.

  • ldap - Plain text authentication using pass-through LDAP authentication.

  • kerberos - GSSAPI authentication using the Kerberos authenticator.

Default: none

scheme_permissions

Determines if roles need to have permission granted to them to use specific authentication schemes. These permissions can be granted only when the DseAuthorizer is used.

  • true - Use multiple schemes for authentication. To be assigned, every role requires permissions to a scheme.

  • false - Do not use multiple schemes for authentication. Prevents unintentional role assignment that might occur if user or group names overlap in the authentication service.

Default: false

allow_digest_with_kerberos

Controls whether DIGEST-MD5 authentication is allowed with Kerberos. Kerberos uses DIGEST-MD5 to pass credentials between nodes and jobs. The DIGEST-MD5 mechanism is not associated directly with an authentication scheme.

  • true - Allow DIGEST-MD5 authentication with Kerberos. In analytics clusters, set to true to use Hadoop internode authentication with Hadoop and Spark jobs.

  • false - Do not allow DIGEST-MD5 authentication with Kerberos.

Default: true

plain_text_without_ssl

Controls how the DseAuthenticator responds to plain text authentication requests over unencrypted client connections.

  • block - Block the request with an authentication error.

  • warn - Log a warning but allow the request.

  • allow - Allow the request without any warning.

Default: warn

transitional_mode

Sets transitional mode for temporary use during authentication setup in an established environment.

Transitional mode allows access to the database using the anonymous role, which has all permissions except AUTHORIZE.

  • disabled - Disable transitional mode. All connections must provide valid credentials and map to a login-enabled role.

  • permissive - Only super users are authenticated and logged in. All other authentication attempts are logged in as the anonymous user.

  • normal - Allow all connections that provide credentials. Maps all authenticated users to their role, and maps all other connections to anonymous.

  • strict - Allow only authenticated connections that map to a login-enabled role OR connections that provide a blank username and password as anonymous.

Credentials are required for all connections after authentication is enabled; use a blank username and password to login with anonymous role in transitional mode.

Default: disabled
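
As an illustration of how these options combine, the following sketch enables authentication with internal as the default scheme and LDAP as a fallback, and blocks plain text credentials on unencrypted connections. All values are assumptions chosen for the example, not recommendations, and the list form of other_schemes is assumed from standard YAML list syntax.

```yaml
# Illustrative values only; adjust to your environment.
authentication_options:
    enabled: true
    default_scheme: internal      # tried first when the driver requests no scheme
    other_schemes:
        - ldap                    # tried if validation against internal fails
    scheme_permissions: true      # roles then need permission on a scheme
    allow_digest_with_kerberos: true
    plain_text_without_ssl: block # reject plain text auth on unencrypted connections
    transitional_mode: disabled
```

Because scheme_permissions is true in this sketch, the scheme permissions can be granted only when the DseAuthorizer is also in use, as described above.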

Role management options

#role_management_options:
#    mode: internal
#    mode_by_authentication:
#        internal:
#        ldap:
#        kerberos:
#    stats: false
role_management_options

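Configures the DSE Role Manager. To enable role manager, set:

  • authorization_options enabled to true.

  • role_manager in cassandra.yaml to com.datastax.bdp.cassandra.auth.DseRoleManager.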

When scheme_permissions is enabled, all roles must have permission to execute on the authentication scheme. See Binding a role to an authentication scheme.

mode

Manages granting and revoking of roles.

  • internal - Manage granting and revoking of roles internally using the GRANT ROLE and REVOKE ROLE CQL statements. See Managing database access. Internal role management allows nesting roles for permission management.

  • ldap - Manage granting and revoking of roles using an external LDAP server configured using the ldap_options. To configure an LDAP scheme, complete the steps in Defining an LDAP scheme. Nesting roles for permission management is disabled.

Default: internal

mode_by_authentication

mode_by_authentication is supported beginning in DSE 6.8.9.

Controls the granting and revoking of roles based on the users' authentication method. When this setting is used, it overrides the value from the mode setting. The parameters are:

  • internal - Specifies the method of role management when a user has authenticated using the internal authentication scheme.

  • ldap - Specifies the method of role management when a user has authenticated using an LDAP server.

  • kerberos - Specifies the method of role management when a user has authenticated using a Kerberos authentication scheme.

For all parameters, the possible values are:

  • internal - grant and revoke roles internally using the GRANT ROLE and REVOKE ROLE CQL statements.

  • ldap - grant and revoke roles using an external LDAP server configured using the ldap_options.

Default: uses the value from the role_management_options:mode setting
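
A minimal sketch of a per-scheme override, assuming a cluster where LDAP users get their roles from LDAP groups while internal and Kerberos users are managed with CQL. The combination of values is hypothetical; only the keys and the allowed values (internal, ldap) come from the descriptions above.

```yaml
role_management_options:
    mode: internal               # fallback when no per-scheme value applies
    mode_by_authentication:
        internal: internal       # GRANT ROLE / REVOKE ROLE
        ldap: ldap               # roles from the server set in ldap_options
        kerberos: internal
```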

stats

Set to true to enable logging of DSE role creation and modification events in the dse_security.role_stats system table. The stats option must be enabled on all nodes, and each node must be restarted for the functionality to take effect.

To query role events:

SELECT * FROM dse_security.role_stats;

 role  | created                         | password_changed
-------+---------------------------------+---------------------------------
 user1 | 2020-04-13 00:44:09.221000+0000 |                            null
 user2 | 2020-04-12 23:49:21.457000+0000 | 2020-04-12 23:49:21.457000+0000

(2 rows)

Default: commented out (false)

Authorization options

#authorization_options:
#    enabled: false
#    transitional_mode: disabled
#    allow_row_level_security: false
authorization_options

Configures the DSE Authorizer to authorize users when the authorizer option in cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseAuthorizer.

enabled

Enables the DSE Authorizer for role-based access control (RBAC).

  • true - Enable the DSE Authorizer for RBAC.

  • false - Do not use the DSE Authorizer.

Default: false

transitional_mode

Allows the DSE Authorizer to operate in a temporary mode during authorization setup in a cluster.

  • disabled - Transitional mode is disabled.

  • normal - Permissions can be passed to resources, but are not enforced.

  • strict - Permissions can be passed to resources, and are enforced on authenticated users. Permissions are not enforced against anonymous users.

Default: disabled

allow_row_level_security

Enables row-level access control (RLAC) permissions. Use the same setting on all nodes. See Setting up Row Level Access Control (RLAC).

  • true - Use row-level security.

  • false - Do not use row-level security.

Default: false
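
For example, a cluster that is part way through authorization setup might look like the following sketch. The choice of normal is an assumption for illustration; per the descriptions above, permissions can be granted in this mode but are not yet enforced.

```yaml
authorization_options:
    enabled: true
    transitional_mode: normal        # grant permissions now, enforce later
    allow_row_level_security: false  # use the same setting on all nodes
```

Once permissions are in place, setting transitional_mode back to disabled ends the temporary mode.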

Kerberos options

kerberos_options:
    keytab: resources/dse/conf/dse.keytab
    service_principal: dse/_HOST@REALM
    http_principal: HTTP/_HOST@REALM
    qop: auth
kerberos_options

Configures security for a DataStax Enterprise cluster using Kerberos.

keytab

The filepath of dse.keytab.

Default: resources/dse/conf/dse.keytab

service_principal

The service_principal that the DataStax Enterprise process runs under must use the form <dse_user>/_HOST@<REALM>, where:

  • <dse_user> is the username of the user that starts the DataStax Enterprise process.

  • _HOST is converted to a reverse DNS lookup of the broadcast address.

  • <REALM> is the name of your Kerberos realm. In the Kerberos principal, <REALM> must be uppercase.

Default: dse/_HOST@REALM

http_principal

Used by the Tomcat application container to run DSE Search. The Tomcat web server uses the GSSAPI mechanism (SPNEGO) to negotiate the GSSAPI security mechanism (Kerberos). <REALM> is the name of your Kerberos realm. In the Kerberos principal, <REALM> must be uppercase.

Default: HTTP/_HOST@REALM

qop

A comma-delimited list of Quality of Protection (QOP) values that clients and servers can use for each connection. The client can have multiple QOP values, while the server can have only a single QOP value.

  • auth - Authentication only.

  • auth-int - Authentication plus integrity protection for all transmitted data.

  • auth-conf - Authentication plus integrity protection and encryption of all transmitted data.

    Encryption using auth-conf is separate and independent of whether encryption is done using SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax recommends choosing only one method and using it for encryption and authentication.

Default: auth
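
A sketch of a filled-in section, assuming a hypothetical realm named EXAMPLE.COM and a DSE process started by the dse user. The realm is written in uppercase as required above, and qop is set to auth-conf only to show a non-default value.

```yaml
kerberos_options:
    keytab: resources/dse/conf/dse.keytab
    service_principal: dse/_HOST@EXAMPLE.COM   # <dse_user>/_HOST@<REALM>
    http_principal: HTTP/_HOST@EXAMPLE.COM
    qop: auth-conf   # encrypts traffic; avoid combining with SSL (double encryption)
```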

LDAP options

Define LDAP options to authenticate users against an external LDAP service, to manage roles using LDAP group lookup, or both. See also Defining an LDAP scheme.

Starting in DSE 6.8.2, server_host accepts multiple comma-separated LDAP server addresses, each with or without a port. If a port is not provided, the ldap_options.server_port value is used. Existing LDAP configurations therefore continue to work without changes.

  • A separate connection pool is created for each server. When a connection is attempted, DSE chooses the best pool using a heuristic: a circuit breaker temporarily disables servers that frequently fail to connect, and DSE prefers the pool that has the greatest number of idle connections.

  • Failover parameters are configured through system properties.

  • A new method was added in DSE 6.8.2 to the LDAP MBean to reset LDAP connectors - that is, close all connection pools and recreate them.

# ldap_options:
#     server_host:
#     server_port: 389
#     hostname_verification: false
#     search_dn:
#     search_password:
#     use_ssl: false
#     use_tls: false
#     truststore_path:
#     truststore_password:
#     truststore_type: jks
#     user_search_base:
#     user_search_filter: (uid={0})
#     user_memberof_attribute: memberof
#     extra_user_search_bases:
#     group_search_type: directory_search
#     group_search_base:
#     group_search_filter: (uniquemember={0})
#     group_name_attribute: cn
#     extra_group_search_bases:
#     credentials_validity_in_ms: 0
#     search_validity_in_seconds: 0
#     connection_pool:
#         max_active: 8
#         max_idle: 8

Microsoft Active Directory (AD) example, for both authentication and role management:

ldap_options:
    server_host: win2012ad_server.mycompany.lan
    server_port: 389
    search_dn: cn=lookup_user,cn=users,dc=win2012domain,dc=mycompany,dc=lan
    search_password: lookup_user_password
    use_ssl: false
    use_tls: false
    truststore_path:
    truststore_password:
    truststore_type: jks
    #group_search_type: directory_search
    group_search_type: memberof_search
    #group_search_base:
    #group_search_filter:
    group_name_attribute: cn
    user_search_base: cn=users,dc=win2012domain,dc=mycompany,dc=lan
    user_search_filter: (sAMAccountName={0})
    user_memberof_attribute: memberOf
    connection_pool:
        max_active: 8
        max_idle: 8
ldap_options

Configures LDAP security when the authenticator option in cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseAuthenticator.

server_host

A comma-separated list of LDAP server hosts.

Do not use LDAP on the same host (localhost) in production environments. Using LDAP on the same host (localhost) is appropriate only in single node test or development environments.

For information on parameters related to tuning failover performance for multiple LDAP servers, see Tune LDAP failover.

Default: none
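
A short sketch of the multi-server form available from DSE 6.8.2, using hypothetical host names. The first address carries its own port; the second has none, so it falls back to server_port.

```yaml
ldap_options:
    # ldap1 uses port 1389; ldap2 uses server_port (389)
    server_host: ldap1.example.com:1389, ldap2.example.com
    server_port: 389
```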

server_port

The port on which the LDAP server listens.

  • 389 - The default port for unencrypted connections.

  • 636 - Used for encrypted connections. Default SSL or TLS port for LDAP.

Default: 389

hostname_verification

Enable hostname verification. The following conditions must be met:

  • Either use_ssl or use_tls must be set to true.

  • A valid truststore with the correct path specified in truststore_path must exist. The truststore must have a certificate entry, trustedCertEntry, including a SAN DNSName entry that matches the hostname of the LDAP server.

Default: false

search_dn

Distinguished name (DN) of an account with read access to the user_search_base and group_search_base. For example, cn=lookup_user,cn=users,dc=win2012domain,dc=mycompany,dc=lan.

Do not create or use an LDAP account or group called cassandra. The DSE database comes with a default cassandra login role that has access to all database objects and uses the consistency level QUORUM.

When not set, the LDAP server uses an anonymous bind for search.

Default: commented out

search_password

The password of the search_dn account.

Default: commented out

use_ssl

Enables an SSL-encrypted connection to the LDAP server.

  • true - Use an SSL-encrypted connection.

  • false - Do not enable SSL connections to the LDAP server.

Default: false

use_tls

Enables STARTTLS connections to the LDAP server on the unencrypted port.

  • true - Enable STARTTLS connections to the LDAP server.

  • false - Do not enable STARTTLS connections to the LDAP server.

Never enable use_ssl and use_tls at the same time.

The LDAP server must support STARTTLS.

Default: false

truststore_path

The filepath to the SSL certificates truststore.

Default: commented out

truststore_password

The password to access the truststore.

Default: commented out

truststore_type

Valid types are JKS, JCEKS, or PKCS12.

Default: jks
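
The encryption-related options above can be combined as in the following sketch, which assumes an SSL-encrypted connection on port 636 with hostname verification. The host name and truststore path are hypothetical, and the password is a placeholder.

```yaml
ldap_options:
    server_host: ldap.example.com
    server_port: 636               # default SSL or TLS port for LDAP
    hostname_verification: true    # requires use_ssl or use_tls, plus a valid truststore
    use_ssl: true
    use_tls: false                 # never enable both use_ssl and use_tls
    truststore_path: /etc/dse/conf/ldap_truststore.jks
    truststore_password: <truststore_password>
    truststore_type: jks
```

For hostname verification to succeed, the truststore needs a trustedCertEntry whose SAN DNSName matches ldap.example.com in this sketch.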

user_search_base

Distinguished name (DN) of the object to start the recursive search for user entries for authentication and role management memberof searches.

  • For your LDAP domain, set the ou and dc elements. Typically set to ou=users,dc=<domain>,dc=<top_level_domain>. For example, ou=users,dc=example,dc=com.

  • For your Active Directory, set the dc element for a different search base. Typically set to CN=search,CN=Users,DC=<ActDir_domname>,DC=internal. For example, CN=search,CN=Users,DC=example-sales,DC=internal.

Default: none

user_search_filter

The search filter used to look up usernames.

  • uid={0} - When using LDAP.

  • samAccountName={0} - When using AD (Microsoft Active Directory). For example, (sAMAccountName={0}).

Default: uid={0}

user_memberof_attribute

The user attribute that contains a list of group names. Role manager assigns DSE roles that exactly match any group name in the list. Required when managing roles using group_search_type: memberof_search with LDAP (role_management_options.mode: ldap). The directory server must have memberof support, which is a default user attribute in Microsoft Active Directory (AD).

Default: memberof

extra_user_search_bases

Option to define additional search bases for users. If the user is not found in one search base, DSE attempts to find the user in another search base, until all search bases have been tried. See also user_search_base, group_search_base, and extra_group_search_bases.

Default: [] (empty list)

group_search_type

Defines how group membership is determined for a user. Required when managing roles with LDAP (role_management_options.mode: ldap).

  • directory_search - Filters the results with a subtree search of group_search_base to find groups that contain the username in the attribute defined in the group_search_filter.

  • memberof_search - Recursively searches for user entries using the user_search_base and user_search_filter. Gets groups from the user attribute defined in user_memberof_attribute. The directory server must have memberof support.

Default: directory_search

group_search_base

The unique distinguished name (DN) of the group record from which to start the group membership search.

Default: commented out

group_search_filter

Set to any valid LDAP filter.

Default: uniquemember={0}

group_name_attribute

The attribute in the group record that contains the LDAP group name. Role names are case-sensitive and must match exactly on DSE for assignment. Unmatched groups are ignored.

Default: cn
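
The Active Directory example earlier uses memberof_search. For contrast, here is a sketch of the directory_search variant, assuming a hypothetical directory rooted at dc=example,dc=com with groups under ou=groups.

```yaml
ldap_options:
    user_search_base: ou=users,dc=example,dc=com
    user_search_filter: (uid={0})
    group_search_type: directory_search
    group_search_base: ou=groups,dc=example,dc=com   # subtree searched for groups
    group_search_filter: (uniquemember={0})
    group_name_attribute: cn   # group name must exactly match a DSE role name
```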

extra_group_search_bases

Option to define additional search bases for groups. DSE merges all groups found in all the defined search bases. See also group_search_base, user_search_base, and extra_user_search_bases.

Default: [] (empty list)

credentials_validity_in_ms

A credentials cache improves performance by reducing the number of requests that are sent to the internal or LDAP server. See Defining an LDAP scheme.

  • 0 - Disable credentials cache.

  • duration period - The duration period in milliseconds of the credentials cache.

Starting in DSE 6.8.2, the upper limit for ldap_options.credentials_validity_in_ms increased to 864,000,000 ms, which is 10 days.

Default: 0

search_validity_in_seconds

Configures a search cache to improve performance by reducing the number of requests that are sent to the internal or LDAP server.

  • 0 - Disable the search cache.

  • positive number - The duration period in seconds for the search cache.

Starting in DSE 6.8.2, the upper limit for ldap_options.search_validity_in_seconds increased to 864,000 seconds, which is 10 days.

Default: 0
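
Note that the two cache settings use different units. The following sketch enables both caches with hypothetical durations that sit well under the 10-day upper limits.

```yaml
ldap_options:
    credentials_validity_in_ms: 30000   # milliseconds: 30 seconds
    search_validity_in_seconds: 60      # seconds: 1 minute
```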

connection_pool

Configures the connection pool for making LDAP requests.

max_active

The maximum number of active connections to the LDAP server.

Default: 8

max_idle

The maximum number of idle connections in the pool awaiting requests.

Default: 8

Encrypt sensitive system resources

Options to encrypt sensitive system resources using a local encryption key or a remote KMIP key.

system_info_encryption:
  enabled: false
  cipher_algorithm: AES
  secret_key_strength: 128
  chunk_length_kb: 64
  key_provider: KmipKeyProviderFactory
  kmip_host: kmip_host_name

DataStax recommends using a remote encryption key from a KMIP provider when using Transparent data encryption. Use a local encryption key only if a KMIP server is not available.

system_info_encryption

Sets the encryption settings for system resources that might contain sensitive information, including the system.batchlog and system.paxos tables, hint files, and the database commit log.

enabled

Enables encryption of system resources. See Encrypting system resources.

  • true - Enable encryption of system resources.

  • false - Do not encrypt system resources.

The system_traces keyspace is not encrypted by enabling the system_info_encryption section. In environments that also have tracing enabled, manually configure encryption with compression on the system_traces keyspace. See Transparent data encryption.

Default: false

cipher_algorithm

The name of the JCE cipher algorithm used to encrypt system resources.

Supported cipher algorithms names

 cipher_algorithm | secret_key_strength
------------------+---------------------
 AES              | 128, 192, or 256
 DES              | 56
 DESede           | 112 or 168
 Blowfish         | 32-448
 RC2              | 40-128

Default: AES

secret_key_strength

Length of key to use for the system resources. See Supported cipher algorithms names.

DSE uses a matching local key or requests the key type from the KMIP server. For KMIP, if an existing key does not match, the KMIP server automatically generates a new key.

Default: 128

chunk_length_kb

Optional. Size of SSTable chunks when data from the system.batchlog or system.paxos are written to disk.

Default: 64

key_provider

KMIP key provider to enable encrypting sensitive system data with a KMIP key. Comment out if using a local encryption key.

Default: KmipKeyProviderFactory

kmip_host

The KMIP key server host. Set to the <kmip_groupname> that defines the KMIP host in the kmip_hosts section. DSE requests a key from the KMIP host and uses the key generated by the KMIP provider.

Default: kmip_host_name
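
When no KMIP server is available, the section can be configured for a local encryption key instead. The following is a sketch under that assumption: the KMIP lines are commented out as described for key_provider, and the key strength is one of the values listed for AES.

```yaml
system_info_encryption:
    enabled: true
    cipher_algorithm: AES
    secret_key_strength: 256   # must be a strength supported by the chosen cipher
    chunk_length_kb: 64
    # key_provider: KmipKeyProviderFactory   # commented out: using a local key
    # kmip_host: kmip_host_name
```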

Encrypted configuration properties

Settings for using encrypted passwords in sensitive configuration file properties.

system_key_directory: /etc/dse/conf
config_encryption_active: false
config_encryption_key_name: (<key_filename> | <KMIP_key_URL> )
system_key_directory

Path to the directory where local encryption key files, also called system keys, are stored. Distribute the system keys to all nodes in the cluster. Ensure that the DSE account owns the directory and has read/write/execute (700) permissions.

This directory is not used for KMIP keys.

Default: /etc/dse/conf

config_encryption_active

Enables encryption on sensitive data stored in tables and in configuration files.

  • true - Enable encryption of configuration property values using the specified config_encryption_key_name. When set to true, the configuration values must be encrypted or commented out. See Encrypting configuration file properties.

  • false - Do not enable encryption of configuration property values.

Default: false

config_encryption_key_name

The local encryption key filename or KMIP key URL to use for configuration file property value decryption.

Use dsetool encryptconfigvalue to generate encrypted values for the configuration file properties.

Default: system_key

The default name is not configurable.
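
Put together, a node that decrypts configuration values with a local key might use settings like the following sketch. It assumes the key file is the default system_key stored in the default directory.

```yaml
system_key_directory: /etc/dse/conf       # owned by the DSE account, mode 700
config_encryption_active: true            # sensitive values must now be encrypted or commented out
config_encryption_key_name: system_key    # local key file in system_key_directory
```

With config_encryption_active set to true, sensitive values such as passwords are expected in encrypted form, generated with dsetool encryptconfigvalue.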

KMIP encryption options

Options for KMIP encryption keys and communication between the DataStax Enterprise node and the KMIP key server or key servers. Enables DataStax Enterprise encryption features to use encryption keys that are stored on a server that is not running DataStax Enterprise.

kmip_hosts:
  <your_kmip_groupname>:
    hosts: kmip1.yourdomain.com, kmip2.yourdomain.com
    keystore_path: pathto/kmip/keystore.jks
    keystore_type: jks
    keystore_password: <password>
    truststore_path: pathto/kmip/truststore.jks
    truststore_type: jks
    truststore_password: <password>
    key_cache_millis: (300000)
    timeout: (1000)
kmip_hosts

Configures connections for key servers that support the KMIP protocol.

kmip_groupname

A user-defined name for a group of options to configure a KMIP server or servers, key settings, and certificates. For each KMIP key server or group of KMIP key servers, you must configure options for a <kmip_groupname> section. Using separate key server configuration settings allows use of different key servers to encrypt table data and eliminates the need to enter key server configuration information in Data Definition Language (DDL) statements and other configurations. DDL statements are database schema change commands like CREATE TABLE. Multiple KMIP hosts are supported.

Default: commented out

hosts

A comma-separated list of KMIP hosts (<host>[:<port>]) using the FQDN (Fully Qualified Domain Name). Add KMIP hosts in the intended failover sequence because DSE queries the hosts in the listed order.

For example, if the host list contains kmip1.yourdomain.com, kmip2.yourdomain.com, DSE tries kmip1.yourdomain.com and then kmip2.yourdomain.com.

keystore_path

The path to a Java keystore created from the KMIP agent PEM files.

Default: /etc/dse/conf/KMIP_keystore.jks

keystore_type

Valid types are JKS, JCEKS, PKCS11, and PKCS12. For file-based keystores, use PKCS12.

Default: JKS

keystore_password

Password used to protect the private key of the key pair.

Default: none

truststore_path

The path to a Java truststore that was created using the KMIP root certificate.

Default: /etc/dse/conf/KMIP_truststore.jks

key_cache_millis

Milliseconds to locally cache the encryption keys that are read from the KMIP hosts. The longer the encryption keys are cached, the fewer requests to the KMIP key server are made and the longer it takes for changes, like revocation, to propagate to the DSE node. DataStax Enterprise uses concurrent encryption, so multiple threads fetch the secret key from the KMIP key server at the same time. DataStax recommends using the default value.

Default: 300000

timeout

Socket timeout in milliseconds.

Default: 1000
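
To show how a KMIP group is referenced from another section, the following sketch defines a hypothetical group named kmip_group1 and points system_info_encryption at it. The host names come from the example above, the paths are the documented defaults, and the passwords are placeholders.

```yaml
kmip_hosts:
    kmip_group1:                 # user-defined group name (hypothetical)
        hosts: kmip1.yourdomain.com, kmip2.yourdomain.com   # tried in this order
        keystore_path: /etc/dse/conf/KMIP_keystore.jks
        keystore_type: jks
        keystore_password: <password>
        truststore_path: /etc/dse/conf/KMIP_truststore.jks
        truststore_type: jks
        truststore_password: <password>
        key_cache_millis: 300000
        timeout: 1000

system_info_encryption:
    enabled: true
    key_provider: KmipKeyProviderFactory
    kmip_host: kmip_group1       # must match the group name under kmip_hosts
```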

DSE Search index encryption

# solr_encryption_options:
#    decryption_cache_offheap_allocation: true
#    decryption_cache_size_in_mb: 256
solr_encryption_options

Tunes encryption of search indexes.

decryption_cache_offheap_allocation

Allocates shared DSE Search decryption cache off JVM heap.

  • true - Allocate shared DSE Search decryption cache off JVM heap.

  • false - Do not allocate shared DSE Search decryption cache off JVM heap.

Default: true

decryption_cache_size_in_mb

The maximum size of the shared DSE Search decryption cache in megabytes (MB).

Default: 256

DSE In-Memory options

To use DSE In-Memory, specify how much system memory to use for all in-memory tables by fraction or size.

# max_memory_to_lock_fraction: 0.20
# max_memory_to_lock_mb: 10240
max_memory_to_lock_fraction

A fraction of the system memory. For example, 0.20 allows use of up to 20% of system memory. This setting is ignored if max_memory_to_lock_mb is set to a non-zero value.

Default: 0.20

max_memory_to_lock_mb

Maximum amount of memory in megabytes (MB) for DSE In-Memory tables.

  • not set - Use the fraction specified with max_memory_to_lock_fraction.

  • number greater than 0 - Maximum amount of memory in megabytes (MB).

Default: 10240

Node health options

node_health_options:
    refresh_rate_ms: 60000
    uptime_ramp_up_period_seconds: 10800
    dropped_mutation_window_minutes: 30
node_health_options

Node health options are always enabled. Node health is a score-based representation of how healthy a node is to handle search queries. See Collecting node health and indexing status scores.

refresh_rate_ms

How frequently, in milliseconds, node health statistics are updated.

Default: 60000

uptime_ramp_up_period_seconds

The amount of continuous uptime required for the node’s uptime score to advance the node health score from 0 to 1 (full health), assuming there are no recent dropped mutations. The health score is a composite score based on dropped mutations and uptime.

If a node is repairing after a period of downtime, increase the uptime period to the expected repair time.

Default: 10800 (3 hours)

dropped_mutation_window_minutes

The historic time window over which the rate of dropped mutations affects the node health score.

Default: 30
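
As a worked example of the repair guidance above, assume a repair after downtime is expected to take about six hours. Six hours is 6 × 3600 = 21600 seconds, so the ramp-up period could be raised as in this sketch; the six-hour figure is hypothetical.

```yaml
node_health_options:
    refresh_rate_ms: 60000
    uptime_ramp_up_period_seconds: 21600   # 6 hours, matching the expected repair time
    dropped_mutation_window_minutes: 30
```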

Health-based routing

enable_health_based_routing: true
enable_health_based_routing

Enables node health as a consideration for replica selection for distributed DSE Search queries. Health-based routing enables a trade-off between index consistency and query throughput.

  • true - Consider node health when multiple candidates exist for a particular token range.

  • false - Ignore node health for replica selection. When the primary concern is performance, do not enable health-based routing.

Default: true

Lease metrics

lease_metrics_options:
    enabled: false
    ttl_seconds: 604800
lease_metrics_options

Lease holder statistics help monitor the lease subsystem for automatic management of Job Tracker and Spark Master nodes.

enabled

Enables log entries related to lease holders.

  • true - Enable log entries related to lease holders to help monitor performance of the lease subsystem.

  • false - Do not enable log entries.

Default: false

ttl_seconds

Time interval in seconds to persist the log of lease holder changes.

Default: 604800

Scheduler settings for DSE Search indexes

To ensure that records with time-to-live (TTL) are purged from search indexes when they expire, the search indexes are periodically checked for expired documents.

ttl_index_rebuild_options:
    fixed_rate_period: 300
    initial_delay: 20
    max_docs_per_batch: 4096
    thread_pool_size: 1
ttl_index_rebuild_options

Configures the schedulers in charge of querying for expired records, removing expired records, and the execution of the checks.

fixed_rate_period

Time interval in seconds between checks for expired data.

Default: 300

initial_delay

The number of seconds to delay the first TTL check to speed up start-up time.

Default: 20

max_docs_per_batch

The maximum number of documents to check and delete per batch by the TTL rebuild thread. All expired documents are deleted from the index during each check. To avoid memory pressure, their unique keys are retrieved and then deletes are issued in batches.

Default: 4096

thread_pool_size

The maximum number of search indexes (cores) that can execute TTL cleanup concurrently. Manages system resource consumption and prevents many search cores from executing simultaneous TTL deletes.

Default: 1
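
The following sketch shows one way these scheduler settings might be adjusted, assuming a node with several search indexes where less frequent checks are acceptable. The values are hypothetical and only illustrate the units.

```yaml
ttl_index_rebuild_options:
    fixed_rate_period: 600     # seconds between checks (10 minutes)
    initial_delay: 20          # seconds before the first check after startup
    max_docs_per_batch: 4096   # deletes are issued in batches of this size
    thread_pool_size: 2        # at most two search cores run TTL cleanup at once
```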

Reindexing of bootstrapped data

async_bootstrap_reindex: false
async_bootstrap_reindex

For DSE Search, configure whether to asynchronously reindex bootstrapped data.

  • true - The node joins the ring immediately after bootstrap and reindexing occurs asynchronously. Do not wait for post-bootstrap reindexing so that the node is not marked down. The dsetool ring command can be used to check the status of the reindexing.

  • false - The node joins the ring after reindexing the bootstrapped data.

Default: false

CQL Solr paging

Specifies the paging behavior.

cql_solr_query_paging: off
cql_solr_query_paging

  • driver - Respects driver paging settings. Uses Solr pagination (cursors) only when the driver uses pagination. Enabled automatically for DSE SearchAnalytics workloads.

  • off - Paging is off. Ignore driver paging settings for CQL queries and use normal Solr paging unless:

    • The current workload is an analytics workload, including SearchAnalytics. SearchAnalytics nodes always use driver paging settings.

    • The cqlsh query parameter paging is set to driver. Even when cql_solr_query_paging: off, paging is dynamically enabled with the "paging":"driver" parameter in JSON queries.

Default: off

Solr CQL query option

Available option for CQL Solr queries.

cql_solr_query_row_timeout: 10000
cql_solr_query_row_timeout

The maximum time in milliseconds to wait for all rows to be read from the database during CQL Solr queries.

Default: 10000 (10 seconds)

DSE Search resource upload limit

solr_resource_upload_limit_mb: 10
solr_resource_upload_limit_mb

Configures the maximum file size of the search index config or schema. Resource files can be uploaded, but the search index config and schema are stored internally in the database after upload.

  • 0 - Disable resource uploading.

  • upload size - The maximum upload size limit in megabytes (MB) for a DSE Search resource file (search index config or schema).

Default: 10

Shard transport

shard_transport_options:
    netty_client_request_timeout: 60000
shard_transport_options

Fault tolerance option for internode communication between DSE Search nodes.

netty_client_request_timeout

The internal timeout for all search queries, which prevents long-running distributed queries. The client request timeout is the maximum cumulative time, in milliseconds, that a distributed search request waits idly for shard responses.

Default: 60000 (1 minute)

DSE Search indexing

# back_pressure_threshold_per_core: 1024
# flush_max_time_per_core: 5
# load_max_time_per_core: 5
# enable_index_disk_failure_policy: false
# solr_data_dir: /MyDir
# solr_field_cache_enabled: false
# ram_buffer_heap_space_in_mb: 1024
# ram_buffer_offheap_space_in_mb: 1024
back_pressure_threshold_per_core

The maximum number of queued partitions during search index rebuilding and reindexing. This maximum number safeguards against excessive heap use by the indexing queue. If set lower than the number of threads per core (TPC), not all TPC threads can be actively indexing.

Default: 1024

flush_max_time_per_core

The maximum time, in minutes, to wait for the flushing of asynchronous index updates that occurs at DSE Search commit time or at flush time.

Expert knowledge is required to change this value.

Always set the wait time high enough to ensure that flushing completes successfully and fully syncs DSE Search indexes with the database data. If the wait time is exceeded, index updates are only partially committed and the commit log is not truncated, which can undermine data durability.

When a timeout occurs, this node is typically overloaded and cannot flush in a timely manner. Live indexing increases the time to flush asynchronous index updates.

Default: 5

load_max_time_per_core

The maximum time, in minutes, to wait for each DSE Search index to load on startup or create/reload operations. This advanced option should be changed only if exceptions happen during search index loading.

Default: 5

enable_index_disk_failure_policy

Whether to apply the configured disk failure policy if IOExceptions occur during index update operations.

  • true - Apply the configured Cassandra disk failure policy to index write failures

  • false - Do not apply the disk failure policy

Default: false

solr_data_dir

The directory to store index data. See Managing the location of DSE Search data. By default, each DSE Search index is saved in <solr_data_dir>/<keyspace_name>.<table_name> or as specified by the dse.solr.data.dir system property.

Default: A solr.data directory in the cassandra data directory, like /var/lib/cassandra/solr.data

solr_field_cache_enabled

The Apache Lucene® field cache is deprecated. Instead, for fields that are sorted, faceted, or grouped by, set docValues="true" on the field in the search index schema. Then reload the search index and reindex.

Default: false

ram_buffer_heap_space_in_mb

Global Lucene RAM buffer usage threshold for heap to force segment flush. Setting too low can cause a state of constant flushing during periods of ongoing write activity. For near-real-time (NRT) indexing, forced segment flushes also de-schedule pending auto-soft commits to avoid potentially flushing too many small segments.

Default: 1024

ram_buffer_offheap_space_in_mb

Global Lucene RAM buffer usage threshold for offheap to force segment flush. Setting too low can cause a state of constant flushing during periods of ongoing write activity. For NRT, forced segment flushes also de-schedule pending auto-soft commits to avoid potentially flushing too many small segments. When not set, the default is 1024.

Default: 1024

Performance Service

Global Performance Service

Configures the thread pool that is used by most plugins. A dropped task warning is issued when the performance service requests more tasks than performance_max_threads + performance_queue_capacity. When a task is dropped, collected statistics might not be current.

# performance_core_threads: 4
# performance_max_threads: 32
# performance_queue_capacity: 32000
performance_core_threads

Number of background threads used by the performance service under normal conditions.

Default: 4

performance_max_threads

Maximum number of background threads used by the performance service.

Default: 32

performance_queue_capacity

The number of tasks allowed to queue in the backlog when all performance_max_threads threads are busy.

Default: 32000
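The following sketch restates the three default values together and works through the dropped-task limit described at the start of this section. It only illustrates the arithmetic and is not a tuning recommendation.

```yaml
# dse.yaml (sketch): default thread pool sizing for the performance service
performance_core_threads: 4        # threads used under normal conditions
performance_max_threads: 32        # upper bound on background threads
performance_queue_capacity: 32000  # tasks that can wait when all threads are busy
# A dropped task warning is issued once requested tasks exceed
# performance_max_threads + performance_queue_capacity = 32 + 32000 = 32032
```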

Performance Service

Configures the collection of performance metrics on transactional nodes. Performance metrics are stored in the dse_perf keyspace and can be queried using any CQL-based utility, such as cqlsh or any application using a CQL driver. To temporarily make changes for diagnostics and testing, use the dsetool perf subcommands.

graph_events:
   ttl_seconds: 600
graph_events

Graph event information.

ttl_seconds

Number of seconds a record survives before it is expired.

Default: 600

# cql_slow_log_options:
#   enabled: true
#   threshold: 200.0
#   minimum_samples: 100
#   ttl_seconds: 259200
#   skip_writing_to_db: true
#   num_slowest_queries: 5
cql_slow_log_options

Configures reporting of CQL queries that take longer than a specified threshold.

enabled
  • true - Enables log entries for slow queries.

  • false - Does not enable log entries.

Default: true

threshold

The threshold in milliseconds or as a percentile.

  • A value greater than 1 is expressed in time and will log queries that take longer than the specified number of milliseconds. For example, 200.0 sets the threshold at 0.2 seconds.

  • A value of 0 to 1 is expressed as a percentile and will log queries that exceed this percentile. For example, .95 logs the slowest 5% of queries.

Default: 200.0

minimum_samples

The initial number of queries before activating the percentile filter.

Default: commented out (100)

ttl_seconds

Number of seconds a slow log record survives before it is expired.

Default: 259200

skip_writing_to_db

Keeps slow queries only in-memory and does not write data to database.

  • true - Keep slow queries only in-memory. Skip writing to database.

  • false - Write slow query information in the node_slow_log table. The threshold must be >= 2000 ms to prevent a high load on the database.

Default: commented out (true)

num_slowest_queries

The number of slow queries to keep in-memory.

Default: commented out (5)
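As an illustration of the percentile form of threshold, the sketch below keeps the slowest 5% of queries in memory only. The value 0.95 is an illustrative choice; all other values are the defaults listed above.

```yaml
# dse.yaml (sketch): percentile-based slow query log kept in memory
cql_slow_log_options:
    enabled: true
    threshold: 0.95           # 0 to 1 is a percentile: log the slowest 5% of queries
    minimum_samples: 100      # queries observed before the percentile filter activates
    ttl_seconds: 259200       # 259200 seconds = 3 days
    skip_writing_to_db: true  # keep entries in memory; nothing is written to node_slow_log
    num_slowest_queries: 5    # number of slow queries kept in memory
```

To persist entries in the node_slow_log table instead, skip_writing_to_db would be set to false and, per the description above, the threshold kept at 2000 ms or higher.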

cql_system_info_options:
    enabled: false
    refresh_rate_ms: 10000
cql_system_info_options

Configures collection of system-wide performance information about a cluster.

enabled

Enables collection of system-wide performance information about a cluster.

  • true - Collect metrics.

  • false - Do not collect metrics.

Default: false

refresh_rate_ms

The length of the sampling period in milliseconds; the frequency to update the performance statistics.

Default: 10000 (10 seconds)

resource_level_latency_tracking_options:
    enabled: false
    refresh_rate_ms: 10000
resource_level_latency_tracking_options

Configures collection of object I/O performance statistics.

enabled

Enables collection of object input output performance statistics.

  • true - Collect metrics.

  • false - Do not collect metrics.

Default: false

refresh_rate_ms

The length of the sampling period in milliseconds; the frequency to update the performance statistics.

Default: 10000 (10 seconds)

db_summary_stats_options:
    enabled: false
    refresh_rate_ms: 10000
db_summary_stats_options

Configures collection of summary statistics at the database level.

enabled

Enables collection of database summary performance information.

  • true - Collect metrics.

  • false - Do not collect metrics.

refresh_rate_ms

The length of the sampling period in milliseconds; the frequency to update the performance statistics.

Default: 10000 (10 seconds)

cluster_summary_stats_options:
    enabled: false
    refresh_rate_ms: 10000
cluster_summary_stats_options

Configures collection of statistics at a cluster-wide level.

enabled

Enables collection of statistics at a cluster-wide level.

  • true - Collect metrics.

  • false - Do not collect metrics.

refresh_rate_ms

The length of the sampling period in milliseconds; the frequency to update the performance statistics.

Default: 10000 (10 seconds)

spark_cluster_info_options:
    enabled: false
    refresh_rate_ms: 10000
spark_cluster_info_options

Configures collection of data associated with the Spark cluster and Spark applications.

enabled

Enables collection of Spark performance statistics.

  • true - Collect metrics.

  • false - Do not collect metrics.

refresh_rate_ms

The length of the sampling period in milliseconds; the frequency to update the performance statistics.

Default: 10000 (10 seconds)

histogram_data_options:
    enabled: false
    refresh_rate_ms: 10000
    retention_count: 3
histogram_data_options

Histogram data for the dropped mutation metrics are stored in the dropped_messages table in the dse_perf keyspace.

enabled
  • true - Collect metrics.

  • false - Do not collect metrics.

refresh_rate_ms

The length of the sampling period in milliseconds; the frequency to update the performance statistics.

Default: 10000 (10 seconds)

retention_count

Default: 3

user_level_latency_tracking_options:
    enabled: false
    refresh_rate_ms: 10000
    top_stats_limit: 100
    quantiles: false
user_level_latency_tracking_options

User-resource latency tracking settings.

enabled
  • true - Collect metrics.

  • false - Do not collect metrics.

refresh_rate_ms

The length of the sampling period in milliseconds; the frequency to update the performance statistics.

Default: 10000 (10 seconds)

top_stats_limit

The maximum number of individual metrics.

Default: 100

quantiles

Default: false

DSE Search Performance Service

These settings are used by the DataStax Enterprise Performance Service.

solr_slow_sub_query_log_options:
    enabled: false
    ttl_seconds: 604800
    async_writers: 1
    threshold_ms: 3000
solr_slow_sub_query_log_options

See Collecting slow search queries.

enabled
  • true - Collect metrics.

  • false - Do not collect metrics.

ttl_seconds

The number of seconds a record survives before it is expired.

Default: 604800 (7 days)

async_writers

The number of server threads dedicated to writing in the log. More than one server thread might degrade performance.

Default: 1

threshold_ms

Default: 3000

solr_update_handler_metrics_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_update_handler_metrics_options

Options to collect search index direct update handler statistics over time.

solr_request_handler_metrics_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_request_handler_metrics_options

Options to collect search index request handler statistics over time.

solr_index_stats_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_index_stats_options

Options to record search index statistics over time.

solr_cache_stats_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_cache_stats_options

See Collecting cache statistics.

solr_latency_snapshot_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_latency_snapshot_options

See Collecting Apache Solr performance statistics.

Spark Performance Service

spark_application_info_options:
    enabled: false
    refresh_rate_ms: 10000
    driver:
        sink: false
        connectorSource: false
        jvmSource: false
        stateSource: false
    executor:
        sink: false
        connectorSource: false
        jvmSource: false
spark_application_info_options

Collection of Spark application metrics.

enabled
  • true - Collect metrics.

  • false - Do not collect metrics.

refresh_rate_ms

The length of the sampling period in milliseconds; the frequency to update the performance statistics.

Default: 10000 (10 seconds)

driver

Configures collection of metrics at the Spark Driver.

connectorSource

Enables collecting Spark Cassandra Connector metrics at the Spark Driver.

  • true - Collect metrics.

  • false - Do not collect metrics.

jvmSource

Enables collection of JVM heap and garbage collection (GC) metrics from the Spark Driver.

  • true - Collect metrics.

  • false - Do not collect metrics.

stateSource

Enables collection of application state metrics at the Spark Driver.

  • true - Collect metrics.

  • false - Do not collect metrics.

executor

Configures collection of metrics at Spark executors.

sink

Enables collection of the metrics gathered at Spark executors.

  • true - Collect metrics.

  • false - Do not collect metrics.

Default: false

connectorSource

Enables collection of Spark Cassandra Connector metrics at Spark executors.

  • true - Collect metrics.

  • false - Do not collect metrics.

jvmSource

Enables collection of JVM heap and GC metrics at Spark executors.

  • true - Collect metrics.

  • false - Do not collect metrics.

DSE Analytics

Spark resource options

spark_shared_secret_bit_length: 256
spark_security_enabled: false
spark_security_encryption_enabled: false

spark_daemon_readiness_assertion_interval: 1000

resource_manager_options:
   worker_options:
       cores_total: 0.7
       memory_total: 0.6

       workpools:
          - name: alwayson_sql
            cores: 0.25
            memory: 0.25
spark_shared_secret_bit_length

The length of a shared secret used to authenticate Spark components and encrypt the connections between them. This value is not the strength of the cipher for encrypting connections.

Default: 256

spark_security_enabled

When DSE authentication is enabled with authentication_options, Spark security is enabled regardless of this setting.

Default: false

spark_security_encryption_enabled

When DSE authentication is enabled with authentication_options, Spark security encryption is enabled regardless of this setting.

Configure encryption between the Spark processes and DSE with client-to-node encryption in cassandra.yaml.

Default: false

spark_daemon_readiness_assertion_interval

Time interval in milliseconds between subsequent retries by the Spark plugin for Spark Master and Worker readiness to start.

Default: 1000

resource_manager_options

Controls the physical resources used by Spark applications on this node. Optionally add named workpools with specific dedicated resources. See Core management.

worker_options

Configures the amount of system resources that are made available to the Spark Worker.

cores_total

The number of total system cores available to Spark.

The SPARK_WORKER_TOTAL_CORES environment variable takes precedence over this setting.

The lowest value that you can assign to Spark Worker cores is 1 core. If the result is lower, no exception is thrown and the value is automatically limited.

Setting cores_total or a workpool’s cores to 1.0 is a decimal value, meaning 100% of the available cores will be reserved. Setting cores_total or cores to 1 (no decimal point) is an explicit value, and one core will be reserved.

Default: 0.7

memory_total

The amount of total system memory available to Spark.

  • absolute value - Use standard suffixes like M for megabyte and G for gigabyte. For example, 12G.

  • decimal value - Maximum fraction of system memory to give all executors for all applications running on a particular node. For example, 0.8.

    When the value is expressed as a decimal, the available resources are calculated in the following way:

    Spark Worker memory = memory_total x (total system memory - memory assigned to DataStax Enterprise)

    The lowest value that you can assign to Spark Worker memory is 64 MB. If the result is lower, no exception is thrown and the value is automatically limited.

The SPARK_WORKER_TOTAL_MEMORY environment variable takes precedence over this setting.

Default: 0.6
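The decimal form of memory_total can be checked with a short worked example. The node sizes in the comments (64 GB of system memory, 16 GB assigned to DataStax Enterprise) are assumptions chosen only to make the arithmetic concrete.

```yaml
# dse.yaml (sketch): decimal memory_total on a hypothetical node
# Assumed for illustration: 64 GB total system memory, 16 GB assigned to DataStax Enterprise
# Spark Worker memory = memory_total x (total system memory - memory assigned to DataStax Enterprise)
#                     = 0.6 x (64 GB - 16 GB)
#                     = 28.8 GB
resource_manager_options:
    worker_options:
        memory_total: 0.6
```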

workpools

A collection of named workpools that can use a portion of the total resources defined under worker_options.

A default workpool named default is used if no workpools are defined in this section. If workpools are defined, the resources allocated to the workpools are taken from the total amount, with the remaining resources available to the default workpool.

The total amount of resources defined in the workpools section must not exceed the resources available to Spark in worker_options.

name

The name of the workpool. A workpool named alwayson_sql is created by default for AlwaysOn SQL. By default, the alwayson_sql workpool is configured to use 25% of the resources available to Spark.

Default: alwayson_sql

cores

The number of system cores to use in this workpool expressed as an absolute value or a decimal value. This option follows the same rules as cores_total.

memory

The amount of memory to use in this workpool expressed as either an absolute value or a decimal value. This option follows the same rules as memory_total.
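A minimal sketch of a second named workpool next to alwayson_sql follows. The workpool name etl is hypothetical, and the comment on the remaining share assumes that workpool decimals are fractions of the resources available to Spark, as the alwayson_sql description suggests.

```yaml
# dse.yaml (sketch): two named workpools under worker_options
resource_manager_options:
    worker_options:
        cores_total: 0.7
        memory_total: 0.6
        workpools:
            - name: alwayson_sql
              cores: 0.25
              memory: 0.25
            - name: etl          # hypothetical workpool name
              cores: 0.25
              memory: 0.25
        # The named workpools take 0.25 + 0.25 = 0.5 of the Spark resources;
        # the remaining 0.5 stays available to the default workpool.
```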

Spark encryption options

spark_ui_options:
    encryption: inherit
    encryption_options:
        enabled: false
        keystore: resources/dse/conf/.ui-keystore
        keystore_password: cassandra
        require_client_auth: false
        truststore: .truststore
        truststore_password: cassandra
        # Advanced settings
        # protocol: TLS
        # algorithm: SunX509
        # keystore_type: JKS
        # truststore_type: JKS
        # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
spark_ui_options

Configures encryption for Spark Master and Spark Worker UIs. These options apply only to Spark daemon UIs, and do not apply to user applications even when the user applications are run in cluster mode.

To set permissions on roles to allow Spark applications to be started, stopped, managed, and viewed, see Using authorization with Spark.

encryption

The source for SSL settings.

Default: inherit

encryption_options

When encryption: custom, configures encryption for HTTPS of Spark Master and Worker UI.

enabled

Enables Spark encryption for Spark client-to-Spark cluster and Spark internode communication.

Default: false

keystore

The keystore for Spark encryption keys.

A relative file path is resolved against the base Spark configuration directory, which is defined by the SPARK_CONF_DIR environment variable. The default Spark configuration directory is resources/spark/conf.

Default: resources/dse/conf/.ui-keystore

keystore_password

The password to access the keystore.

Default: cassandra

require_client_auth

Enables custom truststore for client authentication.

  • true - Require custom truststore for client authentication.

  • false - Do not require custom truststore.

Default: false

truststore

The filepath to the truststore for Spark encryption keys if require_client_auth: true.

A relative file path is resolved against the base Spark configuration directory, which is defined by the SPARK_CONF_DIR environment variable. The default Spark configuration directory is resources/spark/conf.

Default: resources/dse/conf/.ui-truststore

truststore_password

The password to access the truststore.

Default: cassandra

protocol

The Transport Layer Security (TLS) authentication protocol. The TLS protocol must be supported by JVM and Spark. TLS 1.2 is the most common JVM default.

Default: JVM default

algorithm

The key manager algorithm.

Default: SunX509

keystore_type

Valid types are JKS, JCEKS, PKCS11, and PKCS12. For file-based keystores, use PKCS12.

Default: JKS

truststore_type

Valid types are JKS, JCEKS, and PKCS12.

Default: commented out (JKS)

cipher_suites

A comma-separated list of cipher suites for Spark encryption. Enclose the list in square brackets.

  • TLS_RSA_WITH_AES_128_CBC_SHA

  • TLS_RSA_WITH_AES_256_CBC_SHA

  • TLS_DHE_RSA_WITH_AES_128_CBC_SHA

  • TLS_DHE_RSA_WITH_AES_256_CBC_SHA

  • TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA

  • TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA

Default: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
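As a rough sketch, custom HTTPS settings for the Spark Master and Worker UIs might look like the following. The password strings are placeholders, not real values, and the combination shown (encryption: custom with enabled: true and client authentication required) is one plausible reading of the options above rather than a verified configuration.

```yaml
# dse.yaml (sketch): custom encryption for the Spark daemon UIs
spark_ui_options:
    encryption: custom                  # use encryption_options below as the source of SSL settings
    encryption_options:
        enabled: true
        keystore: resources/dse/conf/.ui-keystore
        keystore_password: "example-keystore-password"      # placeholder
        require_client_auth: true       # the truststore below is used only when this is true
        truststore: resources/dse/conf/.ui-truststore
        truststore_password: "example-truststore-password"  # placeholder
```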

Starting Spark drivers and executors

spark_process_runner:
    runner_type: default
    run_as_runner_options:
        user_slots:
            - slot1
            - slot2
spark_process_runner

Configures how Spark driver and executor processes are created and managed. See Running Spark processes as separate users.

runner_type
  • default - Use the default runner type.

  • run_as - Spark applications run as a different OS user than the DSE service user.

run_as_runner_options

When runner_type: run_as, Spark applications run as a different OS user than the DSE service user.

user_slots

The list of slot users used to separate Spark process users from the DSE service user.

Default: slot1, slot2
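A minimal sketch of switching to the run_as runner is shown below. The slot names are the defaults listed above; the sketch assumes that matching OS users have already been set up as described in Running Spark processes as separate users.

```yaml
# dse.yaml (sketch): run Spark processes as slot users instead of the DSE service user
spark_process_runner:
    runner_type: run_as
    run_as_runner_options:
        user_slots:
            - slot1
            - slot2
```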

AlwaysOn SQL

Properties to enable and configure AlwaysOn SQL on analytics nodes.

# AlwaysOn SQL options
# alwayson_sql_options:
#     enabled: false
#     thrift_port: 10000
#     web_ui_port: 9077
#     reserve_port_wait_time_ms: 100
#     alwayson_sql_status_check_wait_time_ms: 500
#     workpool: alwayson_sql
#     log_dsefs_dir: /spark/log/alwayson_sql
#     auth_user: alwayson_sql
#     runner_max_errors: 10
#     heartbeat_update_interval_seconds: 30
alwayson_sql_options

Configures the AlwaysOn SQL server.

enabled

Enables AlwaysOn SQL for this node.

  • true - Enable AlwaysOn SQL for this node. The node must be an analytics node. Set workpools in Spark resource_manager_options.

  • false - Do not enable AlwaysOn SQL for this node.

Default: false

thrift_port

The Thrift port on which AlwaysOn SQL listens.

Default: 10000

web_ui_port

The port on which the AlwaysOn SQL web UI is available.

Default: 9077

reserve_port_wait_time_ms

The wait time in milliseconds to reserve the thrift_port if it is not available.

Default: 100

alwayson_sql_status_check_wait_time_ms

The time in milliseconds to wait for a health check status of the AlwaysOn SQL server.

Default: 500

workpool

The named workpool used by AlwaysOn SQL.

Default: alwayson_sql

log_dsefs_dir

Location in DSEFS of the AlwaysOn SQL log files.

Default: /spark/log/alwayson_sql

auth_user

The role to use for internal communication by AlwaysOn SQL if authentication is enabled. Custom roles must be created with login=true.

Default: alwayson_sql

runner_max_errors

The maximum number of errors that can occur during AlwaysOn SQL service runner thread runs before stopping the service. A service stop requires a manual restart.

Default: 10

heartbeat_update_interval_seconds

The time interval, in seconds, between updates of the AlwaysOn SQL heartbeat. If the heartbeat is not updated for more than three times the interval, AlwaysOn SQL automatically restarts.

Default: 30
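Putting several of these options together, a sketch of enabling AlwaysOn SQL on an analytics node could look like this. Every value except enabled is the default listed above, and the restart window in the comment simply applies the three-times rule to the default interval.

```yaml
# dse.yaml (sketch): enable AlwaysOn SQL with default ports and workpool
alwayson_sql_options:
    enabled: true
    thrift_port: 10000
    web_ui_port: 9077
    workpool: alwayson_sql       # must match a workpool under resource_manager_options
    auth_user: alwayson_sql      # role used when authentication is enabled
    heartbeat_update_interval_seconds: 30
    # An automatic restart follows if the heartbeat is stale for more than 3 x 30 = 90 seconds.
```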

DSE File System (DSEFS)

Properties to enable and configure the DSE File System (DSEFS).

DSEFS replaced the Cassandra File System (CFS). DSE 6.8 does not support CFS.

# dsefs_options:
#    enabled:
#    keyspace_name: dsefs
#    work_dir: /var/lib/dsefs
#    public_port: 5598
#    private_port: 5599
#    data_directories:
#      - dir: /var/lib/dsefs/data
#        storage_weight: 1.0
#        min_free_space: 268435456
dsefs_options

Configures DSEFS. See Configuring DSEFS.

enabled

Enables DSEFS.

  • true - Enables DSEFS on this node, regardless of the workload.

  • false - Disables DSEFS on this node, regardless of the workload.

  • blank or commented out (#) - DSEFS starts only if the node is configured to run analytics workloads.

Default: blank (commented out)

keyspace_name

The keyspace where the DSEFS metadata is stored. You can optionally configure multiple DSEFS file systems within a single datacenter by specifying different keyspace names for each cluster.

Default: dsefs

work_dir

The local directory for storing the local node metadata, including the node identifier. The volume of data stored in this directory is nominal and does not require configuration for throughput, latency, or capacity. This directory must not be shared by DSEFS nodes.

Default: /var/lib/dsefs

public_port

The public port on which DSEFS listens for clients.

DataStax recommends that all nodes in the cluster have the same value. Firewalls must open this port to trusted clients. The service on this port is bound to the native_transport_address.

Default: 5598

private_port

The private port for DSEFS internode communication.

Do not open this port to firewalls; this private port must not be visible from outside the cluster.

Default: 5599

data_directories

One or more data locations where the DSEFS data is stored.

- dir

Mandatory attribute to identify the set of directories. DataStax recommends segregating these data directories on physical devices that are different from the devices that are used for DataStax Enterprise. Using multiple directories on JBOD improves performance and capacity.

Default: /var/lib/dsefs/data

storage_weight

Weighting factor for this location. Determines how much data to place in this directory, relative to other directories in the cluster. This soft constraint determines how DSEFS distributes the data. For example, a directory with a value of 3.0 receives about three times more data than a directory with a value of 1.0.

Default: 1.0

min_free_space

The reserved space, in bytes, to not use for storing file data blocks. You can use a unit of measure suffix to specify other size units. For example: terabyte (1 TB), gigabyte (10 GB), and megabyte (5000 MB).

Default: 268435456
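The sketch below shows two data directories with different storage weights. The mount points /mnt/disk1 and /mnt/disk2 are hypothetical; the remaining values are the defaults listed above.

```yaml
# dse.yaml (sketch): DSEFS with two weighted data directories
dsefs_options:
    enabled: true
    keyspace_name: dsefs
    work_dir: /var/lib/dsefs
    public_port: 5598
    private_port: 5599
    data_directories:
        - dir: /mnt/disk1/dsefs/data   # hypothetical mount point
          storage_weight: 1.0
          min_free_space: 268435456    # 268435456 bytes = 256 MiB
        - dir: /mnt/disk2/dsefs/data   # hypothetical mount point
          storage_weight: 3.0          # receives about three times the data of weight 1.0
          min_free_space: 268435456
```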

Advanced properties for DSEFS

#     service_startup_timeout_ms: 60000
#     service_close_timeout_ms: 600000
#     server_close_timeout_ms: 2147483647 # Integer.MAX_VALUE
#     compression_frame_max_size: 1048576
#     query_cache_size: 2048
#     query_cache_expire_after_ms: 2000
#     gossip_options:
#         round_delay_ms: 2000
#         startup_delay_ms: 5000
#         shutdown_delay_ms: 10000
#     rest_options:
#         request_timeout_ms: 330000
#         connection_open_timeout_ms: 55000
#         client_close_timeout_ms: 60000
#         server_request_timeout_ms: 300000
#         idle_connection_timeout_ms: 60000
#         internode_idle_connection_timeout_ms: 120000
#         core_max_concurrent_connections_per_host: 8
#     transaction_options:
#         transaction_timeout_ms: 3000
#         conflict_retry_delay_ms: 200
#         conflict_retry_count: 40
#         execution_retry_delay_ms: 1000
#         execution_retry_count: 3
#     block_allocator_options:
#         overflow_margin_mb: 1024
#         overflow_factor: 1.05
service_startup_timeout_ms

Wait time in milliseconds before the DSEFS server times out while waiting for services to bootstrap.

Default: 60000

service_close_timeout_ms

Wait time in milliseconds before the DSEFS server times out while waiting for services to close.

Default: 60000

server_close_timeout_ms

Wait time in milliseconds that the DSEFS server waits during shutdown before closing all pending connections.

Default: 2147483647

compression_frame_max_size

The maximum accepted size of a compression frame defined during file upload.

Default: 1048576

query_cache_size

Maximum number of elements in a single DSEFS Server query cache.

Default: 2048

query_cache_expire_after_ms

The time to retain the DSEFS Server query cache element in cache. The cache element expires when this time is exceeded.

Default: 2000

gossip_options

Configures DSEFS gossip rounds.

round_delay_ms

The delay in milliseconds between gossip rounds.

Default: 2000

startup_delay_ms

The delay in milliseconds between registering the location and reading back all other locations from the database.

Default: 5000

shutdown_delay_ms

The delay time in milliseconds between announcing shutdown and shutting down the node.

Default: 30000

rest_options

Configures DSEFS REST times.

request_timeout_ms

The time in milliseconds that the client waits for a response that corresponds to a given request.

Default: 330000

connection_open_timeout_ms

The time in milliseconds that the client waits to establish a new connection.

Default: 55000

client_close_timeout_ms

The time in milliseconds that the client waits for pending transfer to complete before closing a connection.

Default: 60000

server_request_timeout_ms

The time in milliseconds to wait for the server REST call to complete.

Default: 300000

idle_connection_timeout_ms

The time in milliseconds for RestClient to wait before closing an idle connection. If RestClient does not close connection after timeout, the connection is closed after 2 x this wait time.

  • time - Wait time to close idle connection.

  • 0 - Disable closing idle connections.

Default: 60000

internode_idle_connection_timeout_ms

Wait time in milliseconds before closing idle internode connection. The internode connections are primarily used to exchange data during replication. Do not set lower than the default value for heavily utilized DSEFS clusters.

Default: 0

core_max_concurrent_connections_per_host

Maximum number of connections to a given host per single CPU core. DSEFS keeps a connection pool for each CPU core.

Default: 8

transaction_options

Configures DSEFS transaction times.

transaction_timeout_ms

Transaction run time in milliseconds before the transaction is considered for timeout and rollback.

Default: 3000

conflict_retry_delay_ms

Wait time in milliseconds before retrying a transaction that was ended due to a conflict.

Default: 200

conflict_retry_count

The number of times to retry a transaction before giving up.

Default: 40

execution_retry_delay_ms

Wait time in milliseconds before retrying a failed transaction payload execution.

Default: 1000

execution_retry_count

The number of payload execution retries before signaling the error to the application.

Default: 3

block_allocator_options

Controls how much additional data can be placed on the local coordinator before the local node overflows to the other nodes. The trade-off is between data locality of writes and balancing the cluster. A local node is preferred for a new block allocation, if:

used_size_on_the_local_node < average_used_size_per_node x overflow_factor + overflow_margin
overflow_margin_mb
  • margin_size - Overflow margin size in megabytes.

  • 0 - Disable block allocation overflow.

Default: 1024

overflow_factor
  • factor - Overflow factor on an exponential scale.

  • 1.0 - Disable block allocation overflow.

Default: 1.05
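To make the allocation rule concrete, the sketch below applies the default values to a hypothetical cluster. The average used size of 10000 MB per node is an assumption for illustration, and the placement of block_allocator_options under dsefs_options follows the indentation of the advanced properties listing.

```yaml
# dse.yaml (sketch): default block allocator settings with a worked threshold
# Assumed for illustration: average_used_size_per_node = 10000 MB
# The local node is preferred for a new block while:
#   used_size_on_the_local_node < 10000 x 1.05 + 1024 = 11524 MB
dsefs_options:
    block_allocator_options:
        overflow_margin_mb: 1024
        overflow_factor: 1.05
```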

DSE Metrics Collector

# insights_options:
#   data_dir: /var/lib/cassandra/insights_data
#   log_dir: /var/log/cassandra/

Uncomment these options only to change the default directories.

insights_options

Options for DSE Metrics Collector.

data_dir

Directory to store collected metrics.

When data_dir is not explicitly set, the insights_data directory is stored in the same parent directory as the commitlog_directory as defined in cassandra.yaml. If the commitlog_directory uses the package default of /var/lib/cassandra/commitlog, data_dir will default to /var/lib/cassandra/insights_data.

Default: /var/lib/cassandra/insights_data

log_dir

Directory to store logs for collected metrics. The log file is dse-collectd.log. The file with the collectd PID is dse-collectd.pid.

Default: /var/log/cassandra/

Audit logging for database activities

Track database activity using the audit log feature. To get the maximum information from data auditing, enable data auditing on every node.

audit_logging_options:
    enabled: false
    logger: SLF4JAuditWriter
#     included_categories:
#     excluded_categories:
#
#     included_keyspaces:
#     excluded_keyspaces:
#
#     included_roles:
#     excluded_roles:
audit_logging_options

Configures database activity logging.

enabled

Enables database activity auditing.

  • true - Enable database activity auditing.

  • false - Disable database activity auditing.

Default: false

logger

The logger to use for recording events:

  • SLF4JAuditWriter - Capture events in a log file.

  • CassandraAuditWriter - Capture events in the dse_audit.audit_log table.

Configure logging level, sensitive data masking, and log file name/location in the logback.xml file.

Default: SLF4JAuditWriter

included_categories

Comma-separated list of event categories that are captured.

  • QUERY - Data retrieval events.

  • DML - (Data manipulation language) Data change events.

  • DDL - (Data definition language) Database schema change events.

  • DCL - (Data control language) Role and permission management events.

  • AUTH - (Authentication) Login and authorization related events.

  • ERROR - Failed requests.

  • UNKNOWN - Events where the category and type are both UNKNOWN.

Event categories that are not listed are not captured.

Use either included_categories or excluded_categories but not both. When specifying included categories, leave excluded_categories blank or commented out.

Default: none (include all categories)

excluded_categories

Comma-separated list of categories to ignore, where the categories are:

  • QUERY - Data retrieval events.

  • DML - (Data manipulation language) Data change events.

  • DDL - (Data definition language) Database schema change events.

  • DCL - (Data control language) Role and permission management events.

  • AUTH - (Authentication) Login and authorization related events.

  • ERROR - Failed requests.

  • UNKNOWN - Events where the category and type are both UNKNOWN.

Events in all other categories are logged.

Use either included_categories or excluded_categories but not both.

Default: exclude no categories

included_keyspaces

Comma-separated list of keyspaces for which events are logged. You can also use a regular expression to filter on keyspace name.

DSE supports using either included_keyspaces or excluded_keyspaces but not both.

Default: include all keyspaces

excluded_keyspaces

Comma-separated list of keyspaces to exclude. You can also use a regular expression to filter on keyspace name.

Default: exclude no keyspaces

included_roles

Comma-separated list of the roles for which events are logged.

DSE supports using either included_roles or excluded_roles but not both.

Default: include all roles

excluded_roles

The roles for which events are not logged. Specify a comma-separated list of role names.

Default: exclude no roles
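
The filters described above can be combined, one from each include/exclude pair. The following sketch is one possible combination; the keyspace and role names are hypothetical placeholders.

```yaml
audit_logging_options:
    enabled: true
    logger: CassandraAuditWriter
    # Capture only schema, permission, and login events.
    included_categories: DDL, DCL, AUTH
    # excluded_categories:                 # left unset because included_categories is used
    included_keyspaces: sales_ks, hr_ks    # hypothetical keyspace names
    # excluded_keyspaces:
    # included_roles:
    excluded_roles: monitoring_role        # hypothetical role name
```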

Cassandra audit writer options

retention_time: 0
cassandra_audit_writer_options:
    mode: sync
    batch_size: 50
    flush_time: 250
    queue_size: 30000
    write_consistency: QUORUM
    # dropped_event_log: /var/log/cassandra/dropped_audit_events.log
    # day_partition_millis: 3600000
retention_time

The number of hours that audit events are retained by loggers that support retention (the CassandraAuditWriter).

  • hours - The number of hours to retain audit events.

  • 0 - Retain events forever.

Default: 0

cassandra_audit_writer_options

Audit writer options.

mode

The mode the writer runs in.

  • sync - A query is not executed until the audit event is successfully written.

  • async - Audit events are queued for writing to the audit table, but are not necessarily logged before the query executes. A pool of writer threads consumes the audit events from the queue, and writes them to the audit table in batch queries.

While async substantially improves performance under load, if a failure occurs between the time a query is executed and the time its audit event is written to the table, the audit table might be missing entries for queries that were executed.

Default: sync

batch_size

Available only when mode: async. Must be greater than 0.

The maximum number of events the writer dequeues before writing them out to the table. If warnings in the logs reveal that batches are too large, decrease this value or increase the value of batch_size_warn_threshold_in_kb in cassandra.yaml.

Default: 50

flush_time

Available only when mode: async.

The maximum amount of time, in milliseconds, that an event waits in the queue before a writer removes it and writes it out. This flush time prevents events from waiting too long before being written to the table when few queries are being executed.

Default: 500

queue_size

The size of the queue feeding the asynchronous audit log writer threads.

  • Number of events - When there are more events being produced than the writers can write out, the queue fills up, and newer queries are blocked until there is space on the queue.

  • 0 - The queue size is unbounded, which can lead to resource exhaustion under heavy query load.

Default: 30000

write_consistency

The consistency level that is used to write audit events.

Default: QUORUM

dropped_event_log

The path of the log file that reports dropped events.

Default: /var/log/cassandra/dropped_audit_events.log

day_partition_millis

The time interval in milliseconds between changing nodes to spread audit log information across multiple nodes. For example, to change the target node every 12 hours, specify 43200000 milliseconds.

Default: 3600000 (1 hour)
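
A sketch of an asynchronous writer configuration, assuming a seven-day retention requirement. The chosen values are illustrative only, and the layout mirrors the fragment shown earlier in this section; keep whatever nesting your own dse.yaml uses.

```yaml
retention_time: 168                   # hours (7 days); 0 retains events forever
cassandra_audit_writer_options:
    mode: async
    batch_size: 50                    # used only in async mode; must be greater than 0
    flush_time: 500                   # milliseconds; used only in async mode
    queue_size: 30000                 # 0 makes the queue unbounded
    write_consistency: QUORUM
    dropped_event_log: /var/log/cassandra/dropped_audit_events.log
    day_partition_millis: 43200000    # change the target node every 12 hours
```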

DSE Tiered Storage

One or more disk configurations for DSE Tiered Storage. Specify each disk configuration as a set of unnamed tiers. Each tier is a collection of paths, and the tiers are defined in priority order, with the fastest storage media in the top tier. With heterogeneous storage configurations across the cluster, specify each disk configuration with <config_name>:<config_settings>, and then use this configuration in CREATE TABLE or ALTER TABLE statements.

DSE Tiered Storage does not change compaction strategies. To manage compression and compaction options, use the compaction option. See Modifying compression and compaction.

# tiered_storage_options:
#     strategy1:
#         tiers:
#             - paths:
#                 - /mnt1
#                 - /mnt2
#             - paths: [ /mnt3, /mnt4 ]
#             - paths: [ /mnt5, /mnt6 ]
#
#         local_options:
#             k1: v1
#             k2: v2
#
#     'another strategy':
#         tiers: [ paths: [ /mnt1 ] ]
tiered_storage_options

Configures the smart movement of data across different types of storage media so that data is matched to the most suitable drive type, according to the required performance and cost characteristics.

strategy1

The first disk configuration strategy. Create a strategy2, strategy3, and so on. In this example, strategy1 is the configurable name of the tiered storage configuration strategy.

tiers

The unnamed tiers in this section configure a storage tier with the paths and filepaths that define the priority order.

local_options

Local configuration options overwrite the tiered storage settings for the table schema in the local dse.yaml file. See Testing DSE Tiered Storage configurations.

- paths

The section of filepaths that define the data directories for this tier of the disk configuration. List the fastest storage media first. These paths are used to store only data that is configured to use tiered storage and are independent of any settings in the cassandra.yaml file.

- /filepath

The filepaths that define the data directories for this tier of the disk configuration.
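
A minimal two-tier sketch, assuming a node with NVMe and spinning-disk mounts. The strategy name and all mount points are hypothetical.

```yaml
tiered_storage_options:
    hot_cold:                  # hypothetical strategy name
        tiers:
            - paths:           # top tier: fastest storage media
                - /mnt/nvme1
                - /mnt/nvme2
            - paths: [ /mnt/hdd1, /mnt/hdd2 ]    # lower tier: slower media
```

The name chosen here (hot_cold) is the configuration name that table definitions would then refer to.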

DSE Advanced Replication

Configure replicating data from remote clusters to central data hubs.

# advanced_replication_options:
  # enabled: false
  # conf_driver_password_encryption_enabled: false
  # advanced_replication_directory: /var/lib/cassandra/advrep
  # security_base_path: /<base>/<path>/<to>/<advrep>/<security>/<files>/
advanced_replication_options

Configure DSE Advanced Replication.

enabled

Enables an edge node to collect data in the replication log.

Default: false

conf_driver_password_encryption_enabled

Enables encryption of driver passwords. See Encrypting configuration file properties.

Default: false

advanced_replication_directory

The directory for storing advanced replication CDC logs. The replication_logs directory will be created in the specified directory.

Default: /var/lib/cassandra/advrep

security_base_path

The base path to prepend to paths in the Advanced Replication configuration locations, including locations to SSL keystore, SSL truststore, and so on.

Default: /base/path/to/advrep/security/files/
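
A sketch of enabling replication log collection on an edge node while leaving the other options at the values shown above:

```yaml
advanced_replication_options:
    enabled: true
    conf_driver_password_encryption_enabled: false
    advanced_replication_directory: /var/lib/cassandra/advrep    # replication_logs is created under this directory
```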

Internode messaging

internode_messaging_options:
  port: 8609
  # frame_length_in_mb: 256
  # server_acceptor_threads: 8
  # server_worker_threads: 16
  # client_max_connections: 100
  # client_worker_threads: 16
  # handshake_timeout_seconds: 10
  # client_request_timeout_seconds: 60
internode_messaging_options

Configures the internal messaging service used by several components of DataStax Enterprise. All internode messaging requests use this service.

port

The mandatory port for the internode messaging service.

Default: 8609

frame_length_in_mb

Maximum message frame length, in megabytes.

Default: 256

server_acceptor_threads

The number of server acceptor threads.

Default: The number of available processors

server_worker_threads

The number of server worker threads.

Default: The number of available processors x 8

client_max_connections

The maximum number of client connections.

Default: 100

client_worker_threads

The number of client worker threads.

Default: The number of available processors x 8

handshake_timeout_seconds

Timeout, in seconds, for the communication handshake process.

Default: 10

client_request_timeout_seconds

Timeout, in seconds, for non-query search requests like core creation and distributed deletes.

Default: 60
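
A sketch that sets the options with fixed defaults explicitly and raises one timeout. The 120-second value is a hypothetical choice, not a recommendation.

```yaml
internode_messaging_options:
    port: 8609
    frame_length_in_mb: 256
    client_max_connections: 100
    handshake_timeout_seconds: 10
    client_request_timeout_seconds: 120    # hypothetical: raised from the default of 60
```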

DSE Multi-Instance

server_id

Unique generated ID of the physical server in DSE Multi-Instance /etc/<dse-nodeId>/dse.yaml files. You can change server_id when the MAC address is not unique, such as a virtualized server where the host’s physical MAC is cloned.

Default: the media access control address (MAC address) of the physical server

DataStax Graph (DSG)

DSG Gremlin Server

The Gremlin Server is configured using Apache TinkerPop specifications.

# gremlin_server:
#     port: 8182
#     threadPoolWorker: 2
#     gremlinPool: 0
#     scriptEngines:
#         gremlin-groovy:
#             config:
#                 sandbox_enabled: false
#                 sandbox_rules:
#                     whitelist_packages:
#                         - package.name
#                     whitelist_types:
#                         - fully.qualified.type.name
#                     whitelist_supers:
#                         - fully.qualified.class.name
#                     blacklist_packages:
#                         - package.name
#                     blacklist_supers:
#                         - fully.qualified.class.name
gremlin_server

The top-level configurations in Gremlin Server.

port

The available communications port for Gremlin Server.

Default: 8182

threadPoolWorker

The number of worker threads that handle non-blocking read and write (requests and responses) on the Gremlin Server channel, including routing requests to the right server operations, handling scheduled jobs on the server, and writing serialized responses back to the client.

Default: 2

gremlinPool

This pool represents the workers available to handle blocking operations in Gremlin Server.

  • 0 - Use the value of the JVM property cassandra.available_processors, if that property is set.

  • positive number - The number of Gremlin threads available to execute actual scripts in a ScriptEngine.

Default: the value of Runtime.getRuntime().availableProcessors()

scriptEngines

Configures Gremlin Server scripts.

gremlin-groovy

Configures gremlin-groovy scripts.

sandbox_enabled

Configures the Gremlin Groovy sandbox.

  • true - Enable the Gremlin Groovy sandbox.

  • false - Disable the Gremlin Groovy sandbox entirely.

Default: true

sandbox_rules

Configures sandbox rules.

whitelist_packages

List of packages, one package per line, to whitelist.

-<package.name>

The fully qualified package name.

whitelist_types

List of types, one type per line, to whitelist.

-<fully.qualified.type.name>

The fully qualified type name.

whitelist_supers

List of super classes, one class per line, to whitelist.

-<fully.qualified.class.name>

The fully qualified class name.

blacklist_packages

List of packages, one package per line, to blacklist.

-<package.name>

The fully qualified package name.

blacklist_supers

List of super classes, one class per line, to blacklist. Retain the hyphen before the fully qualified class name.

-<fully.qualified.class.name>

The fully qualified class name.
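
A sketch of an uncommented gremlin_server section with the sandbox enabled. The nesting follows the commented template above. The package, type, and class names are standard Java names chosen only to illustrate the list format; they are assumptions, not a recommended rule set.

```yaml
gremlin_server:
    port: 8182
    threadPoolWorker: 2
    gremlinPool: 0
    scriptEngines:
        gremlin-groovy:
            config:
                sandbox_enabled: true
                sandbox_rules:
                    whitelist_packages:
                        - java.time           # illustrative package name
                    whitelist_types:
                        - java.util.UUID      # illustrative type name
                    blacklist_packages:
                        - java.io             # illustrative package name
                    blacklist_supers:
                        - java.lang.Thread    # illustrative class name
```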

DSG system-level

# graph:
    # analytic_evaluation_timeout_in_minutes: 10080
    # realtime_evaluation_timeout_in_seconds: 30
    # schema_agreement_timeout_in_ms: 10000
    # system_evaluation_timeout_in_seconds: 180
    # adjacency_cache_size_in_mb: 128
    # index_cache_size_in_mb: 128
    # max_query_params: 16
graph

System-level configuration options and options that are shared between graph instances. Add an option if it is not present in the provided dse.yaml file.

Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.

analytic_evaluation_timeout_in_minutes

Maximum time to wait for an OLAP analytic (Spark) traversal to evaluate.

Default: 10080 (168 hours)

realtime_evaluation_timeout_in_seconds

Maximum time to wait for an OLTP real-time traversal to evaluate.

Default: 30

schema_agreement_timeout_in_ms

Maximum time to wait for the database to agree on schema versions before timing out.

Default: 10000

system_evaluation_timeout_in_seconds

Maximum time to wait for a graph system-based request to execute, like creating a new graph.

Default: 180 (3 minutes)

adjacency_cache_size_in_mb

The amount of RAM, in megabytes, to allocate to each graph’s adjacency (edge and property) cache.

Default: 128

index_cache_size_in_mb

The amount of RAM, in megabytes, to allocate to the index cache.

Default: 128

max_query_params

The maximum number of parameters that can be passed on a graph query request for TinkerPop drivers and drivers using the Cassandra native protocol. Passing very large numbers of parameters on requests is an anti-pattern, because the script evaluation time increases proportionally. DataStax recommends reducing the number of parameters to speed up script compilation times. Before you increase this value, consider alternate methods for parameterizing scripts, like passing a single map. If the graph query request requires many arguments, pass a list.

Default: 16
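
A sketch of an uncommented graph section. Two values are changed from their defaults purely for illustration (hypothetical choices); the rest repeat the defaults listed above.

```yaml
graph:
    analytic_evaluation_timeout_in_minutes: 10080
    realtime_evaluation_timeout_in_seconds: 60    # hypothetical: raised from the default of 30
    schema_agreement_timeout_in_ms: 10000
    system_evaluation_timeout_in_seconds: 180
    adjacency_cache_size_in_mb: 256               # hypothetical: raised from the default of 128
    index_cache_size_in_mb: 128
    max_query_params: 16
```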

Advanced DSG system-level options

Some graph options in earlier versions of DSE are no longer required. The default settings from the earlier versions of dse.yaml are preserved. These advanced settings were removed from dse.yaml, although expert users can manually enter the option to change the default setting.

Generally, the default value is appropriate and does not need adjusting. DataStax recommends contacting the DataStax Services team before changing this value.

graph:
  adjacency_cache_clean_rate: 1024
  adjacency_cache_max_entry_size_in_mb: 0
  adjacency_cache_size_in_mb: 128
  gremlin_server_enabled: true
  index_cache_clean_rate: 1024
  index_cache_max_entry_size_in_mb: 0
  schema_mode: Production
  window_size: 100000
adjacency_cache_clean_rate

The number of stale rows per second to clean from each graph’s adjacency cache.

Default: 1024

adjacency_cache_max_entry_size_in_mb

The maximum entry size in each graph’s adjacency cache. When set to zero, the default is calculated based on the cache size and the number of CPUs. Entries that exceed this size are quietly dropped by the cache without producing an explicit error or log message.

Default: 0

adjacency_cache_size_in_mb

The amount of RAM to allocate to each graph’s adjacency (edge and property) cache.

Default: 128

gremlin_server_enabled

Enables Gremlin Server.

  • true - Enable Gremlin Server.

  • false - Do not enable Gremlin Server.

Default: true

index_cache_clean_rate

The number of stale entries per second to clean from the index cache.

Default: 1024

index_cache_max_entry_size_in_mb

Entries that exceed this size are quietly dropped by the cache without producing an explicit error or log message.

  • 0 - Maximum size is based on the cache size and the number of CPUs.

  • positive integer - The maximum entry size, in megabytes, in the index cache.

Default: 0

schema_mode

Controls the way that the schemas are handled.

  • Production - Schema must be created before data insertion. Schema cannot be changed after data is inserted. Full graph scans are disallowed unless the option graph.allow_scan is changed to TRUE.

  • Development - No schema is required to write data to a graph. Schema can be changed after data is inserted. If this option is not present, manually enter it. Full graph scans are allowed unless the option graph.allow_scan is changed to FALSE.

Default: Production

window_size

The number of samples to keep when aggregating log events. Only a small subset of graph’s log events use this system. Modifying this setting is rarely necessary or helpful.

Default: 100000
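
A sketch of manually entering one of these advanced options, assuming a non-production cluster where schema changes after data insertion are wanted. All other advanced options are left at their preserved defaults by omitting them.

```yaml
graph:
    schema_mode: Development        # manually entered; the default is Production
    gremlin_server_enabled: true    # default, shown only for context
```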

Advanced DSG id assignment and partitioning strategy options

Some graph options in earlier versions of DSE are no longer required. The default settings from the earlier versions of dse.yaml are preserved. These advanced vertex ID assignment and partitioning strategy options were removed from dse.yaml, although expert users can manually enter the option to change the default setting.

Generally, the default value is appropriate and does not need adjusting. DataStax recommends contacting the DataStax Services team before changing this value.

ids:
    block_renew: 0.8
    community_reuse: 28
    consistency_mode: GLOBAL
    # datacenter_id: <integer> unique per DC when consistency_mode: DC_LOCAL
    id_hash_modulus: 20
    member_block_size: 512
ids

DSG configuration options for standard vertex ID assignment and partitioning strategies.

block_renew

The graph standard vertex ID allocator operates on blocks of contiguous IDs. Each block is allocated using a database lightweight transaction that requires coordination latency. To hide the cost of allocating a standard ID block, the allocator begins asynchronously buffering a replacement block whenever the current block is nearly empty. The block_renew parameter defines "nearly empty" as a floating point number between 0 and 1: the fraction of a standard ID block that can be used before graph starts asynchronously allocating its replacement. This setting has no effect on custom IDs.

Default: 0.8

community_reuse

For graphs using standard vertex IDs, if a transaction creates multiple vertices, the allocator attempts to assign vertex IDs that colocate vertices on the same database replicas. If an especially large vertex cohort is created, the allocator chunks the vertex creation and assigns a random target location to avoid load hotspotting. This setting controls the vertex chunk size and has no effect on custom IDs.

Default: 28

consistency_mode

Must be set to DC_LOCAL or GLOBAL.

  • DC_LOCAL - The node uses LOCAL_QUORUM when allocating an ID for a graph vertex. The datacenter_id option must be correctly configured on every node in the cluster.

  • GLOBAL - The node uses QUORUM when allocating an ID for a graph vertex. The datacenter_id option is ignored.

This option must have the same value on every node in the cluster and can be changed only when the entire cluster is stopped. This setting has no effect on custom IDs.

Default: GLOBAL

datacenter_id

Applies only when consistency_mode is DC_LOCAL. Set to an arbitrary value between 1 and 127, inclusive. This setting has no effect on custom IDs.

Each datacenter in the cluster must have a unique datacenter_id. Violating this constraint will corrupt the graph database without warning.

id_hash_modulus

An integer between 1 and 2^24 (both inclusive) that affects maximum ID capacity and the maximum storage space used by ID allocations. Lower values reduce the storage space consumed and the lightweight transaction overhead imposed at startup. Lower values also reduce the total number of IDs that can be allocated over the life of a graph, because this parameter is proportional to the allocatable ID space. However, the proportion coefficient is Long.MAX_VALUE (2^63-1), so ID headroom should be sufficient, practically speaking, even if this is set to 1. This setting has no effect on custom IDs.

Default: 20

member_block_size

The graph standard vertex ID allocator claims uniformly-sized blocks of contiguous IDs using lightweight transactions on the database. This setting controls the size of each block. This setting has no effect on custom IDs.

Default: 512
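
A sketch of the ids options for a node in a cluster that uses DC_LOCAL, assuming two datacenters. The datacenter_id values are arbitrary picks from the allowed range, and the fragment starts at ids as in the sample above.

```yaml
ids:
    consistency_mode: DC_LOCAL    # must be the same on every node in the cluster
    datacenter_id: 1              # nodes in the second datacenter would use a different value, such as 2
    block_renew: 0.8
    member_block_size: 512
```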


© 2024 DataStax | Privacy policy | Terms of use
