dse.yaml configuration file
The DataStax Enterprise configuration file for security, DSE Search, DSE Graph, and DSE Analytics.
logback.xml
The location of the logback.xml file depends on the type of installation:
Package installations | /etc/dse/cassandra/logback.xml
Tarball installations | installation_location/resources/cassandra/conf/logback.xml
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations | /etc/dse/dse.yaml
Tarball installations | installation_location/resources/dse/conf/dse.yaml
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:
Package installations | /etc/dse/cassandra/cassandra.yaml
Tarball installations | installation_location/resources/cassandra/conf/cassandra.yaml
The cassandra.yaml file is the primary configuration file for the DataStax Enterprise database.
Syntax
Use no spaces before a parent entry, such as node_health_options, and at least two spaces before each child setting:
node_health_options:
    refresh_rate_ms: 50000
    uptime_ramp_up_period_seconds: 10800
    dropped_mutation_window_minutes: 30
Organization
The DataStax Enterprise configuration properties are grouped into the following sections:
- Security and authentication options
- DSE In-Memory
- Node health
- Health-based routing
- Lease metrics
- DSE Search options
- DSE Analytics options
- Performance Service options
- DSE Metrics Collector options
- Audit logging
- audit_logging_options
- DSE Tiered Storage
- DSE Advanced Replication
- Inter-node messaging
- DSE Multi-Instance
- DSE Graph options
Security and authentication options
Authentication options
Authentication options for the DSE Authenticator, which allows you to use multiple schemes for authentication in a DataStax Enterprise cluster. Additional authenticator configuration is required in cassandra.yaml.
# authentication_options:
#     enabled: false
#     default_scheme: internal
#     other_schemes:
#         - ldap
#         - kerberos
#     scheme_permissions: false
#     transitional_mode: disabled
#     allow_digest_with_kerberos: true
#     plain_text_without_ssl: warn
- authentication_options
- Options for the DseAuthenticator to authenticate users when the authenticator option in cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseAuthenticator. Authenticators other than DseAuthenticator are not supported.
- enabled
- Enables user authentication.
- true - The DseAuthenticator authenticates users.
- false - The DseAuthenticator does not authenticate users and allows all connections.
Default: commented out (false)
- default_scheme
- Sets the first scheme to validate a user against when the driver does not request a
specific scheme.
- internal - Plain text authentication using the internal password authentication.
- ldap - Plain text authentication using pass-through LDAP authentication.
- kerberos - GSSAPI authentication using the Kerberos authenticator.
Default: commented out (internal)
- other_schemes
- List of schemes that are also checked if validation against the first scheme fails and no scheme was specified by the driver. Same scheme names as default_scheme.
- scheme_permissions
- Whether roles need to have permission granted to them in order to use specific
authentication schemes. These permissions can be granted only when the DseAuthorizer
is used. Set to one of the following values:
- true - Use multiple schemes for authentication. Every role requires permissions to a scheme in order to be assigned.
- false - Do not use multiple schemes for authentication. Prevents unintentional role assignment that might occur if user or group names overlap in the authentication service.
Default: commented out (false)
- allow_digest_with_kerberos
- Controls whether DIGEST-MD5 authentication is also allowed with Kerberos. The
DIGEST-MD5 mechanism is not directly associated with an authentication scheme, but is
used by Kerberos to pass credentials between nodes and jobs.
- true - DIGEST-MD5 authentication is also allowed with Kerberos. In analytics clusters, set to true to use Hadoop inter-node authentication with Hadoop and Spark jobs.
- false - DIGEST-MD5 authentication is not used with Kerberos.
Default: commented out (true)
- plain_text_without_ssl
- Controls how the DseAuthenticator responds to plain text authentication requests
over unencrypted client connections. Set to one of the following values:
- block - Block the request with an authentication error.
- warn - Log a warning about the request but allow it to continue.
- allow - Allow the request without any warning.
Default: commented out (warn)
- transitional_mode
- Whether to enable transitional mode for temporary use during authentication setup in an already established environment. Transitional mode allows access to the database using the anonymous role, which has all permissions except AUTHORIZE.
- disabled - Transitional mode is disabled. All connections must provide valid credentials and map to a login-enabled role.
- permissive - Only super users are authenticated and logged in. All other authentication attempts are logged in as the anonymous user.
- normal - Allow all connections that provide credentials. Maps all authenticated users to their role AND maps all other connections to anonymous.
- strict - Allow only authenticated connections that map to a login-enabled role OR connections that provide a blank username and password as anonymous.
Important: Credentials are required for all connections after authentication is enabled; use a blank username and password to log in with the anonymous role in transitional mode.
Default: commented out (disabled)
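As an illustration of how these options combine, the following sketch enables the DseAuthenticator with internal authentication as the first scheme and LDAP as a fallback, and blocks plain text credentials on unencrypted connections. The values are chosen for the example only and are not recommendations; it assumes that the authenticator option in cassandra.yaml is already set to com.datastax.bdp.cassandra.auth.DseAuthenticator and that an LDAP scheme is configured in ldap_options.

```yaml
# Illustrative values only; adjust to your environment.
authentication_options:
    enabled: true
    default_scheme: internal       # first scheme tried when the driver requests none
    other_schemes:
        - ldap                     # tried if validation against internal fails
    scheme_permissions: false
    transitional_mode: disabled    # all connections must present valid credentials
    plain_text_without_ssl: block  # reject plain text auth on unencrypted connections
```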
Role management options
#role_management_options:
#     mode: internal
#     stats: false
- role_management_options
- Options for the DSE Role Manager. To enable role manager, set:
- authorization_options enabled to true
- role_manager in cassandra.yaml to com.datastax.bdp.cassandra.auth.DseRoleManager
When scheme_permissions is enabled, all roles must have permission to execute on the authentication scheme.
- mode
- Set to one of the following values:
- internal - Scheme that manages roles per individual user in the internal database. Allows nesting roles for permission management.
- ldap - Scheme that assigns roles by looking up the user name in LDAP and mapping the group attribute (ldap_options) to an internal role name. Configure the LDAP scheme in ldap_options.
Attention: Internal role management allows nesting roles for permission management; when using LDAP mode, role nesting is disabled. Using GRANT role_name TO role_name results in an error.
Default: commented out (internal)
- stats
- Set to true to enable logging of DSE role creation and modification events in the dse_security.role_stats system table. All nodes must have the stats option enabled, and must be restarted for the functionality to take effect.
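A minimal sketch of a role manager configuration that assigns roles from LDAP groups and records role events. It assumes that role_manager in cassandra.yaml is set to com.datastax.bdp.cassandra.auth.DseRoleManager, that authorization_options enabled is true, and that ldap_options is configured; the values are illustrative.

```yaml
# Illustrative values only.
role_management_options:
    mode: ldap     # map LDAP groups to internal role names; role nesting is disabled
    stats: true    # log role creation and modification events; requires a restart on all nodes
```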
Authorization options
#authorization_options:
#     enabled: false
#     transitional_mode: disabled
#     allow_row_level_security: false
- authorization_options
- Options for the DSE Authorizer.
- enabled
- Whether to use the DSE Authorizer for role-based access control (RBAC).
- true - use the DSE Authorizer for role-based access control (RBAC)
- false - do not use the DSE Authorizer
Default: commented out (false)
- transitional_mode
- Allows the DSE Authorizer to operate in a temporary transitional mode
during setup of authorization in a cluster. Set to one of the following
values:
- disabled - Transitional mode is disabled.
- normal - Permissions can be passed to resources, but are not enforced.
- strict - Permissions can be passed to resources, and are enforced on authenticated users. Permissions are not enforced against anonymous users.
Default: commented out (disabled)
- allow_row_level_security
- Whether to enable row-level access control (RLAC) permissions; use the
same setting on all nodes.
- true - use row-level security
- false - do not use row-level security
Default: commented out (false)
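For example, while authorization is being introduced on an existing cluster, a configuration along the following lines lets permissions be granted without yet being enforced. This is a sketch with illustrative values, not a recommended end state; once permissions are in place, transitional_mode would normally be returned to disabled.

```yaml
# Illustrative values only.
authorization_options:
    enabled: true
    transitional_mode: normal        # permissions can be granted but are not enforced
    allow_row_level_security: false  # use the same setting on all nodes
```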
Kerberos options
kerberos_options:
    keytab: resources/dse/conf/dse.keytab
    service_principal: dse/_HOST@REALM
    http_principal: HTTP/_HOST@REALM
    qop: auth
- kerberos_options
- Options to configure security for a DataStax Enterprise cluster using Kerberos.
- keytab
- The file path of dse.keytab.
- service_principal
- The service_principal that the DataStax Enterprise process runs under must use the
form dse_user/_HOST@REALM, where:
- dse_user is the name of the user that starts the DataStax Enterprise process.
- _HOST is converted to a reverse DNS lookup of the broadcast address.
- REALM is the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
- http_principal
- The http_principal is used by the Tomcat application container to run DSE Search. The Tomcat web server uses the GSSAPI mechanism (SPNEGO) to negotiate the GSSAPI security mechanism (Kerberos). Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
- qop
- A comma-delimited list of Quality of Protection (QOP) values that
clients and servers can use for each connection. The client can have multiple QOP
values, while the server can have only a single QOP value. The valid values
are:
- auth - Authentication only.
- auth-int - Authentication plus integrity protection for all transmitted data.
- auth-conf - Authentication plus integrity protection and encryption of all
transmitted data.
Encryption using auth-conf is separate and independent of whether encryption is done using SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax recommends choosing only one method and using it for both encryption and authentication.
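A sketch of a Kerberos configuration that adds integrity protection without a second layer of encryption, for a cluster that already encrypts client connections with SSL. EXAMPLE.COM is a hypothetical realm name and dse is assumed to be the user that starts the DataStax Enterprise process.

```yaml
# EXAMPLE.COM is a placeholder realm; the realm must be uppercase.
kerberos_options:
    keytab: resources/dse/conf/dse.keytab
    service_principal: dse/_HOST@EXAMPLE.COM
    http_principal: HTTP/_HOST@EXAMPLE.COM
    qop: auth-int    # authentication plus integrity protection, no Kerberos-level encryption
```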
LDAP options
# ldap_options:
#     server_host:
#     server_port: 389
#     hostname_verification: false
#     search_dn:
#     search_password:
#     use_ssl: false
#     use_tls: false
#     truststore_path:
#     truststore_password:
#     truststore_type: jks
#     user_search_base:
#     user_search_filter: (uid={0})
#     user_memberof_attribute: memberof
#     group_search_type: directory_search
#     group_search_base:
#     group_search_filter: (uniquemember={0})
#     group_name_attribute: cn
#     credentials_validity_in_ms: 0
#     search_validity_in_seconds: 0
#     connection_pool:
#         max_active: 8
#         max_idle: 8

ldap_options:
    server_host: win2012ad_server.mycompany.lan
    server_port: 389
    search_dn: cn=lookup_user,cn=users,dc=win2012domain,dc=mycompany,dc=lan
    search_password: lookup_user_password
    use_ssl: false
    use_tls: false
    truststore_path:
    truststore_password:
    truststore_type: jks
    #group_search_type: directory_search
    group_search_type: memberof_search
    #group_search_base:
    #group_search_filter:
    group_name_attribute: cn
    user_search_base: cn=users,dc=win2012domain,dc=mycompany,dc=lan
    user_search_filter: (sAMAccountName={0})
    user_memberof_attribute: memberOf
    connection_pool:
        max_active: 8
        max_idle: 8
- ldap_options
- Options to configure LDAP security. When not set, LDAP authentication is not
used.
Default: commented out
- server_host
- A comma-separated list of LDAP server hosts.
Important: Do not use LDAP on the same host (localhost) in production environments. Using LDAP on the same host (localhost) is appropriate only in single node test or development environments.
For information on parameters related to tuning failover performance for multiple LDAP servers, see Tune LDAP failover.
Default: none
- server_port
- The port on which the LDAP server listens.
- 389 - the default port for unencrypted connections
- 636 - typically used for encrypted connections; the default SSL port for LDAP is 636
Default: commented out (389)
- hostname_verification
- Enable hostname verification. The following conditions must be met:
- Either use_ssl or use_tls must be set to true.
- A valid truststore with the correct path specified in truststore_path must exist. The truststore must have a certificate entry, trustedCertEntry, including a SAN DNSName entry that matches the hostname of the LDAP server.
Default: false
- search_dn
- Distinguished name (DN) of an account with read access to the user_search_base and group_search_base. For example:
- OpenLDAP: uid=lookup,ou=users,dc=springsource,dc=com
- Microsoft Active Directory (AD): cn=lookup, cn=users, dc=springsource, dc=com
Warning: Do not create or use an LDAP account or group called cassandra. The DSE database comes with a default login role, cassandra, that has access to all database objects and uses the consistency level QUORUM.
When not set, an anonymous bind is used for the search on the LDAP server.
Default: commented out
- search_password
- The password of the search_dn account.
Default: commented out
- use_ssl
- Whether to use an SSL-encrypted connection.
- true - use an SSL-encrypted connection; set server_port to the LDAP port for the server (typically port 636)
- false - do not enable SSL connections to the LDAP server
Default: commented out (false)
- use_tls
- Whether to enable TLS connections to the LDAP server.
- true - enable TLS connections to the LDAP server; set server_port to the TLS port of the LDAP server
- false - do not enable TLS connections to the LDAP server
Default: commented out (false)
- truststore_path
- The path to the truststore for SSL
certificates.
Default: commented out
- truststore_password
- The password to access the trust store.
Default: commented out
- truststore_type
- The type of truststore.
Default: commented out (jks)
- user_search_base
- Distinguished name (DN) of the object to start the recursive search for user entries for authentication and role management memberof searches. For example, to search all users in example.com, use ou=users,dc=example,dc=com.
- For your LDAP domain, set the ou and dc elements. Typically set to ou=users,dc=domain,dc=top_level_domain. For example, ou=users,dc=example,dc=com.
- Active Directory uses a different search base, typically CN=search,CN=Users,DC=ActDir_domname,DC=internal. For example, CN=search,CN=Users,DC=example-sales,DC=internal.
Default: commented out
- user_search_filter
- Attribute that identifies the user that the search filter uses for looking up user names.
- uid={0} - when using LDAP
- samAccountName={0} - when using AD (Microsoft Active Directory). For example, (sAMAccountName={0})
Default: commented out (uid={0})
- user_memberof_attribute
- Attribute that contains a list of group names; role manager assigns DSE roles that exactly match any group name in the list. Required when managing roles using group_search_type: memberof_search with LDAP (role_management_options mode: ldap). The directory server must have memberof support, which is a default user attribute in Microsoft Active Directory (AD).
Default: commented out (memberof)
- group_search_type
- Required when managing roles with LDAP (role_management_options mode: ldap). Defines how group membership is determined for a user. Choose one of the following values:
- directory_search - Filters the results by doing a subtree search of group_search_base to find groups that contain the user name in the attribute defined in the group_search_filter. (Default)
- memberof_search - Recursively search for user entries using the user_search_base and user_search_filter. Get groups from the user attribute defined in user_memberof_attribute. The directory server must have memberof support.
Default: commented out (directory_search)
- group_search_base
- The unique distinguished name (DN) of the group record from which to start the group membership search.
Default: commented out
- group_search_filter
- Set to any valid LDAP filter.
Default: commented out (uniquemember={0})
- group_name_attribute
- The attribute in the group record that contains the LDAP group name. Role names are
case-sensitive and must match exactly on DSE for assignment. Unmatched groups are
ignored.
Default: commented out (cn)
- credentials_validity_in_ms
- The duration period of the credentials cache.
- 0 - disable credentials cache
- duration period in milliseconds - enable the credentials cache and improve performance by reducing the number of requests that are sent to the internal or LDAP server
Default: commented out (0)
- search_validity_in_seconds
- The duration period for the search cache.
- 0 - disable the search cache
- duration period in seconds - enable the search cache and improve performance by reducing the number of requests that are sent to the internal or LDAP server
Default: commented out (0, disabled)
- connection_pool
- The configuration settings for the connection pool for making LDAP requests.
- max_active
- The maximum number of active connections to the LDAP server.
Default: commented out (8)
- max_idle
- The maximum number of idle connections in the pool awaiting requests.
Default: commented out (8)
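The following fragment sketches how the encryption, hostname verification, and cache options above might be combined for two LDAP servers reached over SSL. The host names, truststore path, password placeholder, and cache durations are hypothetical; the remaining ldap_options settings (search DN, search bases, filters) are omitted here and would still be required.

```yaml
# Hypothetical hosts, paths, and durations; not a complete ldap_options section.
ldap_options:
    server_host: ldap1.example.com, ldap2.example.com
    server_port: 636                    # typical port for SSL-encrypted LDAP
    hostname_verification: true         # requires use_ssl or use_tls and a valid truststore
    use_ssl: true
    use_tls: false
    truststore_path: /path/to/ldap_truststore.jks
    truststore_password: truststore_password
    truststore_type: jks
    credentials_validity_in_ms: 30000   # cache credentials for 30 seconds
    search_validity_in_seconds: 60      # cache search results for 60 seconds
    connection_pool:
        max_active: 8
        max_idle: 8
```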
Encrypt sensitive system resources
Options to encrypt sensitive system resources using a local encryption key or a remote KMIP key.
system_info_encryption:
    enabled: false
    cipher_algorithm: AES
    secret_key_strength: 128
    chunk_length_kb: 64
    key_provider: KmipKeyProviderFactory
    kmip_host: kmip_host_name
- system_info_encryption
- Options to set encryption settings for system resources that might contain sensitive information, including the system.batchlog and system.paxos tables, hint files, and the database commit log.
- enabled
- Whether to enable encryption of system resources.
Note: The system_trace keyspace is NOT encrypted by enabling the system_info_encryption section. In environments that also have tracing enabled, manually configure encryption with compression on the system_trace keyspace.
Default: false
- cipher_algorithm
- The name of the JCE cipher algorithm used to encrypt system resources.
Default: AES
Table 1. Supported cipher algorithms names
cipher_algorithm | secret_key_strength
AES | 128, 192, or 256
DES | 56
DESede | 112 or 168
Blowfish | 32-448
RC2 | 40-128
- secret_key_strength
- Length of key to use for the system resources. See Supported cipher algorithms names.
Note: DSE uses a matching local key or requests the key type from the KMIP server. For KMIP, if an existing key does not match, the KMIP server automatically generates a new key.
Default: 128
- chunk_length_kb
- Optional. Size of SSTable chunks when data from the system.batchlog or system.paxos tables is written to disk.
Note: To encrypt existing data, run nodetool upgradesstables -a system batchlog paxos on all nodes in the cluster.
Default: 64
- key_provider
- KMIP key provider to enable encrypting sensitive system data with a KMIP key.
Comment out if using a local encryption key.
Default: commented out (KmipKeyProviderFactory)
- kmip_host
- The KMIP key server host. Set to the kmip_groupname that defines the KMIP host in the kmip_hosts section.
DSE requests a key from the KMIP host and uses the key generated by the KMIP provider.
Default: commented out
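A minimal sketch of system resource encryption with a local encryption key, assuming AES with a 256-bit key (one of the combinations listed in the supported cipher algorithms table). The KMIP settings stay commented out because a local key is used.

```yaml
# Illustrative values only; local key, so the KMIP options remain commented out.
system_info_encryption:
    enabled: true
    cipher_algorithm: AES
    secret_key_strength: 256
    chunk_length_kb: 64
    # key_provider: KmipKeyProviderFactory
    # kmip_host: kmip_host_name
```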
Encrypted configuration properties settings
system_key_directory: /etc/dse/conf
config_encryption_active: false
config_encryption_key_name: (key_filename | KMIP_key_URL )
- system_key_directory
- Path to the directory where local encryption/decryption key files are stored, also called system keys. Distribute the system keys to all nodes in the cluster. Ensure that the DSE account is the folder owner and has read/write/execute (700) permissions.
Note: This directory is not used for KMIP keys.
Default: /etc/dse/conf
- config_encryption_active
- Whether to enable encryption on sensitive data stored in tables and in configuration
files.
- true - enable encryption of configuration property values using the specified config_encryption_key_name. When set to true, the configuration values must be encrypted or commented out.
Restriction: Lifecycle Manager (LCM) is not compatible when config_encryption_active is true in DSE and OpsCenter.
- false - Do not enable encryption of configuration property values.
Default: false
- config_encryption_key_name
- Set to the local encryption key filename or KMIP key URL to use for configuration file property value decryption.
Note: Use dsetool encryptconfigvalue to generate encrypted values for the configuration file properties.
Default: system_key. The default name is not configurable.
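As a sketch, enabling configuration encryption with a local key stored in the default system key directory might look like the following. It assumes the key file named system_key already exists in that directory on every node and that sensitive property values in the configuration files have already been encrypted or commented out.

```yaml
# Assumes the local key file system_key exists in /etc/dse/conf on all nodes.
system_key_directory: /etc/dse/conf
config_encryption_active: true
config_encryption_key_name: system_key
```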
KMIP encryption options
kmip_hosts:
    your_kmip_groupname:
        hosts: kmip1.yourdomain.com, kmip2.yourdomain.com
        keystore_path: pathto/kmip/keystore.jks
        keystore_type: jks
        keystore_password: password
        truststore_path: pathto/kmip/truststore.jks
        truststore_type: jks
        truststore_password: password
        key_cache_millis: 300000
        timeout: 1000
        protocol: protocol
        cipher_suites: supported_cipher
- kmip_hosts
- Connection settings for key servers that support the KMIP protocol.
- kmip_groupname
- A user-defined name for a group of options to configure a KMIP server or servers, key settings, and certificates. Configure options for a kmip_groupname section for each KMIP key server or group of KMIP key servers. Using separate key server configuration settings allows use of different key servers to encrypt table data, and eliminates the need to enter key server configuration information in DDL statements and other configurations. Multiple KMIP hosts are supported.
- hosts
- A comma-separated list of KMIP hosts (host[:port]) using the FQDN (Fully Qualified Domain Name). DSE queries the hosts in the listed order, so add KMIP hosts in the intended failover sequence. For example, if the host list contains kmip1.yourdomain.com, kmip2.yourdomain.com, DSE tries kmip1.yourdomain.com and then kmip2.yourdomain.com.
- keystore_path
- The path to a Java keystore created from the KMIP agent PEM files.
- keystore_type
- The type of keystore.
Default: commented out (jks)
- keystore_password
- The password to access the keystore.
Default: commented out (password)
- truststore_path
- The path to a Java truststore that was created using the KMIP root certificate.
Default: commented out (/etc/dse/conf/KMIP_truststore.jks)
- truststore_type
- The type of truststore.
Default: commented out (jks)
- truststore_password
- The password to access the truststore.
Default: commented out (password)
- key_cache_millis
- Milliseconds to locally cache the encryption keys that are read from the KMIP hosts.
The longer the encryption keys are cached, the fewer requests are made to the KMIP key
server, but the longer it takes for changes, like revocation, to propagate to the
DataStax Enterprise node. DataStax Enterprise uses concurrent encryption, so multiple
threads fetch the secret key from the KMIP key server at the same time. DataStax
recommends using the default value.
Default: commented out (300000)
- timeout
- Socket timeout in milliseconds.
Default: commented out (1000)
- protocol
- When not specified, the JVM default is used. Example: TLSv1.2
- cipher_suites
- When not specified, JVM default is used. Examples:
- TLS_RSA_WITH_AES_128_CBC_SHA
- TLS_RSA_WITH_AES_256_CBC_SHA
- TLS_DHE_RSA_WITH_AES_128_CBC_SHA
- TLS_DHE_RSA_WITH_AES_256_CBC_SHA
- TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
- TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
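To show how a KMIP group is referenced from other sections, the sketch below defines one group and points system_info_encryption at it through kmip_host. The group name kmip_group_a is hypothetical, and the host names, paths, and passwords are the same placeholders used in the sample above.

```yaml
# kmip_group_a is a hypothetical, user-defined group name.
kmip_hosts:
    kmip_group_a:
        hosts: kmip1.yourdomain.com, kmip2.yourdomain.com   # tried in this order
        keystore_path: pathto/kmip/keystore.jks
        keystore_type: jks
        keystore_password: password
        truststore_path: pathto/kmip/truststore.jks
        truststore_type: jks
        truststore_password: password
        key_cache_millis: 300000
        timeout: 1000

system_info_encryption:
    enabled: true
    cipher_algorithm: AES
    secret_key_strength: 128
    key_provider: KmipKeyProviderFactory
    kmip_host: kmip_group_a    # must match the group name defined under kmip_hosts
```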
DSE Search index encryption settings
# solr_encryption_options:
#     decryption_cache_offheap_allocation: true
#     decryption_cache_size_in_mb: 256
- solr_encryption_options
- Settings to tune encryption of search indexes.
- decryption_cache_offheap_allocation
- Whether to allocate shared DSE Search decryption cache off JVM heap.
- true - allocate shared DSE Search decryption cache off JVM heap
- false - do not allocate shared DSE Search decryption cache off JVM heap
Default: commented out (true)
- decryption_cache_size_in_mb
- The maximum size of shared DSE Search decryption cache in megabytes (MB).
Default: commented out (256)
DSE In-Memory options
To use DSE In-Memory, choose one of these options to specify how much system memory to use for all in-memory tables: fraction or size.
# max_memory_to_lock_fraction: 0.20
# max_memory_to_lock_mb: 10240
- max_memory_to_lock_fraction
- A fraction of the system memory. The default value of 0.20 specifies to use up to 20% of system memory. This max_memory_to_lock_fraction value is ignored if max_memory_to_lock_mb is set to a non-zero value. To specify a fraction, use this option instead of max_memory_to_lock_mb.
Default: commented out (0.20)
- max_memory_to_lock_mb
- A maximum amount of memory in megabytes (MB).
- not set - use the fraction specified with max_memory_to_lock_fraction
- number greater than 0 - maximum amount of memory in megabytes (MB)
Default: commented out (10240)
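Because a non-zero max_memory_to_lock_mb takes precedence over the fraction, only one of the two options needs to be set. A sketch that caps in-memory tables at a fixed size, using the sample value from above:

```yaml
# Fixed-size limit; the fraction option is left commented out because it would be ignored.
# max_memory_to_lock_fraction: 0.20
max_memory_to_lock_mb: 10240    # up to 10240 MB (10 GB) for all in-memory tables
```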
Node health options
node_health_options:
    refresh_rate_ms: 50000
    uptime_ramp_up_period_seconds: 10800
    dropped_mutation_window_minutes: 30
- node_health_options
- Node health options are always enabled.
- refresh_rate_ms
- Default: 60000
- uptime_ramp_up_period_seconds
- The amount of continuous uptime required for the node's uptime score to advance the node health score from 0 to 1 (full health), assuming there are no recent dropped mutations. The health score is a composite score based on dropped mutations and uptime.
Tip: If a node is repairing after a period of downtime, you might want to increase the uptime period to the expected repair time.
Default: commented out (10800, 3 hours)
- dropped_mutation_window_minutes
- The historic time window over which the rate of dropped mutations affects the node health score.
Default: 30
Health-based routing
enable_health_based_routing: true
- enable_health_based_routing
- Whether to consider node health for replication selection for distributed DSE Search
queries. Health-based routing enables a trade-off between index consistency and query
throughput.
- true - consider node health when multiple candidates exist for a particular token range.
- false - ignore node health for replication selection. When the primary concern is performance, do not enable health-based routing.
Default: true
Lease metrics
lease_metrics_options:
    enabled: false
    ttl_seconds: 604800
- lease_metrics_options
- Lease holder statistics help monitor the lease subsystem for automatic management of Job Tracker and Spark Master nodes.
- enabled
- Enables (true) or disables (false) log entries related to lease holders. Most of the time you do not want to enable logging.
Default: false
- ttl_seconds
- Defines the time, in seconds, to persist the log of lease holder changes. Logging of lease holder changes is always on, and has a very low overhead.
Default: 604800
DSE Search options
Scheduler settings for DSE Search indexes
To ensure that records with TTLs are purged from search indexes when they expire, the search indexes are periodically checked for expired documents.
ttl_index_rebuild_options:
    fixed_rate_period: 300
    initial_delay: 20
    max_docs_per_batch: 4096
    thread_pool_size: 1
- ttl_index_rebuild_options
- Section of options to control the schedulers in charge of querying for and removing expired records, and the execution of the checks.
- fixed_rate_period
- Time interval to check for expired data in seconds.
Default: 300
- initial_delay
- The number of seconds to delay the first TTL check to speed up start-up time.
Default: 20
- max_docs_per_batch
- The maximum number of documents to check and delete per batch by the TTL rebuild thread. All documents determined to be expired are deleted from the index during each check; to avoid memory pressure, their unique keys are retrieved and the deletes are issued in batches.
Default: 4096
- thread_pool_size
- The maximum number of cores that can execute TTL cleanup concurrently. Set the thread_pool_size to manage system resource consumption and prevent many search cores from executing simultaneous TTL deletes.
Default: 1
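As an illustration of the trade-off these settings control, the sketch below checks for expired documents less often but lets two search cores run TTL cleanup at the same time. The values are hypothetical and would need to be validated against the actual workload.

```yaml
# Hypothetical tuning values.
ttl_index_rebuild_options:
    fixed_rate_period: 600     # check for expired data every 600 seconds (10 minutes)
    initial_delay: 20          # wait 20 seconds after startup before the first check
    max_docs_per_batch: 4096   # delete expired documents in batches of up to 4096
    thread_pool_size: 2        # at most two cores run TTL cleanup concurrently
```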
Reindexing of bootstrapped data
async_bootstrap_reindex: false
- async_bootstrap_reindex
- For DSE Search, configure whether to asynchronously reindex bootstrapped data.
Default: false
- If enabled, the node joins the ring immediately after bootstrap and reindexing occurs asynchronously. Do not wait for post-bootstrap reindexing so that the node is not marked down. The dsetool ring command can be used to check the status of the reindexing.
- If disabled, the node joins the ring after reindexing the bootstrapped data.
CQL Solr paging
cql_solr_query_paging: off
- cql_solr_query_paging
- driver - Respects driver paging settings. Specifies to use Solr pagination (cursors) only when the driver uses pagination. Enabled automatically for DSE SearchAnalytics workloads.
- off - Paging is off. Ignore driver paging settings for CQL queries and use normal Solr paging unless:
- The current workload is an analytics workload, including SearchAnalytics. SearchAnalytics nodes always use driver paging settings.
- The cqlsh query parameter paging is set to driver.
Even when cql_solr_query_paging: off, paging is dynamically enabled with the "paging":"driver" parameter in JSON queries.
Default: commented out (off)
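For example, to have CQL Solr queries follow the paging settings of the driver on every query, rather than enabling paging per query, the option can be set as follows:

```yaml
# Use Solr pagination (cursors) only when the driver uses pagination.
cql_solr_query_paging: driver
```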
Solr CQL query option
cql_solr_query_row_timeout: 10000
- cql_solr_query_row_timeout
- The maximum time in milliseconds to wait for each row to be read from the database
during CQL Solr queries.
Default: commented out (10000, 10 seconds)
DSE Search resource upload limit
solr_resource_upload_limit_mb: 10
- solr_resource_upload_limit_mb
- Option to disable or configure the maximum file size of the search index config or
schema. Resource files can be uploaded, but the search index config and schema are
stored internally in the database after upload.
- 0 - disable resource uploading
- upload size - The maximum upload size limit in megabytes (MB) for a DSE Search resource file (search index config or schema).
Default: 10
Shard transport options
shard_transport_options:
    netty_client_request_timeout: 60000
- shard_transport_options
- Fault tolerance option for inter-node communication between DSE Search nodes.
- netty_client_request_timeout
- Timeout behavior during distributed queries. The internal timeout for all search
queries to prevent long running queries. The client request timeout is the maximum
cumulative time (in milliseconds) that a distributed search request will wait idly
for shard responses.
Default: 60000 (1 minute)
DSE Search indexing settings
# back_pressure_threshold_per_core: 1024
# flush_max_time_per_core: 5
# load_max_time_per_core: 5
# enable_index_disk_failure_policy: false
# solr_data_dir: /MyDir
# solr_field_cache_enabled: false
# ram_buffer_heap_space_in_mb: 1024
# ram_buffer_offheap_space_in_mb: 1024
- back_pressure_threshold_per_core
- The maximum number of queued partitions during search index rebuilding and
reindexing. This maximum number safeguards against excessive heap use by the indexing
queue. If set lower than the number of threads per core (TPC), not all TPC threads can
be actively indexing.
Default: commented out (1024)
- flush_max_time_per_core
- The maximum time, in minutes, to wait for the flushing of asynchronous index updates
that occurs at DSE Search commit time or at flush time. Expert level knowledge is
required to change this value. Always set the value reasonably high to ensure flushing
completes successfully to fully sync DSE Search indexes with the database data. If the
configured value is exceeded, index updates are only partially committed and the
commit log is not truncated, which can undermine data durability.
Note: When a timeout occurs, it usually means this node is being overloaded and cannot flush in a timely manner. Live indexing increases the time to flush asynchronous index updates.
Default: commented out (5)
- load_max_time_per_core
- The maximum time, in minutes, to wait for each DSE Search index to load on startup
or create/reload operations. This advanced option should be changed only if exceptions
happen during search index loading. When not set, the default is 5
minutes.
Default: commented out (5)
- enable_index_disk_failure_policy
- Whether to apply the configured disk failure policy if IOExceptions occur during
index update operations.
- true - apply the configured Cassandra disk failure policy to index write failures
- false - do not apply the disk failure policy
Default: commented out (false)
- solr_data_dir
- The directory to store index data. For example:
solr_data_dir: /var/lib/cassandra/solr.data
See Managing the location of DSE Search data. By default, each DSE Search index is saved in solr_data_dir/keyspace_name.table_name, or as specified by the dse.solr.data.dir system property.
Default: commented out
- solr_field_cache_enabled
- The Apache Lucene® field cache is deprecated. Instead, for fields that are sorted,
faceted, or grouped by, set docValues="true" on the field in the search index schema.
Then reload the search index and reindex. When not set, the default is false.
Default: commented out (false)
- ram_buffer_heap_space_in_mb
- Global Lucene RAM buffer usage threshold for heap to force segment flush. Setting
too low might induce a state of constant flushing during periods of ongoing write
activity. For NRT, forced segment flushes also de-schedule pending auto-soft commits
to avoid potentially flushing too many small segments. When not set, the default is
1024.
Default: commented out (1024)
- ram_buffer_offheap_space_in_mb
- Global Lucene RAM buffer usage threshold for offheap to force segment flush. Setting
too low might induce a state of constant flushing during periods of ongoing write
activity. For NRT, forced segment flushes also de-schedule pending auto-soft commits
to avoid potentially flushing too many small segments. When not set, the default is
1024.
Default: commented out (1024)
Performance Service options
Global Performance Service options
# performance_core_threads: 4
# performance_max_threads: 32
# performance_queue_capacity: 32000
- performance_core_threads
- Number of background threads used by the performance service under normal conditions. Default: 4
- performance_max_threads
- Maximum number of background threads used by the performance service. Default: 32
- performance_queue_capacity
- The number of queued tasks in the backlog when the number of performance_max_threads are busy. Default: 32000
Performance Service options
These settings are used by the Performance Service to configure collection of performance metrics on transactional nodes. Performance metrics are stored in the dse_perf keyspace and can be queried with CQL using any CQL-based utility, such as cqlsh or any application using a CQL driver. To temporarily make changes for diagnostics and testing, use the dsetool perf subcommands.
- graph_events
- Graph event
information.
graph_events:
    ttl_seconds: 600
- ttl_seconds
- The TTL in seconds.
Default: 600
- cql_slow_log_options
- Options to configure reporting distributed sub-queries for search (query executions
on individual shards) that take longer than a specified period of time.
# cql_slow_log_options:
#     enabled: true
#     threshold: 200.0
#     minimum_samples: 100
#     ttl_seconds: 259200
#     skip_writing_to_db: true
#     num_slowest_queries: 5
Tip: See Collecting slow queries.
- enabled
- Enables (true) or disables (false) log entries for slow queries. When not set, the
default is true.
Default: commented out (true)
- threshold
- The threshold in milliseconds or as a percentile.
- A value greater than 1 is interpreted as a time; queries that take longer than the specified number of milliseconds are logged.
- A value from 0 to 1 is interpreted as a percentile; queries that exceed this percentile are logged.
Default: commented out (200.0, that is, 0.2 seconds)
- minimum_samples
- The initial number of queries before activating the percentile filter.
Default: commented out (100)
- ttl_seconds
- Time, in seconds, to keep the slow query log entries.
Default: commented out (259200, that is, 3 days)
- skip_writing_to_db
- Whether to keep slow queries in-memory only and not write data to database.
- false - write slow queries to the database; the threshold must be >= 2000 ms to prevent a high load on the database
- true - skip writing to database, keep slow queries only in memory
Default: commented out (true)
- num_slowest_queries
- The number of slow queries to keep in-memory.
Default: commented out (5)
- cql_system_info_options
- Options to configure collection of system-wide performance information about a
cluster.
cql_system_info_options:
    enabled: false
    refresh_rate_ms: 10000
- enabled
- Whether to collect system-wide performance information about a cluster.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the statistics.
Default:
10000
(10 seconds)
- resource_level_latency_tracking_options
- Options to configure collection of object I/O performance
statistics.
resource_level_latency_tracking_options:
    enabled: false
    refresh_rate_ms: 10000
Tip: See Collecting system level diagnostics.
- enabled
- Whether to collect object I/O performance statistics.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the statistics.
Default:
10000
(10 seconds)
- db_summary_stats_options
- Options to configure collection of summary statistics at the database
level.
db_summary_stats_options:
    enabled: false
    refresh_rate_ms: 10000
Tip: See Collecting database summary diagnostics.
- enabled
- Whether to collect database summary performance information.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the statistics.
Default:
10000
(10 seconds)
- cluster_summary_stats_options
- Options to configure collection of statistics at a cluster-wide level.
cluster_summary_stats_options:
    enabled: false
    refresh_rate_ms: 10000
Tip: See Collecting cluster summary diagnostics.
- enabled
- Whether to collect statistics at a cluster-wide level.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the statistics.
Default: 10000 (10 seconds)
- spark_cluster_info_options
- Options to configure collection of data associated with Spark cluster and Spark
applications.
spark_cluster_info_options:
    enabled: false
    refresh_rate_ms: 10000
- enabled
- Whether to collect Spark performance statistics.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the statistics.
Default: 10000 (10 seconds)
- histogram_data_options
- Histogram data for the dropped mutation metrics are stored in the dropped_messages
table in the dse_perf
keyspace.
histogram_data_options:
    enabled: false
    refresh_rate_ms: 10000
    retention_count: 3
Tip: See Collecting histogram diagnostics.
- enabled
-
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the statistics.
Default: 10000 (10 seconds)
- retention_count
- Default: 3
- user_level_latency_tracking_options
- User-resource latency tracking settings.
user_level_latency_tracking_options:
    enabled: false
    refresh_rate_ms: 10000
    top_stats_limit: 100
    quantiles: false
Tip: See Collecting user activity diagnostics.
- enabled
-
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the statistics.
Default: 10000 (10 seconds)
- top_stats_limit
- Limit the number of individual
metrics.
Default:
100
- quantiles
-
Default:
false
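As an illustration of the threshold rules described for cql_slow_log_options, the following sketch switches the slow query log to percentile mode and turns on one collector. The values 0.95, 500, and 30000 are assumptions chosen for the example, not recommended settings.

```yaml
# Sketch only: the values below are illustrative assumptions.
cql_slow_log_options:
    enabled: true
    # A value from 0 to 1 is read as a percentile; queries that exceed it are logged.
    # A value such as 200.0 would instead be read as 200 milliseconds.
    threshold: 0.95
    # The percentile filter activates only after this many queries.
    minimum_samples: 500
    ttl_seconds: 259200        # 3 days
    skip_writing_to_db: true   # keep slow queries in memory only
    num_slowest_queries: 5

cql_system_info_options:
    enabled: true
    refresh_rate_ms: 30000     # sample every 30 seconds
```

Because skip_writing_to_db stays true here, the note about a minimum threshold of 2000 ms for database writes does not come into play.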
DSE Search Performance Service options
solr_slow_sub_query_log_options:
    enabled: false
    ttl_seconds: 604800
    threshold_ms: 3000
    async_writers: 1
solr_update_handler_metrics_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_request_handler_metrics_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_index_stats_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_cache_stats_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_latency_snapshot_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
- solr_slow_sub_query_log_options
- See Collecting slow search queries.
- enabled
-
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- ttl_seconds
- The time to live (TTL), in seconds, for the recorded entries.
Default: 604800 (7 days)
- async_writers
- The number of server threads dedicated to writing in the log. More than one server
thread might degrade performance.
Default:
1
- threshold_ms
-
Default:
3000
- solr_update_handler_metrics_options
- Options to collect search index direct update handler statistics over time.
Tip: See Collecting handler statistics.
- enabled
-
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- ttl_seconds
- The time to live (TTL), in seconds, for the recorded entries.
Default: 604800 (7 days)
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency to update the
statistics.
Default:
60000
(1 minute)
- solr_index_stats_options
- Options to record search index statistics over time.
Tip: See Collecting index statistics.
- enabled
-
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- ttl_seconds
- The time to live (TTL), in seconds, for the recorded entries.
Default: 604800 (7 days)
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency to update the
statistics.
Default:
60000
(1 minute)
- solr_cache_stats_options
- See Collecting cache statistics.
- enabled
-
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- ttl_seconds
- The time to live (TTL), in seconds, for the recorded entries.
Default: 604800 (7 days)
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency to update the
statistics.
Default:
60000
(1 minute)
- solr_latency_snapshot_options
- See Collecting Apache Solr performance statistics.
- enabled
-
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- ttl_seconds
- The time to live (TTL), in seconds, for the recorded entries.
Default: 604800 (7 days)
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency to update the
statistics.
Default:
60000
(1 minute)
Spark Performance Service options
spark_application_info_options:
    enabled: false
    refresh_rate_ms: 10000
    driver:
        sink: false
        connectorSource: false
        jvmSource: false
        stateSource: false
    executor:
        sink: false
        connectorSource: false
        jvmSource: false
- spark_application_info_options
- Statistics options.
- enabled
-
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- refresh_rate_ms
- The length of the sampling period in milliseconds; the frequency
to update the statistics.
Default: 10000 (10 seconds)
- driver
- Options to configure collection of metrics at the Spark Driver.
- connectorSource
- Whether to collect Spark Cassandra Connector metrics at the Spark Driver.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- jvmSource
- Whether to collect JVM heap and garbage collection (GC) metrics from the Spark
Driver.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- stateSource
- Whether to collect application state metrics at the Spark Driver.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- executor
- Options to configure collection of metrics at Spark executors.
- sink
- Whether to write metrics collected at Spark executors.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- connectorSource
- Whether to collect Spark Cassandra Connector metrics at Spark executors.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
- jvmSource
- Whether to collect JVM heap and GC metrics at Spark executors.
- false - do not collect metrics
- true - enable collection of metrics
Default:
false
DSE Analytics options
Spark resource and encryption options
spark_shared_secret_bit_length: 256
spark_security_enabled: false
spark_security_encryption_enabled: false
spark_daemon_readiness_assertion_interval: 1000
resource_manager_options:
    worker_options:
        cores_total: 0.7
        memory_total: 0.6
        workpools:
            - name: alwayson_sql
              cores: 0.25
              memory: 0.25
spark_ui_options:
    encryption: inherit
    encryption_options:
        enabled: false
        keystore: .keystore
        keystore_password: cassandra
        require_client_auth: false
        truststore: .truststore
        truststore_password: cassandra
        # Advanced settings
        # protocol: TLS
        # algorithm: SunX509
        # store_type: JKS
        # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
- spark_shared_secret_bit_length
- The length of a shared secret used to authenticate Spark components and encrypt the connections between them. This value is not the strength of the cipher for encrypting connections. Default: 256
- spark_security_enabled
- Enables Spark security based on shared secret infrastructure. Enables mutual authentication and optional encryption between DSE Spark Master and Workers, and of communication channels, except the web UI.
Note: In DSE 6.0.8 and later, when DSE authentication is enabled with authentication_options, Spark security is enabled regardless of this setting.
Default: false
- spark_security_encryption_enabled
- Enables encryption between DSE Spark Master and Workers, and of communication channels, except the web UI. Uses the DIGEST-MD5 SASL-based encryption mechanism. Requires spark_security_enabled: true.
Note: In DSE 6.0.8 and later, when DSE authentication is enabled with authentication_options, Spark security is enabled regardless of this setting.
Configure encryption between the Spark processes and DSE with client-to-node encryption in cassandra.yaml.
- spark_daemon_readiness_assertion_interval
- Time interval, in milliseconds, between subsequent retries by the Spark plugin for Spark Master and Worker readiness to start. Default: 1000
- resource_manager_options
- DataStax Enterprise can control the memory and cores offered by particular Spark Workers in semi-automatic fashion. You can define the total amount of physical resources available to Spark Workers, and optionally add named work pools with specific resources dedicated to them.
- worker_options
- The amount of system resources that are made available to the Spark Worker.
- cores_total
- The number of total system cores available to Spark. If the option is not specified, the default value 0.7 is used.
Note: For DSE 6.0.11 and later, the SPARK_WORKER_TOTAL_CORES environment variable takes precedence over this setting.
This setting can be the exact number of cores or a decimal of the total system cores. When the value is expressed as a decimal, the available resources are calculated in the following way:
Spark Worker cores = cores_total * total system cores
The lowest value that you can assign to Spark Worker cores is 1 core. If the results are lower, no exception is thrown and the values are automatically limited.
Note: Setting cores_total or a workpool's cores to 1.0 is a decimal value, meaning 100% of the available cores will be reserved. Setting cores_total or cores to 1 (no decimal point) is an explicit value, and one core will be reserved.
- memory_total
- The amount of total system memory available to Spark. This setting can be the exact amount of memory or a decimal of the total system memory. When the value is an absolute value, you can use standard suffixes like M for megabyte and G for gigabyte.
When the value is expressed as a decimal, the available resources are calculated in the following way:
Spark Worker memory = memory_total * (total system memory - memory assigned to DataStax Enterprise)
The lowest value that you can assign to Spark Worker memory is 64 MB. If the results are lower, no exception is thrown and the values are automatically limited.
If the option is not specified, the default value 0.6 is used.
Note: For DSE 6.0.11 and later, the SPARK_WORKER_TOTAL_MEMORY environment variable takes precedence over this setting.
- workpools
- Named work pools that can use a portion of the total resources defined under worker_options. A default work pool named default is used if no work pools are defined in this section. If work pools are defined, the resources allocated to the work pools are taken from the total amount, with the remaining resources available to the default work pool. The total amount of resources defined in the workpools section must not exceed the resources available to Spark in worker_options.
- name
- The name of the work pool.
- cores
- The number of system cores to use in this work pool expressed as either an absolute value or a decimal value. This option follows the same rules as cores_total.
- memory
- The amount of memory to use in this work pool expressed as either an absolute value or a decimal value. This option follows the same rules as memory_total.
- spark_ui_options
- Specify the source for SSL settings for Spark Master and Spark Worker UIs. The spark_ui_options apply only to Spark daemon UIs, and do not apply to user applications even when the user applications are run in cluster mode.
- encryption
- inherit - inherit the SSL settings from the client encryption options.
- custom - use the following encryption_options from dse.yaml.
- encryption_options
- Set encryption options for HTTPS of Spark Master and Worker UI. The spark_encryption_options are not valid for DSE 5.1 and later.
- enabled
- Whether to enable Spark encryption for Spark client-to-Spark cluster and Spark
internode communication.
Default: false
- keystore
- The keystore for Spark encryption keys.
The relative file path is the base Spark configuration directory that is defined by the SPARK_CONF_DIR environment variable. The default Spark configuration directory is resources/spark/conf.
Default:
resources/dse/conf/.ui-keystore
- keystore_password
- The password to access the key store.
Default:
cassandra
- require_client_auth
- Whether to require truststore for client authentication. When not set, the default
is false.
Default: commented out (false)
- truststore
- The truststore for Spark encryption keys.
The relative file path is the base Spark configuration directory that is defined by the SPARK_CONF_DIR environment variable. The default Spark configuration directory is resources/spark/conf.
Default: commented out (resources/dse/conf/.ui-truststore)
- truststore_password
- The password to access the truststore.
Default: commented out (cassandra)
- protocol
- Defines the encryption protocol. The TLS protocol must be supported by JVM and
Spark.
Default: commented out (TLS)
- algorithm
- Defines the key manager algorithm.
Default: commented out (SunX509)
- store_type
- Defines the keystore type.
Default: commented out (JKS)
- cipher_suites
- Defines the cipher suites for Spark encryption:
- TLS_RSA_WITH_AES_128_CBC_SHA
- TLS_RSA_WITH_AES_256_CBC_SHA
- TLS_DHE_RSA_WITH_AES_128_CBC_SHA
- TLS_DHE_RSA_WITH_AES_256_CBC_SHA
- TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
- TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
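The decimal rules for cores_total and memory_total can be worked through on a concrete case. The sketch below assumes a hypothetical node with 16 cores and 64 GB of RAM, of which 16 GB is assumed to be assigned to DataStax Enterprise; the hardware figures and the 0.5 values are illustration only.

```yaml
# Sketch for a hypothetical node: 16 cores, 64 GB RAM, 16 GB assumed assigned to DSE.
resource_manager_options:
    worker_options:
        # Decimal value: 0.5 * 16 cores = 8 cores offered to the Spark Worker.
        cores_total: 0.5
        # Decimal value: 0.5 * (64 GB - 16 GB) = 24 GB of Spark Worker memory.
        memory_total: 0.5
        workpools:
            - name: alwayson_sql
              # Decimal values follow the same rules as cores_total and memory_total.
              # Writing 1 (no decimal point) would reserve exactly one core,
              # whereas 1.0 would mean 100% of the available cores.
              cores: 0.25
              memory: 0.25
```

Resources not claimed by a named work pool remain available to the default work pool, as described under workpools.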
Starting Spark drivers and executors
spark_process_runner:
    runner_type: default
    run_as_runner_options:
        user_slots:
            - slot1
            - slot2
- spark_process_runner
- Options to configure how Spark driver and executor processes are created and managed.
- runner_type
-
- default - Use the default runner type.
- run_as - Use the run_as_runner_options options. See Running Spark processes as separate users.
- run_as_runner_options
- The slot users for separating Spark processes users from the DSE service user. See
Running Spark processes as separate users.
Default: slot1, slot2
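For comparison with the default sample, a minimal sketch of the run_as variant follows. It only changes runner_type and keeps the default slot names; any further setup that the slot users need is covered in Running Spark processes as separate users and is not shown here.

```yaml
# Sketch only: switches the runner to run_as and keeps the default slot users.
spark_process_runner:
    runner_type: run_as
    run_as_runner_options:
        user_slots:
            - slot1
            - slot2
```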
AlwaysOn SQL options
Properties to enable and configure AlwaysOn SQL.
# AlwaysOn SQL options
# alwayson_sql_options:
#     enabled: false
#     thrift_port: 10000
#     web_ui_port: 9077
#     reserve_port_wait_time_ms: 100
#     alwayson_sql_status_check_wait_time_ms: 500
#     workpool: alwayson_sql
#     log_dsefs_dir: /spark/log/alwayson_sql
#     auth_user: alwayson_sql
#     runner_max_errors: 10
- alwayson_sql_options
- The AlwaysOn SQL options enable and configure the server on this node.
- enabled
- Whether to enable AlwaysOn SQL for this node. The node must be an analytics node.
When not set, the default is false.
Default: commented out (false)
- thrift_port
- The Thrift port on which AlwaysOn SQL listens.
Default: commented out (10000)
- web_ui_port
- The port on which the AlwaysOn SQL web UI is available.
Default: commented out (9077)
- reserve_port_wait_time_ms
- The wait time in milliseconds to reserve the thrift_port if it is not available.
Default: commented out (100)
- alwayson_sql_status_check_wait_time_ms
- The time in milliseconds to wait for a health check status of the AlwaysOn SQL server.
Default: commented out (500)
- workpool
- The work pool name used by AlwaysOn SQL.
Default: commented out (alwayson_sql)
- log_dsefs_dir
- Location in DSEFS of the AlwaysOn SQL log files.
Default: commented out (/spark/log/alwayson_sql)
- auth_user
- The role to use for internal communication by AlwaysOn SQL if authentication is enabled. Custom roles must be created with login=true.
Default: commented out (alwayson_sql)
- runner_max_errors
- The maximum number of errors that can occur during AlwaysOn SQL service runner thread runs before stopping the service. A service stop requires a manual restart.
Default: commented out (10)
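A minimal sketch of enabling AlwaysOn SQL on an analytics node is shown below. It keeps the default ports and pairs the workpool setting with a work pool of the same name under resource_manager_options; that pairing mirrors the sample values in this file and is presented here as an assumption about how the two sections relate, not as a stated requirement.

```yaml
# Sketch only: default ports, with the work pool name matched in both sections.
alwayson_sql_options:
    enabled: true
    thrift_port: 10000
    web_ui_port: 9077
    workpool: alwayson_sql
    auth_user: alwayson_sql     # a custom role would need login=true

resource_manager_options:
    worker_options:
        workpools:
            - name: alwayson_sql   # same name as alwayson_sql_options.workpool
              cores: 0.25
              memory: 0.25
```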
DSE File System (DSEFS) options
dsefs_options:
    enabled:
    keyspace_name: dsefs
    work_dir: /var/lib/dsefs
    public_port: 5598
    private_port: 5599
    data_directories:
        - dir: /var/lib/dsefs/data
          storage_weight: 1.0
          min_free_space: 5368709120
    # service_startup_timeout_ms: 30000
    # service_close_timeout_ms: 600000
    # server_close_timeout_ms: 2147483647 # Integer.MAX_VALUE
    # compression_frame_max_size: 1048576
    # query_cache_size: 2048
    # query_cache_expire_after_ms: 2000
    # gossip_options:
    #     round_delay_ms: 2000
    #     startup_delay_ms: 5000
    #     shutdown_delay_ms: 10000
    # rest_options:
    #     request_timeout_ms: 330000
    #     connection_open_timeout_ms: 55000
    #     client_close_timeout_ms: 60000
    #     server_request_timeout_ms: 300000
    #     idle_connection_timeout_ms: 60000
    #     internode_idle_connection_timeout_ms: 120000
    #     core_max_concurrent_connections_per_host: 8
    # transaction_options:
    #     transaction_timeout_ms: 3000
    #     conflict_retry_delay_ms: 200
    #     conflict_retry_count: 40
    #     execution_retry_delay_ms: 1000
    #     execution_retry_count: 3
    # block_allocator_options:
    #     overflow_margin_mb: 1024
    #     overflow_factor: 1.05
- dsefs_options
- Enable and configure options for DSEFS.
- enabled
- Whether to enable DSEFS.
- true - enables DSEFS on this node, regardless of the workload.
- false - disables DSEFS on this node, regardless of the workload.
- blank or commented out (#) - DSEFS will start only if the node is configured to run analytics workloads.
Default: commented out (blank)
- keyspace_name
- The keyspace where the DSEFS metadata is stored. You can optionally configure
multiple DSEFS file systems within a single datacenter by specifying different
keyspace names for each cluster.
Default: commented out (dsefs)
- work_dir
- The local directory for storing the local node metadata, including the node identifier. The volume of data stored in this directory is nominal and does not require configuration for throughput, latency, or capacity. This directory must not be shared by DSEFS nodes.
- public_port
- The public port on which DSEFS listens for clients.
Note: DataStax recommends that all nodes in the cluster have the same value. Firewalls must open this port to trusted clients. The service on this port is bound to the native_transport_address.
Default: commented out (5598)
- private_port
- The private port for DSEFS inter-node communication.
CAUTION: Do not open this port in firewalls; this private port must not be visible from outside of the cluster.
Default: commented out (5599)
- data_directories
- One or more data locations where the DSEFS data is stored.
- - dir
- Mandatory attribute to identify the set of directories. DataStax recommends
segregating these data directories on physical devices that are different from the
devices that are used for DataStax Enterprise. Using multiple directories on JBOD
improves performance and capacity.
Default: commented out (/var/lib/dsefs/data)
- storage_weight
- The weighting factor for this location specifies how much data to place in this
directory, relative to other directories in the cluster. This soft constraint
determines how DSEFS distributes the data. For example, a directory with a value of
3.0 receives about three times more data than a directory with a value of 1.0.
Default: commented out (1.0)
- min_free_space
- The reserved space, in bytes, to not use for storing file data blocks. You can use a
unit of measure suffix to specify other size units. For example: terabyte (1 TB),
gigabyte (10 GB), and megabyte (5000 MB).
Default: commented out (5368709120)
- service_startup_timeout_ms
- Wait time, in milliseconds, before the DSEFS server times out while waiting for
services to bootstrap.
Default: commented out (30000)
- service_close_timeout_ms
- Wait time, in milliseconds, before the DSEFS server times out while waiting for
services to close.
Default: commented out (600000)
- server_close_timeout_ms
- Wait time, in milliseconds, that the DSEFS server waits during shutdown before
closing all pending connections.
Default: commented out (2147483647)
- compression_frame_max_size
- The maximum accepted size of a compression frame defined during file upload.
Default: commented out (1048576)
- query_cache_size
- Maximum number of elements in a single DSEFS Server query cache.
Default: commented out (2048)
- query_cache_expire_after_ms
- The time to retain the DSEFS Server query cache element in cache. The cache element
expires when this time is exceeded.
Default: commented out (2000)
- gossip_options
- Options to configure DSEFS gossip rounds.
- round_delay_ms
- The delay, in milliseconds, between gossip rounds.
Default: commented out (2000)
- startup_delay_ms
- The delay time, in milliseconds, between registering the location and reading back
all other locations from the database.
Default: commented out (5000)
- shutdown_delay_ms
- The delay time, in milliseconds, between announcing shutdown and shutting down the
node.
Default: commented out (10000)
- rest_options
- Options to configure DSEFS rest times.
- request_timeout_ms
- The time, in milliseconds, that the client waits for a response that corresponds to
a given request.
Default: commented out (330000)
- connection_open_timeout_ms
- The time, in milliseconds, that the client waits to establish a new connection.
Default: commented out (55000)
- client_close_timeout_ms
- The time, in milliseconds, that the client waits for pending transfer to complete
before closing a connection.
Default: commented out (60000)
- server_request_timeout_ms
- The time, in milliseconds, to wait for the server rest call to complete.
Default: commented out (300000)
- idle_connection_timeout_ms
- The time, in milliseconds, for RestClient to wait before closing an idle connection.
If RestClient does not close connection after timeout, the connection is closed after 2*idle_connection_timeout_ms.
- time - wait time to close idle connection
- 0 - disable closing idle connections
Default: commented out (60000)
- internode_idle_connection_timeout_ms
- Wait time, in milliseconds, before closing idle internode connection. The internode
connections are primarily used to exchange data during replication. Do not set lower
than the default value for heavily utilized DSEFS
clusters.
Default: commented out (120000)
- core_max_concurrent_connections_per_host
- Maximum number of connections to a given host per single CPU core. DSEFS keeps a
connection pool for each CPU core.
Default: 8
- transaction_options
- Options to configure DSEFS transaction times.
- transaction_timeout_ms
- Transaction run time, in milliseconds, before the transaction is considered for
timeout and rollback.
Default:
3000
- conflict_retry_delay_ms
- Wait time, in milliseconds, before retrying a transaction that was ended due to a conflict. Default: 200
- conflict_retry_count
- The number of times to retry a transaction before giving up. Default: 40
- execution_retry_delay_ms
- Wait time, in milliseconds, before retrying a failed transaction payload execution. Default: 1000
- execution_retry_count
- The number of payload execution retries before signaling the error to the application. Default: 3
- block_allocator_options
- Controls how much additional data can be placed on the local coordinator before the
local node overflows to the other nodes. The trade-off is between data locality of
writes and balancing the cluster. A local node is preferred for a new block allocation if:
used_size_on_the_local_node < average_used_size_per_node * overflow_factor + overflow_margin
- overflow_margin_mb
-
- margin_size - overflow margin size in megabytes
- 0 - disable block allocation overflow
Default: commented out (1024)
- overflow_factor
-
- factor - overflow factor on an exponential scale
- 1.0 - disable block allocation overflow
Default: commented out (1.05)
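To make the storage_weight and block allocator rules concrete, the following sketch configures two data directories and annotates the overflow condition with worked numbers. The mount points and the figure of 102400 MB used per node on average are hypothetical, chosen only so that the arithmetic is easy to follow.

```yaml
# Sketch only: mount points and the per-node usage figure are hypothetical.
dsefs_options:
    enabled: true
    keyspace_name: dsefs
    work_dir: /var/lib/dsefs
    data_directories:
        # Receives about three times more data than the second directory.
        - dir: /mnt/disk1/dsefs/data
          storage_weight: 3.0
          min_free_space: 5368709120   # 5 * 1024^3 bytes kept free
        - dir: /mnt/disk2/dsefs/data
          storage_weight: 1.0
          min_free_space: 5368709120
    block_allocator_options:
        # With an average of 102400 MB used per node, the local node is preferred
        # while its used size is below 102400 * 1.05 + 1024 = 108544 MB.
        overflow_margin_mb: 1024
        overflow_factor: 1.05
```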
DSE Metrics Collector options
Uncomment these options only to change the default directories:
# insights_options:
#     data_dir: /var/lib/cassandra/insights_data
#     log_dir: /var/log/cassandra/
- insights_options
- Options for DSE Metrics Collector.
- data_dir
- Directory to store collected metrics. When not set, the default directory is
/var/lib/cassandra/insights_data.
Note: When data_dir is not set, the default location of the /insights_data directory is the same location as the /commitlog directory, as defined with the commitlog_directory property in cassandra.yaml.
- log_dir
- Directory to store logs for collected metrics. The log file is dse-collectd.log. The file with the collectd PID is dse-collectd.pid. When not set, the default directory is /var/log/cassandra/.
Audit database activities
- audit_logging_options
- Options to enable and configure database activity logging.
- enabled
- Whether to enable database activity auditing.
- true - enables database activity auditing
- false - disables database activity auditing
Default:
false
- logger
- The logger to use for recording events:
- SLF4JAuditWriter - Capture events in a log file.
- CassandraAuditWriter - Capture events in a table, dse_audit.audit_log.
Tip: Configure logging level, sensitive data masking, and log file name/location in the logback.xml file.
Default: SLF4JAuditWriter
- included_categories
- Comma separated list of event categories that are captured, where the category
names are:
- QUERY - Data retrieval events.
- DML - (Data manipulation language) Data change events.
- DDL - (Data definition language) Database schema change events.
- DCL - (Data control language) Role and permission management events.
- AUTH - (Authentication) Login and authorization related events.
- ERROR - Failed requests.
- UNKNOWN - Events where the category and type are both UNKNOWN.
Warning: Use either included_categories or excluded_categories but not both. When specifying included categories, leave excluded_categories blank or commented out.
Default: none (include all categories)
- excluded_categories
- Comma separated list of categories to ignore, where the categories are:
- QUERY - Data retrieval events.
- DML - (Data manipulation language) Data change events.
- DDL - (Data definition language) Database schema change events.
- DCL - (Data control language) Role and permission management events.
- AUTH - (Authentication) Login and authorization related events.
- ERROR - Failed requests.
- UNKNOWN - Events where the category and type are both UNKNOWN.
Warning: Use either included_categories or excluded_categories but not both. When specifying excluded categories, leave included_categories blank or commented out.
Default: none (exclude no categories)
- included_keyspaces
- The keyspaces for which events are logged. Specify keyspace names in a comma separated list or use a regular expression to filter on keyspace name.
Warning: DSE supports using either included_keyspaces or excluded_keyspaces but not both. When specifying included_keyspaces, leave excluded_keyspaces blank or comment it out.
Default: none (include all keyspaces)
- excluded_keyspaces
- Log events for all keyspaces which are not listed. Specify a comma separated list of keyspace names or use a regular expression to filter on keyspace name. Only use this option if included_keyspaces is blank or commented out.
Default: none (exclude no keyspaces)
- included_roles
- The roles for which events are logged. Specify roles in a comma separated list.
Warning: DSE supports using either included_roles or excluded_roles but not both. When specifying included_roles, leave excluded_roles blank or comment it out.
Default: none (include all roles)
- excluded_roles
- The roles for which events are not logged. Specify a comma separated list of role names. Only use this option if included_roles is blank or commented out.
Default: none (exclude no roles)
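The include/exclude rules above can be illustrated with a short sketch. It enables auditing with the file-based logger and sets only one option from each included/excluded pair. The keyspace and role names are hypothetical.

```yaml
audit_logging_options:
  enabled: true
  logger: SLF4JAuditWriter
  # Categories: set included_categories OR excluded_categories, never both.
  included_categories: DDL, DCL, AUTH
  # excluded_categories:
  # Keyspaces: hypothetical names; only the included side is set.
  included_keyspaces: sales, inventory
  # excluded_keyspaces:
  # Roles: hypothetical role name; only the excluded side is set.
  # included_roles:
  excluded_roles: monitoring_agent
```

With this fragment, schema changes, permission changes, and login events are captured for the two listed keyspaces, except when issued by the excluded role.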
Cassandra audit writer options
retention_time: 0
cassandra_audit_writer_options:
    mode: sync
    batch_size: 50
    flush_time: 250
    queue_size: 30000
    write_consistency: QUORUM
    # dropped_event_log: /var/log/cassandra/dropped_audit_events.log
    # day_partition_millis: 3600000
- retention_time
- The amount of time, in hours, audit events are retained by supporting loggers. Only
the CassandraAuditWriter supports retention time.
- 0 - retain events forever
- hours - the number of hours to retain audit events
Default: 0 (retain events forever)
- cassandra_audit_writer_options
- Audit writer options.
- mode
- The mode the writer runs in.
- sync - A query is not executed until the audit event is successfully written.
- async - Audit events are queued for writing to the audit table, but are not necessarily logged before the query executes. A pool of writer threads consumes the audit events from the queue, and writes them to the audit table in batch queries.
Important: While async substantially improves performance under load, if a failure occurs between the time a query is executed and the time its audit event is written to the table, the audit table might be missing entries for queries that were executed.
Default: sync
- batch_size
- Available only when mode: async. Must be greater than 0.
The maximum number of events the writer dequeues before writing them out to the table. If warnings in the logs reveal that batches are too large, decrease this value or increase the value of batch_size_warn_threshold_in_kb in cassandra.yaml.
Default: 50
- flush_time
- Available only when mode: async.
The maximum amount of time, in milliseconds, that an event waits in the queue before a writer removes it and writes it out. This flush time prevents events from waiting too long before being written to the table when few queries are running.
Default: 500
- queue_size
- The size of the queue feeding the asynchronous audit log writer threads. When there
are more events being produced than the writers can write out, the queue fills up, and
newer queries are blocked until there is space on the queue. If a value of 0 is used,
the queue size is unbounded, which can lead to resource exhaustion under heavy query
load.
Default: 30000
- write_consistency
- The consistency level that is used to write audit events.
Default: QUORUM
- dropped_event_log
- The location of the log file that reports dropped events. When not set, the default is /var/log/cassandra/dropped_audit_events.log.
Default: commented out (/var/log/cassandra/dropped_audit_events.log)
- day_partition_millis
- The interval, in milliseconds, between changing nodes to spread audit log
information across multiple nodes. For example, to change the target node every 12
hours, specify 43200000 milliseconds. When not set, the default is 3600000 (1
hour).
Default: commented out (3600000, which is 1 hour)
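The following sketch shows one way the table-based writer might be configured in async mode. It assumes, as the sample above suggests, that retention_time and cassandra_audit_writer_options sit under audit_logging_options. The retention period and partition interval are illustrative values, with the arithmetic shown in comments.

```yaml
audit_logging_options:
  enabled: true
  logger: CassandraAuditWriter     # only this logger supports retention_time
  retention_time: 168              # hours: 7 days * 24 hours
  cassandra_audit_writer_options:
    mode: async                    # batch_size and flush_time apply only in async mode
    batch_size: 50                 # must be greater than 0
    flush_time: 250                # milliseconds
    queue_size: 30000              # 0 would make the queue unbounded
    write_consistency: QUORUM
    # 12 hours * 60 min * 60 s * 1000 ms = 43200000 ms
    day_partition_millis: 43200000
```

Because async mode can lose audit entries on failure, sync remains the safer choice where a complete audit trail matters more than throughput.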
DSE Tiered Storage options
# tiered_storage_options:
#     strategy1:
#         tiers:
#             - paths:
#                 - /mnt1
#                 - /mnt2
#             - paths: [ /mnt3, /mnt4 ]
#             - paths: [ /mnt5, /mnt6 ]
#
#         local_options:
#             k1: v1
#             k2: v2
#
#     'another strategy':
#         tiers: [ paths: [ /mnt1 ] ]
- tiered_storage_options
- Options to configure the smart movement of data across different types of storage media so that data is matched to the most suitable drive type, according to the performance and cost characteristics it requires.
- strategy1
- The first disk configuration strategy. Create a strategy2, strategy3, and so on. In this example, strategy1 is the configurable name of the tiered storage configuration strategy.
- tiers
- Each unnamed entry in this section defines a storage tier. The order of the tiers and of the file paths within them defines the priority order.
- local_options
- Local configuration options overwrite the tiered storage settings for the table schema in the local dse.yaml file. See Testing DSE Tiered Storage configurations.
- - paths
- The section of file paths that define the data directories for this tier of the disk configuration. Typically list the fastest storage media first. These paths are used only to store data that is configured to use tiered storage. These paths are independent of any settings in the cassandra.yaml file.
- - /filepath
- The file paths that define the data directories for this tier of the disk configuration.
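A minimal two-tier sketch, listing the fastest media first as described above. The strategy name follows the sample, and the mount points are hypothetical.

```yaml
tiered_storage_options:
  strategy1:
    tiers:
      # Tier 0: fastest media first (hypothetical SSD mount points)
      - paths:
          - /mnt/ssd1
          - /mnt/ssd2
      # Tier 1: slower media (hypothetical HDD mount points), flow-style list
      - paths: [ /mnt/hdd1, /mnt/hdd2 ]
```

Both the block form and the bracketed flow form of the paths list are equivalent YAML; the sample file uses each.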
DSE Advanced Replication configuration settings
# advanced_replication_options:
#     enabled: false
#     conf_driver_password_encryption_enabled: false
#     advanced_replication_directory: /var/lib/cassandra/advrep
#     security_base_path: /base/path/to/advrep/security/files/
- advanced_replication_options
- Options to enable and configure DSE Advanced Replication.
- enabled
- Whether to enable an edge node to collect data in the replication log.
Default: commented out (false)
- conf_driver_password_encryption_enabled
- Whether to enable encryption of driver passwords. When enabled, the stored driver password is expected to be encrypted.
Default: commented out (false)
- advanced_replication_directory
- The directory for storing advanced replication CDC logs. A directory
replication_logs will be created in the specified
directory.
Default: commented out (/var/lib/cassandra/advrep)
- security_base_path
- The base path to prepend to paths in the Advanced Replication configuration locations,
including locations to SSL keystore, SSL truststore, and so on.
Default: commented out (/base/path/to/advrep/security/files/)
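As a sketch, an edge node that collects data in the replication log might uncomment the section as follows. The security_base_path value is a hypothetical location; the replication directory is the documented default.

```yaml
advanced_replication_options:
  enabled: true
  conf_driver_password_encryption_enabled: false
  # A replication_logs directory is created under this path.
  advanced_replication_directory: /var/lib/cassandra/advrep
  # Hypothetical base path, prepended to keystore and truststore locations.
  security_base_path: /etc/dse/advrep/security/
```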
Inter-node messaging options
internode_messaging_options:
    port: 8609
    # frame_length_in_mb: 256
    # server_acceptor_threads: 8
    # server_worker_threads: 16
    # client_max_connections: 100
    # client_worker_threads: 16
    # handshake_timeout_seconds: 10
    # client_request_timeout_seconds: 60
- internode_messaging_options
- Configuration options for inter-node messaging.
- port
- The mandatory port for the inter-node messaging service.
Default: 8609
- frame_length_in_mb
- Maximum message frame length, in megabytes. When not set, the default is 256.
Default: commented out (256)
- server_acceptor_threads
- The number of server acceptor threads. When not set, the default is the number of
available processors.
Default: commented out
- server_worker_threads
- The number of server worker threads. When not set, the default is the number of
available processors * 8.
Default: commented out
- client_max_connections
- The maximum number of client connections. When not set, the default is 100.
Default: commented out (100)
- client_worker_threads
- The number of client worker threads. When not set, the default is the number of
available processors * 8.
Default: commented out
- handshake_timeout_seconds
- Timeout, in seconds, for the communication handshake process. When not set, the default is 10.
Default: commented out (10)
- client_request_timeout_seconds
- Timeout for non-query search requests like core creation and distributed deletes.
When not set, the default is 60.
Default: commented out (60)
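To make the calculated defaults concrete, the sketch below pins the thread counts to the values they would take on a hypothetical host with 4 available processors. Leaving the options commented out yields the same result on such a host.

```yaml
internode_messaging_options:
  port: 8609                        # mandatory
  frame_length_in_mb: 256
  # Assumed host: 4 available processors.
  server_acceptor_threads: 4        # default: number of available processors
  server_worker_threads: 32         # default: available processors * 8
  client_max_connections: 100
  client_worker_threads: 32         # default: available processors * 8
  handshake_timeout_seconds: 10
  client_request_timeout_seconds: 60
```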
DSE Multi-Instance server_id
- server_id
- In DSE Multi-Instance /etc/dse-nodeId/dse.yaml files, the server_id option is generated to uniquely identify the physical server on which multiple instances are running. The server_id default value is the media access control address (MAC address) of the physical server. You can change server_id when the MAC address is not unique, such as a virtualized server where the host’s physical MAC is cloned.
DSE Graph options
DSE Graph system-level options
These graph options are system-level configuration options and options that are shared between graph instances. Add an option if it is not present in the provided dse.yaml file.
# graph:
#     analytic_evaluation_timeout_in_minutes: 10080
#     realtime_evaluation_timeout_in_seconds: 30
#     schema_agreement_timeout_in_ms: 10000
#     system_evaluation_timeout_in_seconds: 180
#     index_cache_size_in_mb: 128
#     max_query_queue: 10000
#     max_query_threads (no explicit default)
#     max_query_params: 16
- graph
- These graph options are system-level configuration options and options that are
shared between graph instances.
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
- analytic_evaluation_timeout_in_minutes
- Maximum time to wait for an OLAP analytic (Spark) traversal to evaluate. When not set, the default is 10080 (168 hours).
Default: commented out (10080)
- realtime_evaluation_timeout_in_seconds
- Maximum time to wait for an OLTP real-time traversal to evaluate. When not set, the default is 30 seconds.
Default: commented out (30)
- schema_agreement_timeout_in_ms
- Maximum time to wait for the database to agree on schema versions before timing out. When not set, the default is 10000 (10 seconds).
Default: commented out (10000)
- system_evaluation_timeout_in_seconds
- Maximum time to wait for a graph system-based request to execute, like creating a new graph. When not set, the default is 180 (3 minutes).
Default: commented out (180)
- schema_mode
- Controls the way that the schemas are handled.
- Production = Schema must be created before data insertion. Schema cannot be changed after data is inserted. Full graph scans are disallowed unless the option graph.allow_scan is changed to TRUE.
- Development = No schema is required to write data to a graph. Schema can be changed after data is inserted. Full graph scans are allowed unless the option graph.allow_scan is changed to FALSE.
Default: not present
- index_cache_size_in_mb
- The amount of RAM, in megabytes, to allocate to the index cache. When not set, the default is 128.
Default: commented out (128)
- max_query_queue
- The maximum number of CQL queries that can be queued as a result of Gremlin requests. Incoming queries are rejected if the queue size exceeds this setting. When not set, the default is 10000.
Default: commented out (10000)
- max_query_threads
- The maximum number of threads to use for queries to the database. When this option is not set, the default is calculated:
- If gremlinPool is present and nonzero: 10 * the gremlinPool setting
- If gremlinPool is not present in this file or set to zero: the number of available CPU cores
Default: calculated
- max_query_params
- The maximum number of parameters that can be passed on a graph query request for
TinkerPop drivers and drivers using the Cassandra native protocol. Passing very large
numbers of parameters on requests is an anti-pattern, because the script evaluation
time increases proportionally. DataStax recommends reducing the number of parameters
to speed up script compilation times. Before you increase this value, consider
alternate methods for parameterizing scripts, like passing a single map. If the graph
query request requires many arguments, pass a list.
Default: commented out (16)
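A short sketch of an uncommented graph section. The longer real-time timeout and the explicit max_query_threads value are illustrative choices, not recommendations; the thread count simply mirrors the calculated default for a hypothetical gremlinPool of 4.

```yaml
graph:
  analytic_evaluation_timeout_in_minutes: 10080   # 168 hours
  realtime_evaluation_timeout_in_seconds: 60      # raised from the default of 30
  schema_agreement_timeout_in_ms: 10000           # 10 seconds
  system_evaluation_timeout_in_seconds: 180       # 3 minutes
  index_cache_size_in_mb: 128
  max_query_queue: 10000
  # Explicit value equal to 10 * gremlinPool for an assumed gremlinPool of 4.
  max_query_threads: 40
  max_query_params: 16
```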
DSE Graph Gremlin Server options
# gremlin_server:
#     port: 8182
#     threadPoolWorker: 2
#     gremlinPool: 0
#     scriptEngines:
#         gremlin-groovy:
#             config:
#                 sandbox_enabled: false
#                 sandbox_rules:
#                     whitelist_packages:
#                         - package.name
#                     whitelist_types:
#                         - fully.qualified.type.name
#                     whitelist_supers:
#                         - fully.qualified.class.name
#                     blacklist_packages:
#                         - package.name
#                     blacklist_supers:
#                         - fully.qualified.class.name
- gremlin_server
- The top-level configurations in Gremlin Server.
- port
- The available communications port for Gremlin Server. When not set, the default is 8182.
Default: commented out (8182)
- threadPoolWorker
- The number of worker threads that handle non-blocking read and write (requests and responses) on the Gremlin Server channel, including routing requests to the right server operations, handling scheduled jobs on the server, and writing serialized responses back to the client. When not set, the default is 2.
Default: commented out (2)
- gremlinPool
- The number of Gremlin threads available to execute actual scripts in a
ScriptEngine. This pool represents the workers available to handle blocking
operations in Gremlin Server.
- 0 - the value of the JVM property cassandra.available_processors, if that property is set
- When not set - the value of Runtime.getRuntime().availableProcessors()
Default: commented out (0)
- scriptEngines
- Section to configure gremlin server scripts.
- gremlin-groovy
- Section for gremlin-groovy scripts.
- sandbox_enabled
- Sandbox is enabled by default. To disable the gremlin groovy sandbox entirely, set to false.
- sandbox_rules
- Section for sandbox rules.
- whitelist_packages
- List of packages, one package per line, to whitelist.
- -package.name
- Retain the hyphen before the fully qualified package name.
- whitelist_types
- List of types, one type per line, to whitelist.
- -fully.qualified.type.name
- Retain the hyphen before the fully qualified type name.
- whitelist_supers
- List of super classes, one class per line, to whitelist. Retain the hyphen before the fully qualified class name.
- -fully.qualified.class.name
- Retain the hyphen before the fully qualified class name.
- blacklist_packages
- List of packages, one package per line, to blacklist.
- -package.name
- Retain the hyphen before the fully qualified package name.
- blacklist_supers
- List of super classes, one class per line, to blacklist. Retain the hyphen before the fully qualified class name.
- -fully.qualified.class.name
- Retain the hyphen before the fully qualified class name.
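The nesting of the sandbox rules can be sketched as follows. The package and class names are hypothetical examples chosen only to show the list syntax, with one hyphen-prefixed entry per line.

```yaml
gremlin_server:
  port: 8182
  threadPoolWorker: 2
  gremlinPool: 4                  # illustrative value; 0 defers to the JVM processor count
  scriptEngines:
    gremlin-groovy:
      config:
        sandbox_enabled: true
        sandbox_rules:
          whitelist_packages:
            - com.example.graph.util      # hypothetical package
          whitelist_types:
            - com.example.graph.Helper    # hypothetical type
          blacklist_supers:
            - java.lang.Thread            # example class to block
```

Rule lists that are not needed can be left out or kept commented, as in the sample file.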