dse.yaml configuration file
Where is the dse.yaml file?
The location of the dse.yaml file depends on the type of installation:
Installation Type | Location |
---|---|
Package installations + Installer-Services installations | |
Tarball installations + Installer-No Services installations | |
The dse.yaml file is the primary configuration file for security, DSE Search, DSE Graph, and DSE Analytics.
After changing properties in the |
The cassandra.yaml file is the primary configuration file for the DataStax Enterprise (DSE) database.
Syntax
For the options in each section, the main setting has no leading spaces, and at least two spaces are required before each entry in that section.
For example, in the node_health_options section, at least two spaces are required before refresh_rate_ms, uptime_ramp_up_period_seconds, and dropped_mutation_window_minutes:
node_health_options:
  refresh_rate_ms: 50000
  uptime_ramp_up_period_seconds: 10800
  dropped_mutation_window_minutes: 30
Adhere to the YAML syntax. The default values are shown for each section.
Organization
The DataStax Enterprise configuration properties are grouped into the following sections:
Authentication options
Authentication options for the DSE Authenticator, which allows you to use multiple schemes for authentication in a DSE cluster.
Additional configuration is required in the cassandra.yaml configuration file.
Internal and LDAP schemes can also be used for role management; see Role management options.
Default values:
authentication_options:
  enabled: false
  default_scheme: kerberos
  other_schemes:
    - internal
  scheme_permissions: true
  allow_digest_with_kerberos: true
  plain_text_without_ssl: warn
  transitional_mode: disabled
authentication_options
-
Options for the DSE Authenticator to authenticate connections. Authenticators other than DSE Authenticator are not supported.
enabled
-
Default: false. Enables user authentication. When false, the DSE Authenticator allows all connections.
default_scheme
-
Sets the first scheme to validate a user against when the driver does not request a specific scheme.
- internal: Plain text authentication using the internal password authentication.
- ldap: Plain text authentication using pass-through LDAP authentication.
- kerberos: GSSAPI authentication using the Kerberos authenticator. Default.
other_schemes
-
List of schemes that are also checked if validation against the first scheme fails and no scheme was specified by the driver. Uses the same scheme names as default_scheme.
scheme_permissions
-
Enable (true) only when using multiple schemes for authentication. Prevents unintentional role assignment that might occur if user or group names overlap in the authentication service. When true, every role requires permissions to a scheme in order to be assigned; see Binding a role to an authentication scheme.
allow_digest_with_kerberos
-
Controls whether DIGEST-MD5 authentication is also allowed with Kerberos. The DIGEST-MD5 mechanism is not directly associated with an authentication scheme, but is used by Kerberos to pass credentials between nodes and jobs. In analytics clusters, set to true when using Kerberos with Spark jobs.
plain_text_without_ssl
-
Controls how the DseAuthenticator responds to plain text authentication requests over unencrypted client connections. Set to one of the following values:
- block: Block the request with an authentication error.
- warn: Log a warning about the request but allow it to continue. Default.
- allow: Allow the request without any warning.
transitional_mode
-
For temporary use during authentication setup in an already established environment. Allows access to the database using the anonymous role, which has all permissions except AUTHORIZE. To enable, use one of the following options:
- permissive: Allow all connections that provide credentials. Maps authenticated superusers to their role and maps all other users to anonymous.
- normal: Allow all connections that provide credentials. Maps all authenticated users to their role and maps all other connections to anonymous.
- strict: Allow only authenticated connections that map to a login enabled role, or connections that provide a blank username and password as anonymous.
Credentials are required for all connections after authentication is enabled; use a blank username and password to log in with the anonymous role in transitional mode.
When set to disabled, all connections must provide valid credentials and map to a login enabled role.
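As an illustration of how these options combine, the following sketch enables authentication with internal as the first scheme and ldap as a fallback. The choice of schemes and of block for plain text requests is an assumption made for this example, not a recommended configuration, and the additional cassandra.yaml configuration mentioned above is still required.

```yaml
# Hypothetical dse.yaml fragment: internal authentication first, LDAP as a fallback.
authentication_options:
  enabled: true
  default_scheme: internal        # checked first when the driver requests no scheme
  other_schemes:
    - ldap                        # checked if validation against internal fails
  scheme_permissions: true        # multiple schemes in use, so roles are bound to schemes
  allow_digest_with_kerberos: true
  plain_text_without_ssl: block   # reject plain text credentials on unencrypted connections
  transitional_mode: disabled     # every connection must present valid credentials
```

With scheme_permissions set to true, each role also needs permission on the scheme it authenticates with before it can be assigned.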
Role management options
Default values:
role_management_options:
  mode: internal
  stats: false
role_management_options
-
Options for the DSE Role Manager. To enable the role manager, set authorization_options enabled to true and role_manager in cassandra.yaml to com.datastax.bdp.cassandra.auth.DseRoleManager; see Managing roles. When scheme_permissions is enabled, all roles must have permission to execute on the authentication scheme. See Binding a role to an authentication scheme.
mode
-
Set to one of the following values:
- internal: Scheme that manages roles per individual user in the internal database. Default.
- ldap: Scheme that assigns roles by looking up the user name in LDAP and mapping the group attribute (ldap_options) to an internal role name. To configure an LDAP scheme, complete the steps in Defining an LDAP scheme.
Nested roles are not supported for LDAP.
stats
-
Set to true to enable logging of DSE role creation and modification events in the dse_security.role_stats system table. All nodes must have the stats option enabled, and must be restarted for the functionality to take effect. To query role events:
SELECT * FROM dse_security.role_stats;

 role  | created                         | password_changed
-------+---------------------------------+---------------------------------
 user1 | 2020-04-13 00:44:09.221000+0000 | null
 user2 | 2020-04-12 23:49:21.457000+0000 | 2020-04-12 23:49:21.457000+0000

(2 rows)
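A minimal sketch of LDAP-based role management with role event logging turned on might look like the following. It assumes that ldap_options is already configured and that role_manager in cassandra.yaml is set as described above; the values are illustrative only.

```yaml
# Hypothetical dse.yaml fragment: assign roles from LDAP groups and record role events.
role_management_options:
  mode: ldap     # look up the user in LDAP and map the group attribute to a role name
  stats: true    # log role events to dse_security.role_stats; set on all nodes, then restart
```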
Authorization options
Default values:
authorization_options:
  enabled: false
  transitional_mode: disabled
  allow_row_level_security: false
authorization_options
-
Options for the DSE Authorizer.
enabled
-
Enables the use of DSE Authorizer for role-based access control (RBAC).
transitional_mode
-
Allows the DSE Authorizer to operate in a temporary transitional mode during setup of authorization in a cluster. Set to one of the following values:
- disabled: Transitional mode is disabled.
- normal: Permissions can be passed to resources, but are not enforced.
- strict: Permissions can be passed to resources, and are enforced on authenticated users. Permissions are not enforced against anonymous users.
allow_row_level_security
-
Default: false. Set to true to enable row-level access control (RLAC) permissions; use the same setting on all nodes.
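The transitional modes lend themselves to a staged rollout. The sketch below shows one possible intermediate stage, in which permissions can be granted but are not yet enforced; choosing normal as the intermediate stage is an assumption for this example.

```yaml
# Hypothetical dse.yaml fragment: intermediate stage while permissions are being granted.
authorization_options:
  enabled: true
  transitional_mode: normal        # permissions can be passed to resources but are not enforced
  allow_row_level_security: false  # keep identical on all nodes
```

Once all roles have the permissions they need, setting transitional_mode back to disabled ends the transitional stage.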
Kerberos options
Default values:
kerberos_options:
  keytab: path_to_keytab/dse.keytab
  service_principal: dse_user/_HOST@REALM
  http_principal: HTTP/_HOST@REALM
  qop: auth
kerberos_options
-
Configure security for a DataStax Enterprise cluster using Kerberos. See Kerberos guidelines.
keytab
-
The keytab file must contain the credentials for both of the fully resolved principal names, which replace _HOST with the Fully Qualified Domain Name (FQDN) of the host in the service_principal and http_principal settings. The UNIX user running DSE must also have read permissions on the keytab.
service_principal
-
The service_principal that the DataStax Enterprise process runs under must use the form dse_user/_HOST@REALM, where:
- dse_user is:
  - Package and Installer-Services installations: cassandra
  - Tarball and Installer-No Services installations: the name of the UNIX user that starts the service
- _HOST is converted to a reverse DNS lookup of the broadcast address.
- REALM is the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
The service_principal must be consistent everywhere: in the dse.yaml file, present in the keytab, and in the cqlshrc file (where service_principal is separated into service/hostname).
http_principal
-
The http_principal is used by the Tomcat application container to run DSE Search. The Tomcat web server uses the GSS-API mechanism (SPNEGO) to negotiate the GSSAPI security mechanism (Kerberos). Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
qop
-
A comma-delimited list of Quality of Protection (QOP) values that clients and servers can use for each connection. The client can have multiple QOP values, while the server can have only a single QOP value. The valid values are:
- auth: Authentication only. Default.
- auth-int: Authentication plus integrity protection for all transmitted data.
- auth-conf: Authentication plus integrity protection and encryption of all transmitted data.
Encryption using auth-conf is separate and independent of whether encryption is done using SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax recommends choosing only one method and using it for both encryption and authentication.
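To make the principal format concrete, here is a sketch for a package installation. The keytab path and the EXAMPLE.COM realm are placeholders invented for this example; _HOST is left as is so that it resolves per node.

```yaml
# Hypothetical dse.yaml fragment for a package installation (dse_user is cassandra).
kerberos_options:
  keytab: /etc/dse/conf/dse.keytab                # placeholder path; must be readable by the DSE user
  service_principal: cassandra/_HOST@EXAMPLE.COM  # realm in uppercase
  http_principal: HTTP/_HOST@EXAMPLE.COM
  qop: auth                                       # authentication only; the server takes a single value
```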
LDAP options
Define LDAP options to authenticate users against an external LDAP service and/or for Role Management using LDAP group look up. See Enabling DSE Unified Authentication.
Default values:
ldap_options:
  server_host: localhost ## Appropriate only for development and testing on a single node.
  server_port: 389
  hostname_verification: false
  search_dn: uid=Admin
  search_password: secret
  use_ssl: false
  use_tls: false
  truststore_path: path/to/truststore
  truststore_password: passwordToTruststore
  truststore_type: jks
  user_search_base: ou=users,dc=example,dc=com
  user_search_filter: (uid={0})
  user_memberof_attribute: memberof
  group_search_type: directory_search
  group_search_base:
  group_search_filter: (uniquemember={0})
  group_name_attribute: cn
  credentials_validity_in_ms: 0
  search_validity_in_seconds: 0
  connection_pool:
    max_active: 8
    max_idle: 8
Microsoft Active Directory (AD) example, for both authentication and role management:
ldap_options:
  server_host: win2012ad_server.mycompany.lan
  server_port: 389
  search_dn: cn=lookup_user,cn=users,dc=win2012domain,dc=mycompany,dc=lan
  search_password: lookup_user_password
  use_ssl: false
  use_tls: false
  truststore_path: path/to/truststore
  truststore_password: passwordToTruststore
  truststore_type: jks
  user_search_base: cn=users,dc=win2012domain,dc=mycompany,dc=lan
  user_search_filter: (sAMAccountName={0})
  user_memberof_attribute: memberOf
  group_search_type: memberof_search
  group_search_base:
  group_search_filter: (uniquemember={0})
  group_name_attribute: cn
  credentials_validity_in_ms: 0
  search_validity_in_seconds: 0
  connection_pool:
    max_active: 8
    max_idle: 8
ldap_options
-
Options to configure LDAP security. See Defining an LDAP scheme.
server_host
-
A comma-separated list of LDAP server hosts.
Do not use LDAP on the same host (localhost) in production environments. Using LDAP on the same host (localhost) is appropriate only in single node test or development environments.
For information on parameters related to tuning failover performance for multiple LDAP servers, see Tune LDAP failover.
Default: none
server_port
-
The port on which the LDAP server listens. Default: 389
hostname_verification
-
Enable hostname verification. The following conditions must be met:
- Either use_ssl or use_tls must be set to true.
- A valid truststore with the correct path specified in truststore_path must exist. The truststore must have a certificate entry, trustedCertEntry, including a SAN DNSName entry that matches the hostname of the LDAP server.
Default: false
search_dn
-
Distinguished name (DN) of an account with read access to the user_search_base and group_search_base. Comment out to use an anonymous bind. For example:
- OpenLDAP: uid=lookup,ou=users,dc=springsource,dc=com
- Microsoft Active Directory (AD): cn=lookup, cn=users, dc=springsource, dc=com
Do not create or use an LDAP account or group called cassandra. The DSE database comes with a default login role cassandra, which has access to all database objects using the consistency level QUORUM.
search_password
-
The password of the search_dn account.
use_ssl
-
Set to true to enable SSL connections to the LDAP server. If set to true, change server_port to the SSL port of the LDAP server. Default: false
use_tls
-
Set to true to enable TLS connections to the LDAP server. If set to true, change server_port to the TLS port of the LDAP server. Default: false
truststore_path
-
The path to the truststore for SSL certificates.
truststore_password
-
The password to access the truststore.
truststore_type
-
The type of truststore. Default: jks
user_search_base
-
The search base for your domain, used to look up users. Set the ou and dc elements for your LDAP domain. Typically this is set to ou=users,dc=domain,dc=top_level_domain. For example, ou=users,dc=example,dc=com.
Active Directory uses a different search base, typically CN=search,CN=Users,DC=ActDir_domname,DC=internal. For example, CN=search,CN=Users,DC=example-sales,DC=internal.
user_search_filter
-
The search filter for looking up user names. Set the LDAP attribute name of the user identifier equal to {0}. For example, for Microsoft Active Directory (AD) the filter is typically samAccountName={0}. Default: uid={0}
user_memberof_attribute
-
The attribute on the user entry that contains group membership information. Required when managing roles using group_search_type: memberof_search with LDAP (role_management_options.mode: ldap).
group_search_type
-
Required when managing roles with LDAP (role_management_options.mode: ldap). Defines how group membership is looked up for a user. Choose one of the following values:
- directory_search: Filters the results by doing a subtree search of group_search_base to find groups that contain the user name in the attribute defined in the group_search_filter. Default.
- memberof_search: Gets groups from the user attribute defined in user_memberof_attribute. The directory server must have memberof support, which is a default user attribute in Microsoft Active Directory (AD).
group_search_base
-
The unique distinguished name (DN) of the group record from which to start the group membership search.
group_search_filter
-
Set to any valid LDAP filter. Default: (uniquemember={0})
group_name_attribute
-
The attribute in the group record that contains the LDAP group name. Role names are case sensitive and must match exactly on DSE for assignment. Default: cn
credentials_validity_in_ms
-
The duration period of the credentials cache.
- 0: Disables the credentials cache.
- Duration period in milliseconds: Enables a search cache and improves performance by reducing the number of requests that are sent to the internal or LDAP server. See Defining an LDAP scheme.
When not set, the default is 0 (disabled).
Default: commented out (0)
search_validity_in_seconds
-
The duration period in seconds for the search cache. Default: 0
connection_pool
-
The configuration settings for the connection pool for making LDAP requests.
- max_active: The maximum number of active connections to the LDAP server. Default: 8
- max_idle: The maximum number of idle connections in the pool awaiting requests. Default: 8
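The conditions for hostname_verification can be seen together in a sketch such as the one below. Host names, paths, and passwords are placeholders invented for this example, and port 636 is only the conventional LDAPS port, so substitute the SSL port that your LDAP server actually uses.

```yaml
# Hypothetical dse.yaml fragment: LDAP over SSL with hostname verification.
ldap_options:
  server_host: ldap1.example.com, ldap2.example.com  # placeholder hosts, comma-separated
  server_port: 636                 # conventional LDAPS port (assumption); match your server
  hostname_verification: true      # needs use_ssl or use_tls, plus a valid truststore
  use_ssl: true
  use_tls: false
  truststore_path: /path/to/truststore.jks   # placeholder; certificate SAN must match the LDAP host
  truststore_password: truststore_password_here
  truststore_type: jks
  search_dn: uid=lookup,ou=users,dc=example,dc=com
  search_password: lookup_password_here
  user_search_base: ou=users,dc=example,dc=com
  user_search_filter: (uid={0})
```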
Encrypt sensitive system resources
The system_info_encryption section controls encryption of sensitive system resources using either a local encryption key or a remote KMIP key.
DataStax recommends using a remote encryption key from a KMIP provider when using Transparent Data Encryption (TDE) features. Only use a local encryption key if a KMIP server is not available.
Default values:
system_info_encryption:
  enabled: false
  cipher_algorithm: AES
  secret_key_strength: 128
  chunk_length_kb: 64
  key_provider: KmipKeyProviderFactory
  kmip_host: kmip_host_name
system_info_encryption
-
Controls encryption of sensitive system resources using either a local encryption key or a remote KMIP key.
enabled
-
Set to true to enable encryption of system resources that might contain sensitive information, including the system.batchlog and system.paxos tables, hint files, and the database commit log. After enabling system resource encryption in an environment that already has data, encrypt the existing SSTables by running nodetool upgradesstables -a system batchlog paxos.
The system_trace keyspace is NOT encrypted by enabling the system_info_encryption section. In environments that also have tracing enabled, manually configure encryption with compression on the system_trace keyspace. See Transparent data encryption.
Default: false.
cipher_algorithm
-
Default: AES. The name of the JCE cipher algorithm used to encrypt system resources.
Table 1. cipher_algorithm and secret_key_strength values
cipher_algorithm | secret_key_strength |
---|---|
AES | 128, 192, or 256 |
DES | 56 |
DESede | 112 or 168 |
Blowfish | 32-448 |
RC2 | 40-128 |
secret_key_strength
-
Default: 128. Length of key to use for the system resources. See Table 1.
DSE uses a matching local key or requests the key type from the KMIP server. For KMIP, if an existing key does not match, the KMIP server automatically generates a new key.
chunk_length_kb
-
Default: 64. Optional. Size of SSTable chunks when data from the system.batchlog or system.paxos tables is written to disk.
To encrypt existing data, run nodetool upgradesstables -a system batchlog paxos on all nodes in the cluster.
key_provider
-
Set to KmipKeyProviderFactory to encrypt sensitive system data with a KMIP key. Comment out this property if using a local encryption key. Default: none
kmip_host
-
Set to the kmip_groupname that defines the KMIP host in the kmip_hosts section. DSE requests a key from the KMIP host and uses the key generated by the KMIP provider. Default: none
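For the case where no KMIP server is available, a local-key configuration could be sketched as follows. The key strength of 256 is an arbitrary pick from the AES row of the table above, not a recommendation from the source.

```yaml
# Hypothetical dse.yaml fragment: encrypt system resources with a local key.
system_info_encryption:
  enabled: true
  cipher_algorithm: AES
  secret_key_strength: 256    # AES accepts 128, 192, or 256
  chunk_length_kb: 64
  # key_provider: KmipKeyProviderFactory   # left commented out because a local key is used
  # kmip_host: kmip_host_name
```

If the cluster already holds data, the existing SSTables still need to be rewritten with nodetool upgradesstables -a system batchlog paxos on every node.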
Encrypted configuration properties settings
Settings for using encrypted passwords in sensitive configuration file properties.
system_key_directory: /etc/dse/conf
config_encryption_active: false
config_encryption_key_name: (key_filename | KMIP_key_URL )
system_key_directory
-
Path to the directory where local encryption key files, also called system keys, are stored. Distribute the system keys to all nodes in the cluster. Ensure that the DSE account is the folder owner and has read/write/execute (700) permissions. Default: /etc/dse/conf
This directory is not used for KMIP keys.
config_encryption_active
-
Whether to enable encryption on sensitive data stored in tables and in configuration files.
- false: Do not enable encryption of configuration property values.
- true: Enable encryption of configuration property values using the specified config_encryption_key_name. When set to true, the configuration values must be encrypted or commented out. See Encrypting configuration file properties.
Lifecycle Manager (LCM) is not compatible when config_encryption_active is true in DSE and OpsCenter. For LCM limitations, see Configuration encryption.
When enabled, encrypt values for the following properties:
- dse.yaml LDAP values:
  ldap_options.search_password
  ldap_options.truststore_password
  Use plain text for the KMIP keystore or truststore passwords.
- cassandra.yaml SSL values:
  server_encryption_options.keystore_password
  server_encryption_options.truststore_password
  client_encryption_options.keystore_password
  client_encryption_options.truststore_password
dsetool encryptconfigvalue returns encrypted values using the config_encryption_key_name key.
config_encryption_key_name
-
Default: system_key. The default name is not configurable. Set to the local encryption key filename or KMIP key URL to use for configuration file property value decryption.
Use dsetool encryptconfigvalue to generate encrypted values for the configuration file properties.
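As a rough sketch of how the pieces fit together with a local key, the fragment below turns on configuration encryption and shows where encrypted values would go. The two ENCRYPTED_... strings are placeholders standing in for output of dsetool encryptconfigvalue; they are not real encrypted values.

```yaml
# Hypothetical dse.yaml fragment: configuration encryption with a local system key.
system_key_directory: /etc/dse/conf
config_encryption_active: true
config_encryption_key_name: system_key

ldap_options:
  # Placeholders: paste the values returned by dsetool encryptconfigvalue.
  search_password: ENCRYPTED_SEARCH_PASSWORD
  truststore_password: ENCRYPTED_TRUSTSTORE_PASSWORD
```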
KMIP encryption options
Options for KMIP encryption keys and communication between the DataStax Enterprise node and the KMIP key server or key servers. Enables DataStax Enterprise encryption features to use encryption keys that are stored on a server that is not running DataStax Enterprise.
Default values:
kmip_hosts:
  your_kmip_groupname:
    hosts: kmip1.yourdomain.com, kmip2.yourdomain.com
    keystore_path: pathto/kmip/keystore.jks
    keystore_type: jks
    keystore_password: password
    truststore_path: pathto/kmip/truststore.jks
    truststore_type: jks
    truststore_password: password
    key_cache_millis: 300000
    timeout: 1000
    protocol: protocol
    cipher_suites: supported_cipher
kmip_hosts
-
Connection settings for key servers that support the KMIP protocol.
kmip_groupname
-
A user-defined name for a group of options to configure a KMIP server or servers, key settings, and certificates. Configure options in a kmip_groupname section for each KMIP key server or group of KMIP key servers. Using separate key server configuration settings allows use of different key servers to encrypt table data, and eliminates the need to enter key server configuration information in DDL statements and other configurations. Multiple KMIP hosts are supported. Default: commented out
hosts
-
A comma-separated list of KMIP hosts using the Fully Qualified Domain Name (FQDN). DSE queries the hosts in the listed order.
For example, if the host list contains kmip1.yourdomain.com, kmip2.yourdomain.com, DSE tries kmip1.yourdomain.com and then kmip2.yourdomain.com.
keystore_path
-
The path to a Java keystore created from the KMIP agent PEM files. For example: /etc/dse/conf/KMIP_keystore.jks
keystore_type
-
The type of keystore. The default value is jks.
keystore_password
-
The password to access the keystore.
truststore_path
-
The path to a Java truststore created using the KMIP root certificate. For example: /etc/dse/conf/KMIP_truststore.jks
truststore_type
-
The type of truststore. The default value is jks.
truststore_password
-
The password to access the truststore.
key_cache_millis
-
Milliseconds to locally cache the encryption keys that are read from the KMIP hosts. The longer the encryption keys are cached, the fewer requests are made to the KMIP key server, but the longer it takes for changes, like revocation, to propagate to the DataStax Enterprise node. DataStax Enterprise uses concurrent encryption, so multiple threads fetch the secret key from the KMIP key server at the same time. Default: 300000. DataStax recommends using the default value.
timeout
-
Socket timeout in milliseconds. Default: 1000.
protocol
-
When not specified, the JVM default is used. Example: TLSv1.2
cipher_suites
-
When not specified, the JVM default is used. Examples:
- TLS_RSA_WITH_AES_128_CBC_SHA
- TLS_RSA_WITH_AES_256_CBC_SHA
- TLS_DHE_RSA_WITH_AES_128_CBC_SHA
- TLS_DHE_RSA_WITH_AES_256_CBC_SHA
- TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
- TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
See cipher_algorithm.
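The link between a KMIP group and the features that use it is easiest to see side by side. In this sketch, kmip_group_a is a made-up group name, the passwords are placeholders, and the protocol and cipher suite are simply taken from the examples listed above.

```yaml
# Hypothetical dse.yaml fragment: one KMIP group, referenced by system_info_encryption.
kmip_hosts:
  kmip_group_a:                    # user-defined group name (invented for this example)
    hosts: kmip1.yourdomain.com, kmip2.yourdomain.com   # tried in the listed order
    keystore_path: /etc/dse/conf/KMIP_keystore.jks
    keystore_type: jks
    keystore_password: keystore_password_here       # plain text for KMIP stores
    truststore_path: /etc/dse/conf/KMIP_truststore.jks
    truststore_type: jks
    truststore_password: truststore_password_here
    key_cache_millis: 300000       # recommended default
    timeout: 1000
    protocol: TLSv1.2
    cipher_suites: TLS_RSA_WITH_AES_128_CBC_SHA

system_info_encryption:
  enabled: true
  key_provider: KmipKeyProviderFactory
  kmip_host: kmip_group_a          # must match the group name under kmip_hosts
```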
DSE Search index encryption settings
Default values:
solr_encryption_options:
  decryption_cache_offheap_allocation: true
  decryption_cache_size_in_mb: 256
solr_encryption_options
-
Specify settings to tune encryption of search indexes.
decryption_cache_offheap_allocation
-
Specify whether to allocate shared DSE Search decryption cache off JVM heap. Default:
true
decryption_cache_size_in_mb
-
Sets the maximum size of shared DSE Search decryption cache, in megabytes (MB). Default:
256
DSE In-Memory options
max_memory_to_lock_mb:
max_memory_to_lock_fraction: 0.20
# max_memory_to_lock_mb: 10240
max_memory_to_lock_mb
-
To use DSE In-Memory, choose one of these options to specify how much system memory to use for all in-memory tables.
- max_memory_to_lock_fraction: Specify a fraction of the system memory. The default value of 0.20 specifies to use up to 20% of system memory.
- max_memory_to_lock_mb: Specify a maximum amount of memory in megabytes (MB).
Node health options
node_health_options:
  refresh_rate_ms: 50000
  uptime_ramp_up_period_seconds: 10800
  dropped_mutation_window_minutes: 30
node_health_options
-
Node health options are always enabled for all nodes. Node health is a score-based representation of how fit a node is to handle search queries.
refresh_rate_ms
-
Default: 60000
uptime_ramp_up_period_seconds
-
Default: 10800 (3 hours). The amount of continuous uptime required for the node’s uptime score to advance the node health score from 0 to 1 (full health), assuming there are no recent dropped mutations. The health score is a composite score based on dropped mutations and uptime.
If a node is repairing after a period of downtime, you might want to increase the uptime period to the expected repair time.
dropped_mutation_window_minutes
-
Default: 30. The historic time window over which the rate of dropped mutations affects the node health score.
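For instance, if a repair after downtime is expected to take about six hours (an assumed figure for this example), the ramp-up period could be raised to match, since 6 hours x 3600 seconds = 21600 seconds:

```yaml
# Hypothetical dse.yaml fragment: ramp-up period stretched to an assumed 6-hour repair.
node_health_options:
  refresh_rate_ms: 60000
  uptime_ramp_up_period_seconds: 21600   # 6 hours x 3600 s
  dropped_mutation_window_minutes: 30
```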
Health-based routing
enable_health_based_routing: true
enable_health_based_routing
-
Default: true. Enables replica selection for distributed DSE Search queries to consider node health when multiple candidates exist for a particular token range. Health-based routing enables a trade-off between index consistency and query throughput. When the primary concern is performance, do not enable health-based routing.
Lease metrics
Default values:
lease_metrics_options:
  enabled: false
  ttl_seconds: 604800
lease_metrics_options
-
Lease holder statistics help monitor the lease subsystem for automatic management of Job Tracker and Spark Master nodes.
enabled
-
Enables (true) or disables (false) log entries related to lease holders. Most of the time you do not want to enable logging. Default: false
ttl_seconds
-
Defines the time, in seconds, to persist the log of lease holder changes. Logging of lease holder changes is always on, and has a very low overhead. Default: 604800 (7 days)
Scheduler settings for DSE Search indexes
Default values:
ttl_index_rebuild_options:
  fixed_rate_period: 300
  initial_delay: 20
  max_docs_per_batch: 4096
  thread_pool_size: 1
ttl_index_rebuild_options
-
To ensure that records with TTLs are purged from search indexes when they expire, the search indexes are periodically checked for expired documents. The ttl_index_rebuild_options settings control the schedulers in charge of querying for and removing expired records, and the execution of the checks.
fixed_rate_period
-
Schedules how often, in seconds, to check for expired data. Default: 300
initial_delay
-
Speeds up startup by delaying the first TTL checks, in seconds. Default: 20
max_docs_per_batch
-
Sets the maximum number of documents to check and delete per batch by the TTL rebuild thread. Default: 4096
thread_pool_size
-
To manage system resource consumption and prevent many search cores from executing simultaneous TTL deletes, defines the maximum number of cores that can execute TTL cleanup concurrently. Default: 1
Reindexing of bootstrapped data
async_bootstrap_reindex: false
async_bootstrap_reindex
-
For DSE Search, configure whether to asynchronously reindex bootstrapped data. Default: false
- If enabled, the node joins the ring immediately after bootstrap and reindexing occurs asynchronously. The node does not wait for post-bootstrap reindexing to finish, so it is not marked down. Use the dsetool ring command to check the status of the reindexing.
- If disabled, the node joins the ring after reindexing the bootstrapped data.
CQL Solr paging
Options to specify the paging behavior.
cql_solr_query_paging: off
cql_solr_query_paging
-
Options to specify the paging behavior.
- off: Default. Paging is off. Ignore driver paging settings for CQL Solr queries and use normal Solr paging unless:
  - The current workload is an analytics workload, including SearchAnalytics. SearchAnalytics nodes always use driver paging settings.
  - The cqlsh query parameter paging is set to driver.
  Even when cql_solr_query_paging: off, paging is dynamically enabled with the "paging":"driver" parameter in JSON queries.
- driver: Respects driver paging settings. Specifies to use Solr pagination (cursors) only when the driver uses pagination. Enabled automatically for DSE SearchAnalytics workloads.
Solr CQL query options
Default value:
cql_solr_query_row_timeout: 10000
cql_solr_query_row_timeout
-
The maximum time in milliseconds to wait for each row to be read from the database during CQL Solr queries. Default:
10000
(10 seconds).
DSE Search resource upload limit
Default value:
solr_resource_upload_limit_mb: 10
solr_resource_upload_limit_mb
-
Default: 10. Sets the maximum DSE Search resource upload size limit in megabytes (MB). Set to 0 to disable resource uploading.
Shard transport options
This shard transport option for inter-node communication between DSE Search nodes controls timeout behavior during distributed queries.
Default values:
shard_transport_options:
  netty_client_request_timeout: 60000
shard_transport_options
-
For inter-node communication between DSE Search nodes.
netty_client_request_timeout
-
Default: 60000. The client request timeout is the maximum cumulative time (in milliseconds) that a distributed search request waits idly for shard responses. Defines timeout behavior during distributed queries.
DSE Search indexing settings
DSE Search implements multi-threaded indexing to improve performance on multi-core machines. All index updates are internally dispatched to a per-core indexing thread pool and executed asynchronously, which allows for greater concurrency and parallelism. However, index requests can return a response before the indexing operation is executed.
Default values:
max_solr_concurrency_per_core: 2
# enable_back_pressure_adaptive_nrt_commit: true
# back_pressure_threshold_per_core: 2000
# flush_max_time_per_core: 5
# load_max_time_per_core: 5
# enable_index_disk_failure_policy: false
# solr_data_dir: /MyDir
# solr_field_cache_enabled: false
max_solr_concurrency_per_core
-
Configures the maximum number of concurrent asynchronous indexing threads per DSE Search index. Default: number_of_available_CPU_cores.
If set to 1, DSE Search reverts to synchronous indexing behavior, where data is synchronously written to the database in a single thread and indexed for DSE Search.
To achieve optimal performance, set this value to the number of available CPU cores divided by the number of search cores. For example, with 16 CPU cores and 4 search cores, the suggested value is 4. Also see Tuning search for maximum indexing throughput.
To prevent writes from overwhelming reads, reduce this value and adjust parallelDeleteTasks in the search index config.
Dynamic switching to a search concurrency level of 1 is disallowed.
enable_back_pressure_adaptive_nrt_commit
-
Allows the back pressure system to adapt the max auto soft commit time (defined per search index config) to the actual load. The setting is respected only for NRT (near real time) cores. When DSE Search cores have real-time (RT) live indexing, adaptive commits are disabled regardless of this property value. See live indexing with RT.
Default: true
back_pressure_threshold_per_core
-
The total number of queued asynchronous indexing requests per search core. When this number is exceeded, back pressure prevents excessive resource consumption by throttling new incoming requests. DataStax recommends using a back_pressure_threshold_per_core value of 1000 * max_solr_concurrency_per_core.
Default: 2000
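The two sizing rules above can be worked through for the 16-core example. The hardware figures are the ones used in the text; treating them as one node's actual layout is an assumption of this sketch.

```yaml
# Hypothetical dse.yaml fragment for a node with 16 CPU cores and 4 search cores.
# 16 CPU cores / 4 search cores = 4 indexing threads per search core
max_solr_concurrency_per_core: 4
# 1000 * max_solr_concurrency_per_core = 1000 * 4 = 4000 queued requests per core
back_pressure_threshold_per_core: 4000
```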
flush_max_time_per_core
-
The maximum time, in minutes, to wait for the flushing of asynchronous index updates, which occurs at DSE Search commit time or at flush time. Expert level knowledge is required to change this value. Always set the value reasonably high to ensure flushing completes successfully to fully sync DSE Search indexes with the database data. If the configured value is exceeded, index updates are only partially committed, and the commit log is not truncated to ensure data durability.
When a timeout occurs, it usually means this node is being overloaded and cannot flush in a timely manner. Live indexing increases the time to flush asynchronous index updates.
Default:
5
load_max_time_per_core
-
The maximum time, in minutes, to wait for each DSE Search index to load on startup or during create/reload operations. This advanced option should be changed only if exceptions happen during core loading.
Default:
5
(if not specified)
enable_index_disk_failure_policy
-
DSE Search activates the configured disk failure policy if
IOExceptions
occur during index update operations. Default:
false
solr_data_dir
-
The directory to store index data. For example:
solr_data_dir: /var/lib/cassandra/solr.data
See Managing the location of DSE Search data. By default, each DSE Search index is saved in
solr_data_dir/keyspace_name.table_name
, or as specified by the
dse.solr.data.dir
system property. Default: commented out
solr_field_cache_enabled
-
The Apache Lucene® field cache is deprecated. Instead, for fields that are sorted, faceted, or grouped by, set
docValues="true"
on the field in the
schema.xml
file. Then reload the core and reindex. The default value is false. To override false, set
useFieldCache=true
in the request.
Global Performance Service options
Available options to configure the thread pool that is used by most plug-ins.
A dropped task warning is issued when the performance service requests more tasks than performance_max_threads
+ performance_queue_capacity
.
When a task is dropped, collected statistics might not be current.
Default values:
performance_core_threads: 4
performance_max_threads: (cassandra.concurrent_writes)
performance_queue_capacity: 32000
performance_core_threads
-
Number of background threads used by the performance service under normal conditions. Default:
4
performance_max_threads
-
Maximum number of background threads used by the performance service. Limited to the value of
concurrent_writes
in the
cassandra.yaml
file. Default: the value of
cassandra.concurrent_writes
.
performance_queue_capacity
-
The number of tasks that can be queued in the backlog when all
performance_max_threads
threads are busy. Default:
32000
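To make the dropped-task condition concrete, here is a minimal sketch that assumes concurrent_writes is set to 32 in cassandra.yaml. That figure is an illustrative assumption, not a recommendation.

```yaml
# Assumes concurrent_writes: 32 in cassandra.yaml (illustrative value).
performance_core_threads: 4
performance_max_threads: 32
performance_queue_capacity: 32000
# With these values, a dropped task warning is issued once the service
# requests more than 32 + 32000 = 32032 tasks.
```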
CQL Performance Service options
These settings are used by the Performance Service to configure collection of performance metrics on transactional nodes.
Performance metrics are stored in the dse_perf keyspace and can be queried with CQL using any CQL-based utility, such as cqlsh or any application using a CQL driver.
To temporarily make changes for diagnostics and testing, use the dsetool perf
subcommands.
Default values:
graph_events:
  ttl_seconds: 600
cql_slow_log_options:
  enabled: true
  threshold: 200.0
  minimum_samples: 100
  ttl_seconds: 259200
  skip_writing_to_db: true
  num_slowest_queries: 5
cql_system_info_options:
  enabled: false
  refresh_rate_ms: 10000
resource_level_latency_tracking_options:
  enabled: false
  refresh_rate_ms: 10000
db_summary_stats_options:
  enabled: false
  refresh_rate_ms: 10000
cluster_summary_stats_options:
  enabled: false
  refresh_rate_ms: 10000
spark_cluster_info_options:
  enabled: false
  refresh_rate_ms: 10000
histogram_data_options:
  enabled: false
  refresh_rate_ms: 10000
  retention_count: 3
user_level_latency_tracking_options:
  enabled: false
  refresh_rate_ms: 10000
  top_stats_limit: 100
  quantiles: false
graph_events
-
Graph event information.
ttl_seconds
-
Defines the TTL in seconds. Default:
600
cql_slow_log_options
-
Report distributed sub-queries for search (query executions on individual shards) that take longer than a specified period of time. See Collecting slow queries.
enabled
-
Enables (true) or disables (false) log entries for slow queries. Default:
true
threshold
-
Defines the threshold (in milliseconds or as a percentile). Default:
200.0
-
A value greater than 1 is expressed in time and logs queries that take longer than the specified number of milliseconds.
-
A value of 0 to 1 is expressed as a percentile and logs queries that exceed this percentile.
-
minimum_samples
-
Defines the initial number of queries before activating the percentile filter. Default:
100
ttl_seconds
-
Defines the time, in seconds, to keep the slow query log entries. Default:
259200
skip_writing_to_db
-
When true, keeps slow queries in memory only and does not write them to the database. Default:
true
When false, the threshold must be >= 2000 ms to prevent a high load on the database.
num_slowest_queries
-
The number of slow queries to keep in-memory. Default: 5
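The following is a minimal sketch of a slow query log that persists entries to the database instead of keeping them in memory only. It uses only the options described above, and the threshold is raised to 2000 ms because skip_writing_to_db is false.

```yaml
cql_slow_log_options:
  enabled: true
  # Time-based threshold in milliseconds; at least 2000 because
  # skip_writing_to_db is false.
  threshold: 2000.0
  minimum_samples: 100
  # 259200 seconds = 3 days
  ttl_seconds: 259200
  skip_writing_to_db: false
  num_slowest_queries: 5
```

A percentile threshold (a value from 0 to 1) would take the place of the millisecond value. In that case minimum_samples sets how many queries are observed before the percentile filter activates.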
cql_system_info_options
-
CQL system information tables settings. See Collecting system level diagnostics.
cql_system_info_options:
  enabled: false
  refresh_rate_ms: 10000
enabled
-
Default:
false
refresh_rate_ms
-
Default:
10000
resource_level_latency_tracking_options
-
Data resource latency tracking settings. See Collecting system level diagnostics.
resource_level_latency_tracking_options:
  enabled: false
  refresh_rate_ms: 10000
enabled
-
Default:
false
refresh_rate_ms
-
Default:
10000
db_summary_stats_options
-
Database summary statistics settings. See Collecting database summary diagnostics.
db_summary_stats_options:
  enabled: false
  refresh_rate_ms: 10000
enabled
-
Default:
false
refresh_rate_ms
-
Default:
10000
cluster_summary_stats_options
-
Cluster summary statistics settings. See Collecting cluster summary diagnostics.
cluster_summary_stats_options:
  enabled: false
  refresh_rate_ms: 10000
enabled
-
Default:
false
refresh_rate_ms
-
Default:
10000
spark_cluster_info_options
-
See Monitoring Spark with Spark Performance Objects.
spark_cluster_info_options:
  enabled: false
  refresh_rate_ms: 10000
histogram_data_options
-
Histogram data tables settings. See Collecting histogram diagnostics.
enabled
-
When true, the dropped mutation metrics are stored in the
dropped_messages
table in the
dse_perf
keyspace. Default:
false
refresh_rate_ms
-
Default:
10000
retention_count
-
Default:
3
user_level_latency_tracking_options
-
User-resource latency tracking settings. See Collecting user activity diagnostics.
enabled
-
Default: false
refresh_rate_ms
-
Default: 10000
top_stats_limit
-
Default: 100
quantiles
-
Default: false
DSE Search Performance Service options
These settings are used by the Performance Service. See DSE Performance Service.
Default values:
solr_indexing_error_log_options:
  enabled: false
  ttl_seconds: 604800
  async_writers: 1
solr_slow_sub_query_log_options:
  enabled: false
  ttl_seconds: 604800
  threshold_ms: 3000
  async_writers: 1
solr_update_handler_metrics_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
solr_request_handler_metrics_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
solr_index_stats_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
solr_cache_stats_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
solr_latency_snapshot_options:
  enabled: false
  ttl_seconds: 604800
  refresh_rate_ms: 60000
solr_indexing_error_log_options
-
Enable to collect errors that occur during document indexing.
enabled
-
Default:
false
ttl_seconds
-
Default:
604800
async_writers
-
Defines the number of server threads dedicated to writing in the log. More than one server thread might degrade performance. Default:
1
solr_slow_sub_query_log_options
enabled
-
Default:
false
ttl_seconds
-
Default:
604800
async_writers
-
Defines the number of server threads dedicated to writing in the log. More than one server thread might degrade performance. Default:
1
threshold_ms
-
Default: 3000
solr_update_handler_metrics_options
enabled
-
Determines whether the object is enabled at startup.
ttl_seconds
-
How many seconds a record survives before it is expired from the performance object.
refresh_rate_ms
-
Period (in milliseconds) between sample recordings for periodically updating statistics like the
solr_result_cache_stats
.
solr_request_handler_metrics_options
-
Records core-specific direct and request update handler statistics over time.
enabled
-
Default:
false
ttl_seconds
-
Default:
604800
refresh_rate_ms
-
Default:
60000
solr_index_stats_options
enabled
-
Default:
false
ttl_seconds
-
Default:
604800
refresh_rate_ms
-
Default:
60000
solr_cache_stats_options
enabled
-
Default:
false
ttl_seconds
-
Default:
604800
refresh_rate_ms
-
Default:
60000
solr_latency_snapshot_options
enabled
-
Default:
false
ttl_seconds
-
Default:
604800
refresh_rate_ms
-
Default:
60000
Spark Performance Service options
Default values:
spark_application_info_options:
  enabled: false
  refresh_rate_ms: 10000
  driver:
    sink: false
    connectorSource: false
    jvmSource: false
    stateSource: false
  executor:
    sink: false
    connectorSource: false
    jvmSource: false
spark_application_info_options
-
Statistics options.
enabled
-
Default:
false
refresh_rate_ms
-
Default:
10000
milliseconds
driver
-
Enables collection of the metrics by the Spark Driver.
sink
-
Enables writing of the metrics collected from the Spark Driver. Default:
false
connectorSource
-
Enables writing of the Spark Cassandra Connector metrics at the Spark Driver. Default:
false
jvmSource
-
Enables JVM heap and GC metrics at the Spark Driver. Default:
false
stateSource
-
Enables application state metrics. Default:
false
executor
-
Enables collection of metrics at Spark executors. Default:
false
sink
-
Enables writing of the metrics collected at Spark executors. Default:
false
connectorSource
-
Enables writing of the Spark Cassandra Connector metrics at Spark executors. Default:
false
jvmSource
-
Enables JVM heap and GC metrics at Spark executors. Default:
false
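As an illustration of how the driver and executor switches combine, the sketch below enables metric collection and writing for both, while leaving connector metrics off at the executors. Which sources to enable is an assumption made for the example, not a recommendation from the source.

```yaml
spark_application_info_options:
  enabled: true
  refresh_rate_ms: 10000
  driver:
    sink: true             # write the metrics collected from the Spark Driver
    connectorSource: true  # Spark Cassandra Connector metrics at the driver
    jvmSource: true        # JVM heap and GC metrics at the driver
    stateSource: true      # application state metrics
  executor:
    sink: true             # write the metrics collected at Spark executors
    connectorSource: false # left off here purely for illustration
    jvmSource: true        # JVM heap and GC metrics at the executors
```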
Spark memory and Spark encryption options
Default values:
initial_spark_worker_resources: 0.7
spark_shared_secret_bit_length: 256
spark_security_enabled: false
spark_security_encryption_enabled: false
spark_daemon_readiness_assertion_interval: 1000
spark_ui_options:
  encryption: inherit
  encryption_options:
    enabled: false
    keystore: .keystore
    keystore_password: cassandra
    require_client_auth: false
    truststore: .truststore
    truststore_password: cassandra
    # Advanced settings
    # protocol: TLS
    # algorithm: SunX509
    # store_type: JKS
    # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
initial_spark_worker_resources
-
DataStax Enterprise can control the memory and cores offered by particular Spark Workers in semi-automatic fashion. Specify the fraction of system resources that are made available to the Spark Worker.
The available resources are calculated in the following way:
-
Spark Worker memory = initial_spark_worker_resources * (total system memory - memory assigned to DataStax Enterprise)
-
Spark Worker cores = initial_spark_worker_resources * total system cores
The lowest value that you can assign to Spark Worker memory is 64 MB. The lowest value that you can assign to Spark Worker cores is
1
core. If the results are lower, no exception is thrown and the values are automatically limited. The range of the
initial_spark_worker_resources
value is
0.01
to
1
. If the value is not specified, the default value
0.7
is used. This mechanism is used by default to set the Spark Worker memory and cores. To override the default, uncomment and edit one or both
SPARK_WORKER_MEMORY
and
SPARK_WORKER_CORES
options in the
spark-env.sh
file.
-
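The two formulas above can be worked through for a hypothetical machine. The hardware figures in the comments (64 GB of RAM, 16 GB assigned to DataStax Enterprise, 16 cores) are assumptions chosen only to make the arithmetic easy to follow.

```yaml
# Hypothetical machine: 64 GB RAM, 16 GB assigned to DataStax Enterprise, 16 cores.
# Spark Worker memory = 0.5 * (64 GB - 16 GB) = 24 GB
# Spark Worker cores  = 0.5 * 16 = 8
initial_spark_worker_resources: 0.5
```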
spark_shared_secret_bit_length
-
The length of a shared secret used to authenticate Spark components and encrypt the connections between them. This value is not the strength of the cipher for encrypting connections. Default:
256
spark_security_enabled
-
In DSE 5.1.15 and later, when DSE authentication is enabled with
authentication_options
, Spark security is enabled regardless of this setting.
Enables Spark security based on shared secret infrastructure. Enables mutual authentication and optional encryption between DSE Spark Master and Workers, and of communication channels, except the web UI.
Default:
false
spark_security_encryption_enabled
-
In DSE 5.1.15 and later, when DSE authentication is enabled with
authentication_options
, Spark security is enabled regardless of this setting.
Enables encryption between DSE Spark Master and Workers, and of communication channels, except the web UI. Uses the
DIGEST-MD5
SASL-based encryption mechanism. Requires
spark_security_enabled: true
. Configure encryption between the Spark processes and DSE with client-to-node encryption in
cassandra.yaml
.
spark_daemon_readiness_assertion_interval
-
Time interval, in milliseconds, between subsequent retries by the Spark plugin for Spark Master and Worker readiness to start. Default:
1000
spark_ui_options
-
Specify the source for SSL settings for Spark Master and Spark Worker UIs. The
spark_ui_options
apply only to Spark daemon UIs, and do not apply to user applications even when the user applications are run in cluster mode.
encryption
-
-
inherit
- inherit the SSL settings from the client encryption options. Default. -
custom
- use the following
encryption_options
.
-
encryption_options
-
Set encryption options for HTTPS of Spark Master and Worker UI. The
spark_encryption_options
are not valid for DSE 5.1 and later.
enabled
-
Enable (true) or disable (false) Spark encryption for Spark client-to-Spark cluster and Spark internode communication. Default: false
keystore
-
The keystore for Spark encryption keys. The file path is relative to the base Spark configuration directory that is defined by the
SPARK_CONF_DIR
environment variable. The default Spark configuration directory is
resources/spark/conf
. Default:
.keystore
keystore_password
-
The password to access the keystore. Default:
cassandra
truststore
-
The truststore for Spark encryption keys. The file path is relative to the base Spark configuration directory that is defined by the
SPARK_CONF_DIR
environment variable. The default Spark configuration directory is
resources/spark/conf
.
truststore_password
-
The password to access the truststore. Default:
cassandra
protocol
-
Defines the encryption protocol. The TLS protocol must be supported by JVM and Spark.
Default: commented out (
TLS
)
algorithm
-
Defines the key manager algorithm. Default:
SunX509
store_type
-
Defines the keystore type. Default:
JKS
cipher_suites
-
Defines the cipher suites for Spark encryption. Default:
-
TLS_RSA_WITH_AES_128_CBC_SHA
-
TLS_RSA_WITH_AES_256_CBC_SHA
-
TLS_DHE_RSA_WITH_AES_128_CBC_SHA
-
TLS_DHE_RSA_WITH_AES_256_CBC_SHA
-
TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA
-
TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA
-
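For comparison with the inherited default, this is a minimal sketch of a custom SSL configuration for the Spark Master and Worker UIs. The passwords are placeholders introduced for illustration, and the store file names simply reuse the defaults, which are resolved relative to SPARK_CONF_DIR.

```yaml
spark_ui_options:
  encryption: custom
  encryption_options:
    enabled: true
    # File names reuse the defaults; passwords are placeholders, not
    # recommended values.
    keystore: .keystore
    keystore_password: example_keystore_password
    require_client_auth: false
    truststore: .truststore
    truststore_password: example_truststore_password
```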
Starting Spark drivers and executors
Options to configure how Spark driver and executor processes are created and managed.
Default values:
spark_process_runner:
  runner_type: default
  run_as_runner_options:
    user_slots:
      - slot1
      - slot2
runner_type
-
-
default
- Use the default runner type. -
run_as
- Use the
run_as_runner_options
options. See Running Spark processes as separate users.
-
run_as_runner_options
-
Define the slot users for separating Spark processes users from the DSE service user. See Running Spark processes as separate users.
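A minimal sketch of switching to the run_as runner follows. It reuses the slot1 and slot2 names from the defaults and assumes those slot users have already been set up as described in Running Spark processes as separate users.

```yaml
spark_process_runner:
  runner_type: run_as
  run_as_runner_options:
    user_slots:
      # Slot user names taken from the defaults; assumed to exist already.
      - slot1
      - slot2
```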
DSE File System (DSEFS) options
Properties to enable and configure the DSE file system (DSEFS).
DSEFS replaces the Cassandra File System (CFS).
Default values:
dsefs_options:
  enabled: false
  keyspace_name: dsefs
  work_dir: /var/lib/dsefs
  public_port: 5598
  private_port: 5599
  data_directories:
    - dir: /var/lib/dsefs/data
      storage_weight: 1.0
      min_free_space: 5368709120
  # Advanced properties for DSEFS
  # service_startup_timeout_ms: 30000
  # service_close_timeout_ms: 600000
  # server_close_timeout_ms: 600000
  # compression_frame_max_size: 1048576
  # query_cache_size: 2048
  # query_cache_expire_after_ms: 2000
  # gossip_options:
  #   round_delay_ms: 5000
  #   startup_delay_ms: 5000
  #   shutdown_delay_ms: 30000
  # rest_options:
  #   request_timeout_ms: 330000
  #   connection_open_timeout_ms: 55000
  #   client_close_timeout_ms: 60000
  #   server_request_timeout_ms: 300000
  #   idle_connection_timeout_ms: 0
  #   internode_idle_connection_timeout_ms: 120000
  #   core_max_concurrent_connections_per_host: 8
  # transaction_options:
  #   transaction_timeout_ms: 3000
  #   conflict_retry_delay_ms: 200
  #   conflict_retry_count: 40
  #   execution_retry_delay_ms: 1000
  #   execution_retry_count: 3
dsefs_options
-
DSE File System (DSEFS) options determine whether DSEFS should be enabled on this node.
enabled
-
Enable or disable DSEFS. This parameter takes one of the following values:
-
true
- enables DSEFS on this node, regardless of the workload. -
false
- disables DSEFS on this node, regardless of the workload. Default. -
blank or commented out (#)
- DSEFS will start only if the node is configured to run analytics workloads.
-
keyspace_name
-
The keyspace where the DSEFS metadata is stored. You can optionally configure multiple DSEFS file systems within a single datacenter by specifying different keyspace names for each cluster. Default:
dsefs
work_dir
-
The local directory for storing the local node metadata, including the node identifier. The volume of data stored in this directory is nominal and does not require configuration for throughput, latency, or capacity. This directory must not be shared by DSEFS nodes.
public_port
-
The public port on which DSEFS listens for clients. DataStax recommends that all nodes in the cluster have the same value. Firewalls must open this port to trusted clients. The service on this port is bound to the RPC address. Default:
5598
private_port
-
The private port for DSEFS inter-node communication. Do not open this port in firewalls; this private port must not be visible from outside of the cluster. Default:
5599
data_directories
-
One or more data locations where the DSEFS data is stored.
- dir
-
Mandatory attribute to identify the set of directories. DataStax recommends segregating these data directories on physical devices different than the devices that are used for DataStax Enterprise. Using multiple directories on JBOD improves performance and capacity. Default:
/var/lib/dsefs/data
storage_weight
-
The weighting factor for this location specifies how much data to place in this directory, relative to other directories in the cluster. This soft constraint determines how DSEFS distributes the data. For example, a directory with a value of 3.0 receives about three times more data than a directory with a value of 1.0. Default:
1.0
min_free_space
-
The reserved space, in bytes, to not use for storing file data blocks. You can use a unit of measure suffix to specify other size units. For example: terabyte (1 TB), gigabyte (10 GB), and megabyte (5000 MB). Default:
5368709120
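The data directory attributes can be combined as in the sketch below, which enables DSEFS with two data locations. Both directory paths are hypothetical mount points, and the 3.0 weight mirrors the example in the storage_weight description.

```yaml
dsefs_options:
  enabled: true
  keyspace_name: dsefs
  work_dir: /var/lib/dsefs
  public_port: 5598
  private_port: 5599
  data_directories:
    # Both paths are hypothetical mount points on separate devices.
    - dir: /mnt/disk1/dsefs/data
      storage_weight: 3.0        # receives about three times more data
      min_free_space: 5368709120 # reserved space in bytes (5 * 1024^3)
    - dir: /mnt/disk2/dsefs/data
      storage_weight: 1.0
      min_free_space: 5368709120
```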
Advanced properties for DSEFS
service_startup_timeout_ms
-
Wait time, in milliseconds, before the DSEFS server times out while waiting for services to bootstrap. Default:
30000
service_close_timeout_ms
-
Wait time, in milliseconds, before the DSEFS server times out while waiting for services to close. Default:
600000
server_close_timeout_ms
-
Wait time, in milliseconds, that the DSEFS server waits during shutdown before closing all pending connections.
compression_frame_max_size
-
The maximum accepted size of a compression frame defined during file upload. Default:
1048576
query_cache_size
-
Maximum number of elements in a single DSEFS Server query cache. Default:
2048
query_cache_expire_after_ms
-
The time to retain the DSEFS Server query cache element in cache. The cache element expires when this time is exceeded. Default:
2000
gossip_options
-
Options to configure DSEFS gossip rounds.
round_delay_ms
-
The delay, in milliseconds, between gossip rounds. Default:
5000
startup_delay_ms
-
The delay time, in milliseconds, between registering the location and reading back all other locations from the database. Default:
5000
shutdown_delay_ms
-
The delay time, in milliseconds, between announcing shutdown and shutting down the node. Default:
30000
rest_options
-
Options to configure DSEFS rest times.
request_timeout_ms
-
The time, in milliseconds, that the client waits for a response that corresponds to a given request. Default:
330000
connection_open_timeout_ms
-
The time, in milliseconds, that the client waits to establish a new connection. Default:
55000
client_close_timeout_ms
-
The time, in milliseconds, that the client waits for pending transfer to complete before closing a connection. Default:
60000
server_request_timeout_ms
-
The time, in milliseconds, to wait for the server rest call to complete. Default:
300000
idle_connection_timeout_ms
-
The time, in milliseconds, to wait before closing an idle connection. Closing idle connections is disabled by default. Default: 0 (disabled)
internode_idle_connection_timeout_ms
-
Wait time, in milliseconds, before closing idle internode connection. The internode connections are primarily used to exchange data during replication. Do not set lower than the default value for heavily utilized DSEFS clusters.
Default: commented out (
120000
)
core_max_concurrent_connections_per_host
-
Maximum number of connections to a given host per single CPU core. DSEFS keeps a connection pool for each CPU core.
Default:
8
conflict_retry_delay_ms
-
Wait time, in milliseconds, before retrying a transaction that was ended due to a conflict. Default:
200
conflict_retry_count
-
The number of times to retry a transaction before giving up. Default: 40
execution_retry_delay_ms
-
Wait time, in milliseconds, before retrying a failed transaction payload execution. Default: 1000
execution_retry_count
-
The number of payload execution retries before signaling the error to the application. Default: 3
DSE Metrics Collector options
When |
Uncomment these options only to change the default directories:
# insights_options:
#   data_dir: /var/lib/cassandra/insights_data
#   log_dir: /var/log/cassandra/
insights_options
-
Options for DSE Metrics Collector.
data_dir
-
Directory to store collected metrics. When not set, the default directory is
/var/lib/cassandra/insights_data
.
When
data_dir
is not set, the default location of the
/insights_data
directory is the same location as the
/commitlog
directory, as defined with the
commitlog_directory
property in
cassandra.yaml
.
log_dir
-
Directory to store logs for collected metrics. The log file is
dse-collectd.log
. The file with the collectd PID is
dse-collectd.pid
. When not set, the default directory is
/var/log/cassandra/
.
Audit logging options
Default values:
audit_logging_options:
  enabled: false
  logger: SLF4JAuditWriter
  retention_time: 0
audit_logging_options
-
To get the maximum information from data auditing, turn on data auditing on every node. See Enabling data auditing in DataStax Enterprise and Configuring audit logging.
enabled
-
Default:
false
logger
-
Default:
SLF4JAuditWriter
-
SLF4JAuditWriter
- Logs audit information to the
SLF4JAuditWriter
logger. Audit logging configuration settings are in the logback.xml file. -
CassandraAuditWriter
- Logs audit information to the
dse_audit.audit_log
database table. This logger can be run synchronously or asynchronously. See related
cassandra_audit_writer_options
configuration entries and Configuring audit logging to a database table.
-
Where is the logback.xml
file?
The location of the logback.xml
file depends on the type of installation:
Installation Type | Location |
---|---|
Package installations + Installer-Services installations |
|
Tarball installations + Installer-No Services installations |
|
included_categories
or
excluded_categories
-
The default is to include all categories. Specify either included or excluded categories.
Comma separated list of audit event categories to include or exclude from the audit log. Categories are: QUERY, DML, DDL, DCL, AUTH, ERROR.
-
included_categories: comma_separated_list
or
-
excluded_categories: comma_separated_list
-
included_keyspaces
or
excluded_keyspaces
-
The default is to include all keyspaces. Specify either included or excluded keyspaces. Specifying both is an error.
Use a regular expression to filter keyspaces, or use a comma separated list of keyspaces to be included or excluded from the audit log.
-
included_keyspaces: comma_separated_list
or
-
excluded_keyspaces: comma_separated_list
-
retention_time
-
The amount of time, in hours, that audit events are retained by supporting loggers. Only
CassandraAuditWriter
supports retention time. Values of 0 or less retain events forever. Default:
0
cassandra_audit_writer_options
-
Configuration options for
CassandraAuditWriter
.
cassandra_audit_writer_options:
  mode: sync
  batch_size: 50
  flush_time: 500
  num_writers: 10
  queue_size: 10000
  write_consistency: QUORUM
  # dropped_event_log: /var/log/cassandra/dropped_audit_events.log
  # day_partition_millis: 3600000
mode
-
Sets the mode the writer runs in. Default:
sync
-
sync
- A query is not executed until the audit event is successfully written. -
async
- Audit events are queued for writing to the audit table, but are not necessarily logged before the query executes. A pool of writer threads consumes the audit events from the queue, and writes them to the audit table in batch queries. While this substantially improves performance under load, if there is a failure between when a query is executed, and its audit event is written to the table, the audit table might be missing entries for queries that were executed.
-
batch_size
-
Available only when mode:
async
. Must be greater than 0. The maximum number of events the writer dequeues before writing them out to the table. If warnings in the logs reveal that batches are too large, decrease this value or increase the value of
batch_size_warn_threshold_in_kb
in
cassandra.yaml
. Default:
50
flush_time
-
Available only when mode: async.
The maximum amount of time in milliseconds before an event is removed from the queue by a writer before being written out. This flush time prevents events from waiting too long before being written to the table when there are not a lot of queries happening. Default:
500
num_writers
-
Available only when mode: async.
The number of worker threads asynchronously logging events to the
CassandraAuditWriter
. Default: 10
queue_size
-
The size of the queue feeding the asynchronous audit log writer threads. When there are more events being produced than the writers can write out, the queue fills up, and newer queries are blocked until there is space on the queue. If a value of 0 is used, the queue size is unbounded, which can lead to resource exhaustion under heavy query load. Default: 10000
write_consistency
-
The consistency level that is used to write audit events. Default:
QUORUM
dropped_event_log
-
The directory to store the log file that reports dropped events. Default:
/var/log/cassandra/dropped_audit_events.log
day_partition_millis
-
To spread audit log information across multiple nodes, specify the interval, in milliseconds, between changing nodes. For example, specify 43200000 milliseconds to change the target node every 12 hours. Default: 3600000 (1 hour)
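The sketch below combines the options above to log selected categories asynchronously to the database table. The category selection and the 24-hour retention are illustrative assumptions. The nesting of cassandra_audit_writer_options under audit_logging_options is also an assumption that this page does not confirm, so check it against the dse.yaml shipped with your installation.

```yaml
audit_logging_options:
  enabled: true
  logger: CassandraAuditWriter
  # Specify either included or excluded categories, not both.
  included_categories: DML, DDL, AUTH
  # Hours; only CassandraAuditWriter supports retention time.
  retention_time: 24
  # Assumed nesting; verify against your own dse.yaml.
  cassandra_audit_writer_options:
    mode: async
    batch_size: 50        # must be greater than 0
    flush_time: 500       # milliseconds
    num_writers: 10
    queue_size: 10000     # 0 would make the queue unbounded
    write_consistency: QUORUM
```

In async mode, audit events might be missing for queries that executed shortly before a failure, so sync remains the safer choice where a complete audit trail matters more than throughput.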
DSE Tiered Storage options
Options to define one or more disk configurations for DSE Tiered Storage.
Specify multiple disk configurations as unnamed tiers by a collection of paths that are defined in priority order, with the fastest storage media in the top tier.
With heterogeneous storage configurations across the cluster, specify each disk configuration with
config_name:config_settings
, and then use that configuration name in
CREATE TABLE
or
ALTER TABLE
statements.
DSE Tiered Storage does not change compaction strategies. To manage compression and compaction options, use the compaction option. See Modifying compression and compaction.
Default values:
tiered_storage_options:
  strategy1:
    tiers:
      - paths:
          - /mnt1
          - /mnt2
      - paths:
          - /mnt3
          - /mnt4
      - paths:
          - /mnt5
          - /mnt6
To manage compaction options, use the compaction option in |
tiered_storage_options
-
Options to configure the smart movement of data across different types of storage media so that data is matched to the most suitable drive type, according to the performance and cost characteristics it requires.
strategy1
-
The first disk configuration strategy. Create a strategy2, strategy3, and so on. In this example, strategy1 is the configurable name of the tiered storage configuration strategy.
tiers
-
The unnamed tiers in this section define a storage tier with the paths and file paths that define the priority order.
local_options
-
Local configuration options overwrite the tiered storage settings for the table schema in the local dse.yaml file. See Testing DSE Tiered Storage configurations.
- paths
-
The section of file paths that define the data directories for this tier of the disk configuration. Typically list the fastest storage media first. These paths are used only to store data that is configured to use tiered storage. These paths are independent of any settings in the cassandra.yaml file.
- /filepath
-
Specific file paths to define the data directories for this tier of the disk configuration.
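As a sketch of the priority ordering, the configuration below defines one strategy with two tiers, fastest media first. The mount points are hypothetical and the comments describing the media types are assumptions for illustration.

```yaml
tiered_storage_options:
  # "strategy1" is a configurable name; all paths are hypothetical mount points.
  strategy1:
    tiers:
      - paths:          # top tier: fastest media, for example SSD
          - /mnt/ssd1
          - /mnt/ssd2
      - paths:          # lower tier: slower media
          - /mnt/hdd1
```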
DSE Advanced Replication configuration settings
DSE Advanced Replication configuration options to replicate data from remote clusters to central data hubs.
Default values:
#advanced_replication_options:
enabled: false
conf_driver_password_encryption_enabled: false
advanced_replication_directory: /var/lib/cassandra/advrep
security_base_path: /base/path/to/advrep/security/files/
advanced_replication_options
-
Options to enable DSE Advanced Replication.
enabled
-
Set
enabled: true
on an edge node to collect data in the replication log. Default: false.
conf_driver_password_encryption_enabled
-
Enable or disable encryption of driver passwords. When enabled, the stored driver password is expected to be encrypted with the system key. After you create the system key, you must copy the same system key to every node in the cluster.
advanced_replication_directory
-
Set the directory for storing advanced replication CDC logs. Default is
/var/lib/cassandra/advrep
. A directory named
replication_logs
is created within the specified directory.
security_base_path
-
The base path to prepend to paths in the Advanced Replication configuration locations, including locations to SSL keystore, SSL truststore, and so on. Default:
/base/path/to/advrep/security/files/
Inter-node messaging options
Configuration for the internal messaging service used by several components of DataStax Enterprise. For 5.0 and later, all internode messaging requests use this service.
internode_messaging_options:
  port: 8609
  # frame_length_in_mb: 256
  # server_acceptor_threads: 8
  # server_worker_threads: 16
  # client_max_connections: 100
  # client_worker_threads: 16
  # handshake_timeout_seconds: 10
  # client_request_timeout_seconds: 60
internode_messaging_options
-
Configuration options for inter-node messaging.
port
-
The mandatory port for the inter-node messaging service. Default: 8609
frame_length_in_mb
-
Maximum message frame length. Default: 256
server_acceptor_threads
-
The number of server acceptor threads. Default: the number of available processors.
server_worker_threads
-
The number of server worker threads. Default: the number of available processors * 8.
client_max_connections
-
The maximum number of client connections. Default: 100
client_worker_threads
-
The number of client worker threads. Default: the number of available processors * 8.
handshake_timeout_seconds
-
Timeout for communication handshake process. Default: 10
client_request_timeout_seconds
-
Timeout for non-query search requests like core creation and distributed deletes. Default: 60
DSE Multi-Instance server_id
server_id
-
In DSE Multi-Instance
/etc/dse-nodeId/dse.yaml
files, the
server_id
option is generated to uniquely identify the physical server on which multiple instances are running. The
server_id
default value is the media access control address (MAC address) of the physical server. You can change
server_id
when the MAC address is not unique, such as a virtualized server where the host’s physical MAC is cloned.
DSE Graph system-level options
These graph options are system-level configuration options and options that are shared between graph instances. Add an option if it is not present in the provided dse.yaml file.
Default values:
graph:
    adjacency_cache_clean_rate: 1024
    adjacency_cache_max_entry_size_in_mb: 0
    adjacency_cache_size_in_mb: 128
    analytic_evaluation_timeout_in_minutes: 10080
    gremlin_server_enabled: true
    index_cache_clean_rate: 1024
    index_cache_max_entry_size_in_mb: 0
    index_cache_size_in_mb: 128
    max_query_queue: 10000
    #max_query_threads:
    realtime_evaluation_timeout_in_seconds: 30
    schema_agreement_timeout_in_ms: 10000
    schema_mode: Production
    system_evaluation_timeout_in_seconds: 180
    window_size: 100000
    max_query_params: 256
graph
-
These graph options are system-level configuration options and options that are shared between graph instances.
adjacency_cache_clean_rate
-
The number of stale rows per second to clean from each graph’s adjacency cache. Default: 1024.
adjacency_cache_max_entry_size_in_mb
-
The maximum entry size in each graph’s adjacency cache. When set to zero, the default is calculated based on the cache size and the number of CPUs. Entries that exceed this size are quietly dropped by the cache without producing an explicit error or log message. Default: 0.
adjacency_cache_size_in_mb
-
The amount of RAM to allocate to each graph’s adjacency (edge and property) cache. Default: 128.
analytic_evaluation_timeout_in_minutes
-
Maximum time to wait for an analytic (Spark) traversal to evaluate. Default: 10080 (7 days).
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
gremlin_server_enabled
-
Enables or disables Gremlin Server. Default: true.
index_cache_clean_rate
-
The number of stale entries per second to clean from the index cache. Default: 1024.
index_cache_max_entry_size_in_mb
-
The maximum entry size in the index cache. When set to zero, the default is calculated based on the cache size and the number of CPUs. Entries that exceed this size are quietly dropped by the cache without producing an explicit error or log message. Value: integer. Default: 0.
index_cache_size_in_mb
-
The amount of RAM to allocate to the index cache. Default: 128.
max_query_queue
-
The maximum number of CQL queries that can be queued as a result of Gremlin requests. Incoming queries are rejected if the queue size exceeds this setting. Default: 10000.
max_query_threads
-
The maximum number of threads to use for queries to the database. When this option is not set, the default is:
- If gremlinPool is present and nonzero: 10 * the gremlinPool setting
- If gremlinPool is not present in this file or set to zero: the number of available CPU cores
See gremlinPool.
realtime_evaluation_timeout_in_seconds
-
Maximum time to wait for a real-time traversal to evaluate. Default: 30 seconds.
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
schema_agreement_timeout_in_ms
-
Maximum time to wait for Cassandra to agree on schema versions before timing out. Default: 10000
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
schema_mode
-
Controls the way that the schemas are handled. Valid values:
- Production = Schema must be created before data insertion. Schema cannot be changed after data is inserted. Full graph scans are disallowed unless the option graph.allow_scan is changed to TRUE.
- Development = No schema is required to write data to a graph. Schema can be changed after data is inserted. Full graph scans are allowed unless the option graph.allow_scan is changed to FALSE.
system_evaluation_timeout_in_seconds
-
Maximum time to wait for a system-based request to execute. Default: 180 (3 minutes).
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
window_size
-
The number of samples to keep when aggregating log events. Only a small subset of graph’s log events use this system. Modifying this setting is rarely necessary or helpful. Default: 100000.
max_query_params
-
The maximum number of parameters that can be passed on a graph query request for TinkerPop drivers and drivers using Cassandra native protocol. Passing very large numbers of parameters on requests is an anti-pattern, because the script evaluation time increases proportionally. DataStax recommends reducing the number of parameters to speed up script compilation times. Before you increase this value, consider alternate methods for parameterizing scripts, like passing a single map. If the graph query request requires many arguments, pass a list. Default: 256
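For illustration, a graph section for a development node might override only a handful of the system-level options described above. This is a sketch under assumptions: the cache sizes and the timeout are arbitrary example values, and every option left out keeps its documented default.

```yaml
graph:
    schema_mode: Development                      # default is Production
    adjacency_cache_size_in_mb: 256               # example value; default is 128
    index_cache_size_in_mb: 256                   # example value; default is 128
    realtime_evaluation_timeout_in_seconds: 60    # example value; default is 30
    max_query_queue: 10000                        # documented default
    max_query_params: 256                         # documented default
```

Because max_query_threads is not set here, its value follows the gremlinPool rule described under max_query_threads.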
DSE Graph id assignment and partitioning strategy options
Default values:
ids:
    block_renew: 0.8
    community_reuse: 28
    consistency_mode: GLOBAL
    # datacenter_id: integer unique per DC when consistency_mode: DC_LOCAL
    id_hash_modulus: 20
    member_block_size: 512
ids
-
DSE Graph configuration options for standard vertex ID assignment and partitioning strategies.
block_renew
-
The graph standard vertex ID allocator operates on blocks of contiguous IDs. Each block is allocated using a database lightweight transaction that requires coordination latency. To hide the cost of allocating a standard ID block, the allocator begins asynchronously buffering a replacement block whenever a current block is nearly empty. The block_renew parameter defines "nearly empty" as a floating point number between 0 and 1. The value is how much of a standard ID block can be used before graph starts asynchronously allocating its replacement. This setting has no effect on custom IDs. Default: 0.8.
community_reuse
-
For graphs using standard vertex IDs, if a transaction creates multiple vertices, the allocator attempts to assign vertex IDs that colocate vertices on the same database replicas. If an especially large vertex cohort is created, the allocator chunks the vertex creation and assigns a random target location to avoid load hotspotting. This setting controls the vertex chunk size and has no effect on custom IDs. Default: 28.
consistency_mode
-
Must be set to DC_LOCAL or GLOBAL.
- DC_LOCAL: The node uses LOCAL_QUORUM when allocating an ID for a graph vertex. The datacenter_id option must be correctly configured on every node in the cluster.
- GLOBAL: (Default) The node uses QUORUM when allocating an ID for a graph vertex. The datacenter_id option is ignored.
This option must have the same value on every node in the cluster. Its value can only be changed when the entire cluster is stopped. This setting has no effect on custom IDs.
datacenter_id
-
Applies only when consistency_mode is DC_LOCAL. Set to an arbitrary value between 1 and 127, inclusive. Each datacenter in the cluster must have a unique datacenter_id. Violating this constraint corrupts the graph database without warning. This setting has no effect on custom IDs. Default: no explicit default value.
id_hash_modulus
-
An integer between 1 and 2^24 (both inclusive) that affects maximum ID capacity and the maximum storage space used by ID allocations. Lower values reduce the storage space consumed and the lightweight transaction overhead imposed at startup. Lower values also reduce the total number of IDs that can be allocated over the life of a graph, because this parameter is proportional to the allocatable ID space. However, the proportion coefficient is Long.MAX_VALUE (2^63-1), so ID headroom should be sufficient, practically speaking, even if this is set to 1. This setting has no effect on custom IDs. Default: 20.
member_block_size
-
The graph standard vertex ID allocator claims uniformly-sized blocks of contiguous IDs using lightweight transactions on the database. This setting controls the size of each block. This setting has no effect on custom IDs. Default: 512.
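As a sketch of the DC_LOCAL case described above, the fragment below shows what the ids section might look like on the nodes of one datacenter. The value 1 is an arbitrary choice within the documented 1 to 127 range; nodes in each other datacenter would use a different value, and consistency_mode would be the same everywhere.

```yaml
ids:
    consistency_mode: DC_LOCAL   # must have the same value on every node in the cluster
    datacenter_id: 1             # example value; must be unique per datacenter (for instance, 2 in a second datacenter)
    member_block_size: 512       # documented default
```

Since a duplicated datacenter_id corrupts the graph database without warning, it is worth verifying the value on every node before the cluster is restarted.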
DSE Graph listener options
Default values:
listener:
listener_name: string
black_types: # This list is empty by default
interval_in_seconds: 3600
type: slf4j
white_types: # This list is empty by default
listener
-
Options that contain all registered state listeners identified by their name.
listener_name
-
Replace listener_name with a string that identifies the listener. The string must begin with a lowercase letter and can be composed of lowercase letters, numbers, and underscores.
black_types
-
The names of state types that are ignored. All state types but those given are listened to. Default: (empty).
interval_in_seconds
-
The interval in which the state values are logged. Default: 3600
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
type
-
The type of the state listener. Must be one of the following values: slf4j. Default: slf4j.
white_types
-
The names of state types that should be listened to. Only those state types are listened to, and all others are ignored. Default: (empty).
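A minimal sketch of a single registered listener follows. It assumes that the per-listener options are nested under the listener's name, which the flattened default values above do not make explicit. The name graph_state_log is hypothetical, the interval is an arbitrary example value, and both type lists are left empty as in the defaults.

```yaml
listener:
    graph_state_log:                # hypothetical listener name (lowercase letters, numbers, underscores)
        type: slf4j                 # the only documented listener type
        interval_in_seconds: 600    # example value; documented default is 3600
        white_types:                # empty by default
        black_types:                # empty by default
```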
DSE Graph messaging options
Default values:
msg:
    graph_msg_timeout_in_ms: 5000
msg
-
Options to configure DSE Graph internal query and lightweight messaging system.
graph_msg_timeout_in_ms
-
Graph messages must be acknowledged within this interval, or else the message is assumed dropped/failed. Graph retries the message or fails the responsible request if the retry limit is exceeded. Default: 5000
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
DSE Graph event observers options
Default values:
observer:
observer_name: string
black_types: # This list is empty by default
observed_graphs: # This list is empty by default
slow_threshold_in_ms: 300000
type: slf4j
white_types: # This list is empty by default
observer
-
Options to configure all registered event observers identified by their name.
observer_name
-
Replace observer_name with a string that identifies the event observer. The string must begin with a lowercase letter and can be composed of lowercase letters, numbers, and underscores.
black_types
-
The names of event types that are ignored. All event types but those given are observed. Value: YAML-formatted list of strings. Default: (empty).
observed_graphs
-
The names of the graphs for which events are observed. Value: YAML-formatted list of strings. Default: (empty).
slow_tx_graphs
-
The names of the graphs for which slow transactions are monitored. Default: (empty).
slow_threshold_in_ms
-
Threshold at which slow queries get reported. Default: 300000
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
type
-
The type of the event observer. Must be one of the following values: slf4j, slow_request. Default: slf4j.
white_types
-
The names of event types that should be observed. Only those event types are observed and all others ignored. Value: YAML-formatted list of strings. Default: (empty).
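The observer options above could be combined as in the following sketch, which registers one observer that reports slow queries. It assumes the options nest under the observer's name, which the flattened defaults do not make explicit. The observer name slow_query_log and the graph name example_graph are hypothetical, and the threshold is an arbitrary example value.

```yaml
observer:
    slow_query_log:                    # hypothetical observer name
        type: slow_request             # documented values: slf4j, slow_request
        slow_threshold_in_ms: 60000    # example value; documented default is 300000
        observed_graphs:
            - example_graph            # hypothetical graph name
```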
DSE Graph shared data options
Default values:
shared_data:
    refresh_interval_in_ms: 60000
shared_data
-
Options for shared data in DSE Graph.
refresh_interval_in_ms
-
The interval between refreshes in which the graph schema is reread from the database tables. Note that schema is also immediately updated when schema changes occur, so this parameter is a fail safe to poll for schema changes periodically. Default: 60000
Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.
DSE Graph Gremlin Server options
The Gremlin Server is configured using Apache TinkerPop specifications.
Default values:
gremlin_server:
    # port: 8182
    # threadPoolWorker: 2
    # gremlinPool: 0
    # scriptEngines:
    #     gremlin-groovy:
    #         config:
    #             sandbox_enabled: false
    #             sandbox_rules:
    #                 whitelist_packages:
    #                     - package.name
    #                 whitelist_types:
    #                     - fully.qualified.type.name
    #                 whitelist_supers:
    #                     - fully.qualified.class.name
    #                 blacklist_packages:
    #                     - package.name
    #                 blacklist_supers:
    #                     - fully.qualified.class.name
gremlin_server
-
The top-level configurations in Gremlin Server.
port
-
The port value identifies the available communications port for Gremlin Server. Default: 8182
threadPoolWorker
-
The number of worker threads that handle requests and responses on the Gremlin Server channel, including routing requests to the right server operations, handling scheduled jobs on the server, and writing serialized responses back to the client. Default: 2
gremlinPool
-
The number of Gremlin threads available to execute actual scripts in a ScriptEngine. This pool represents the workers available to handle blocking operations in Gremlin Server. Default: 8
scriptEngines
-
Section to configure gremlin server scripts.
gremlin-groovy
-
Section for gremlin-groovy scripts.
sandbox_enabled
-
Sandbox is enabled by default. To disable the gremlin groovy sandbox entirely, set to false.
sandbox_rules
-
Section for sandbox rules.
whitelist_packages
-
List of packages, one package per line, to whitelist.
- package.name
-
Retain the hyphen before the fully qualified package name.
whitelist_types
-
List of types, one type per line, to whitelist.
- fully.qualified.type.name
-
Retain the hyphen before the fully qualified type name.
whitelist_supers
-
List of super classes, one class per line, to whitelist.
- fully.qualified.class.name
-
Retain the hyphen before the fully qualified class name.
blacklist_packages
-
List of packages, one package per line, to blacklist.
- package.name
-
Retain the hyphen before the fully qualified package name.
blacklist_supers
-
List of super classes, one class per line, to blacklist.
- fully.qualified.class.name
-
Retain the hyphen before the fully qualified class name.
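To show how the Gremlin Server options fit together, here is a hedged sketch with the sandbox enabled and one whitelist and one blacklist rule. The package and class names under com.example are hypothetical placeholders, and the gremlinPool value is an arbitrary example.

```yaml
gremlin_server:
    port: 8182                  # documented default
    threadPoolWorker: 2         # documented default
    gremlinPool: 4              # example value; if max_query_threads is unset, its default becomes 10 * 4 = 40
    scriptEngines:
        gremlin-groovy:
            config:
                sandbox_enabled: true
                sandbox_rules:
                    whitelist_packages:
                        - com.example.graph.util         # hypothetical package name
                    blacklist_supers:
                        - com.example.graph.UnsafeBase   # hypothetical fully qualified class name
```

Each list entry sits on its own line and keeps the leading hyphen followed by a space, as the option descriptions above require.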