dse.yaml configuration file

Primary DataStax Enterprise configuration file.

The dse.yaml file is the primary configuration file for DataStax Enterprise.

For Cassandra database configuration, see the cassandra.yaml file.
The DataStax Enterprise configuration properties are grouped into the following sections:

Syntax

For the properties in each section, the main setting has zero spaces, and at least two spaces are required before each entry in the section. For example, in the node_health_options section, at least two spaces are required before refresh_rate_ms and uptime_ramp_up_period_seconds:
node_health_options:
  refresh_rate_ms: 50000
  uptime_ramp_up_period_seconds: 10800
  dropped_mutation_window_minutes: 30

Authentication options 

Authentication options enable multiple authentication schemes to be used on the same DataStax Enterprise cluster. Additional configuration is required in the cassandra.yaml file. You must also grant authorization to the configured schemes; see Configuring authorization and object permissions.
authentication_options:
  enabled: false
  default_scheme: kerberos
  other_schemes:
    - internal
  scheme_permissions: true
  allow_digest_with_kerberos: true
  plain_text_without_ssl: warn
  transitional_mode: normal
authentication_options 
To use these options, set the authenticator in cassandra.yaml to com.datastax.bdp.cassandra.auth.DseAuthenticator.
enabled
Controls whether the DSE Unified Authenticator authenticates users. The DSE Unified Authenticator allows multiple authentication schemes to be used at the same time. The driver selects which scheme to use during authentication. Set enabled: false to use the direct equivalent of AllowAllAuthenticator in cassandra.yaml.
default_scheme 
Selects which authentication scheme is used if the driver does not request a specific scheme.
  • internal - Plain text authentication using the internal Cassandra password authentication.
  • ldap - Plain text authentication using pass-through LDAP authentication.
  • kerberos - GSSAPI authentication using the Kerberos authenticator.
other_schemes 
A list of schemes that can be automatically selected for use by a driver. This option can use the same list of schemes as the default_scheme.
scheme_permissions 
Controls whether roles require permissions for specific authentication schemes. These permissions can be granted only when the DSE Authorizer is used.
allow_digest_with_kerberos 
Controls whether DIGEST-MD5 authentication is also allowed with Kerberos. The DIGEST-MD5 mechanism is not directly associated with an authentication scheme, but is used by Kerberos to pass credentials between nodes and jobs. In analytics clusters, set to true when using Hadoop inter-node authentication with Hadoop and Spark jobs.
plain_text_without_ssl 
Controls how the DseAuthenticator responds to plain text authentication requests over unencrypted client connections. Set to one of the following values:
  • block - Block the request with an authentication error.
  • warn - Log a warning about the request but allow it to continue.
  • allow - Allow the request without any warning.
transitional_mode 
Allows the DseAuthenticator to operate in a temporary transitional mode during setup of authentication in a cluster. Set to one of the following values:
  • disabled - Transitional mode is disabled.
  • permissive - Only a superuser is authenticated and logged in. All other authentication attempts are logged in as the anonymous user.
  • normal - If credentials are passed, they are authenticated.
    • If the authentication is successful, the user is logged in.
    • If the authentication fails, the user is logged in as anonymous.
    • If no credentials are passed, the user is logged in as anonymous.
  • strict - If credentials are passed, they are authenticated.
    • If the authentication is successful, the user is logged in.
    • If the authentication fails, then an authentication error is returned.
    • If no credentials are passed, the user is logged in as anonymous.
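
As an illustration of how these options combine, the following fragment sketches a setup with internal authentication as the default scheme, LDAP as an additional scheme, and plain text requests over unencrypted connections rejected. The values are assumptions chosen for the example, not recommendations, and the ldap scheme also depends on a configured ldap_options section.

```yaml
authentication_options:
  enabled: true
  # Used when the driver does not request a specific scheme.
  default_scheme: internal
  # Additional schemes a driver can select.
  other_schemes:
    - ldap
  scheme_permissions: true
  allow_digest_with_kerberos: true
  # Reject plain text authentication over unencrypted client connections.
  plain_text_without_ssl: block
  # Failed authentication returns an error; missing credentials log in as anonymous.
  transitional_mode: strict
```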

Role management options 

role_management_options:
  mode: internal
role_management_options 
To use this option, set the role_manager in cassandra.yaml to com.datastax.bdp.cassandra.auth.DseRoleManager. To define whether roles are managed internally by DataStax Enterprise or by an external LDAP server, configure DSE role management.
mode 
Set to one of the following values:
  • internal - (Default) Granting and revoking of roles is managed internally with Cassandra database roles that are set with CQL using GRANT ROLE and REVOKE ROLE statements.
  • ldap - Granting and revoking of roles is managed by an external LDAP server configured using the ldap_options. To configure and use LDAP authentication, complete the steps in Configuring LDAP.

Authorization options 

authorization_options: 
  enabled: false
  transitional_mode: disabled
authorization_options 
Controls whether DataStax Enterprise authorization is used. The DSE Authorizer (DseAuthorizer) extends the CassandraAuthorizer.
enabled 
Enables the use of DSE Authorizer for authorization.
transitional_mode 
Allows the DSE Authorizer to operate in a temporary transitional mode during setup of authorization in a cluster. Set to one of the following values:
  • disabled - Transitional mode is disabled.
  • normal - Permissions can be passed to resources, but are not enforced.
  • strict - Permissions can be passed to resources, and are enforced on authenticated users. Permissions are not enforced against anonymous users.
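
For example, while permissions are still being granted during a rollout, a fragment along these lines would enable the DSE Authorizer without enforcing anything yet. This is a sketch based on the value descriptions above, not a prescribed migration procedure.

```yaml
authorization_options:
  enabled: true
  # Permissions can be passed to resources, but are not enforced.
  # Switch to strict (or disabled) once the grants are in place.
  transitional_mode: normal
```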

Kerberos options 

Use the kerberos_options to configure security for a DataStax Enterprise cluster using Kerberos. For instructions, see Authenticating a cluster with Kerberos.

kerberos_options:
  keytab: path_to_keytab/dse.keytab
  service_principal: dse_user/_HOST@REALM
  http_principal: HTTP/_HOST@REALM
  qop: auth
keytab 
The keytab file must contain the credentials for both of the fully resolved principal names, which replace _HOST with the fully qualified domain name (FQDN) of the host in the service_principal and http_principal settings. The UNIX user running DSE must also have read permissions on the keytab.
service_principal 
The service_principal that the Cassandra and Hadoop processes run under must use the form dse_user/_HOST@REALM.
where dse_user is:
  • Installer-Services and Package installations: cassandra
  • Installer-No Services and Tarball installations: the name of the UNIX user that starts the service
where:
  • _HOST is converted to a reverse DNS lookup of the broadcast address.
  • REALM is the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
The service_principal must be consistent everywhere: in the dse.yaml file, in the keytab, and in the cqlshrc file (where service_principal is separated into service/hostname).
http_principal 
The http_principal is used by the Tomcat application container to run DSE Search. The Tomcat web server uses the GSS-API mechanism (SPNEGO) to negotiate the GSSAPI security mechanism (Kerberos). Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.
qop 
A comma-delimited list of Quality of Protection (QOP) values that clients and servers can use for each connection. The client can have multiple QOP values, while the server can have only a single QOP value. The valid values are:
  • auth - Default: Authentication only.
  • auth-int - Authentication plus integrity protection for all transmitted data.
  • auth-conf - Authentication plus integrity protection and encryption of all transmitted data.

    Encryption using auth-conf is separate and independent of whether encryption is done using SSL. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax recommends choosing only one method and using it for both encryption and authentication.
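
As a sketch of the settings above, the following fragment uses Kerberos for both authentication and encryption by selecting auth-conf. The realm EXAMPLE.COM and the cassandra user name are placeholder assumptions; substitute the realm and service user for your environment.

```yaml
kerberos_options:
  # Must be readable by the UNIX user running DSE.
  keytab: path_to_keytab/dse.keytab
  # _HOST is resolved per node; the realm must be uppercase.
  service_principal: cassandra/_HOST@EXAMPLE.COM
  http_principal: HTTP/_HOST@EXAMPLE.COM
  # Authentication plus integrity protection and encryption.
  # If SSL is also enabled, data is encrypted twice.
  qop: auth-conf
```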

LDAP options 

To use the ldap_options, you must set com.datastax.bdp.cassandra.auth.DseAuthenticator as the authenticator in the cassandra.yaml file, and ldap for the default_scheme or other_schemes in the authentication_options section in dse.yaml. See About authentication with LDAP.
ldap_options:
    server_host: localhost
    server_port: 389
    search_dn: cn=Admin
    search_password: secret
    use_ssl: false
    use_tls: false
    truststore_path:
    truststore_password:
    truststore_type: jks
    user_search_base: ou=users,dc=example,dc=com
    user_search_filter: (uid={0})
    user_memberof_attribute: memberof
    group_search_type: directory_search
    group_search_base:
    group_search_filter: (uniquemember={0})
    group_name_attribute: cn
    credentials_validity_in_ms: 0
    connection_pool:
        max_active: 8
        max_idle: 8
server_host 
The host name of the LDAP server. Default: localhost
server_port 
The port on which the LDAP server listens. Default: 389
search_dn 
The username of the user that is used to search for other users on the LDAP server. If not present, an anonymous bind is used for the search.
search_password 
The password of the search_dn user.
use_ssl 
Set to true to enable SSL connections to the LDAP server. If set to true, you might need to change server_port to the SSL port of the LDAP server. Default: false
use_tls 
Set to true to enable TLS connections to the LDAP server. If set to true, change the server_port to the TLS port of the LDAP server. Default: false
truststore_path 
The path to the truststore for SSL certificates.
truststore_password 
The password to access the trust store.
truststore_type 
The type of truststore. Default: jks
user_search_base 
The search base for your domain, used to look up users. Set the ou and dc elements for your LDAP domain. Typically this is set to ou=users,dc=domain,dc=top_level_domain. For example, ou=users,dc=example,dc=com.
user_search_filter 
The search filter for looking up user names. Default: (uid={0})
user_memberof_attribute 
For use with DataStax Enterprise unified authentication role management. The attribute on the user entry that contains group membership information.
group_search_type 
For use with DataStax Enterprise unified authentication role management. Define how group membership is determined for a user. Choose from one of the following values:
  • directory_search - Filters the results by doing a subtree search of group_search_base to find groups that match the group_search_filter. (Default)
  • memberof_search - Get groups from the memberof attribute of the user. The directory server must have memberof support.
group_search_base 
The unique distinguished name (DN) of the group on which to base the group membership search.
group_search_filter 
The LDAP group to filter the search on. Default: (uniquemember={0})
group_name_attribute 
The attribute in the group entry that holds the LDAP group name. Default: cn
credentials_validity_in_ms 
The duration period in milliseconds for the credential cache. Default: 0
search_validity_in_seconds 
The duration period in seconds for the search cache. Default: 0
connection_pool 
The configuration settings for the connection pool for making LDAP requests.
  • max_active - The maximum number of active connections to the LDAP server. Default: 8
  • max_idle - The maximum number of idle connections in the pool awaiting requests. Default: 8
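
The following fragment sketches how the LDAP options might fit together when roles are also managed through LDAP and group membership is read from the user entry. The host name ldap.example.com, the truststore path, the passwords, and the 30-second credential cache are illustrative assumptions; port 636 is the conventional LDAP-over-SSL port.

```yaml
role_management_options:
  mode: ldap

ldap_options:
    server_host: ldap.example.com
    server_port: 636
    search_dn: cn=Admin
    search_password: secret
    use_ssl: true
    use_tls: false
    truststore_path: /path/to/truststore.jks
    truststore_password: secret
    truststore_type: jks
    user_search_base: ou=users,dc=example,dc=com
    user_search_filter: (uid={0})
    user_memberof_attribute: memberof
    # Read groups from the memberof attribute of the user entry.
    # The directory server must have memberof support.
    group_search_type: memberof_search
    group_name_attribute: cn
    # Cache validated credentials for 30 seconds (0 disables the cache).
    credentials_validity_in_ms: 30000
    connection_pool:
        max_active: 8
        max_idle: 8
```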

System encryption settings 

system_info_encryption:
  enabled: false
  cipher_algorithm: AES
  secret_key_strength: 128
  chunk_length_kb: 64
system_info_encryption 
DataStax recommends using remote encryption keys from a KMIP key server when using Transparent Data Encryption (TDE) features. Local key support is provided when a KMIP key server is not available.
enabled
Enable to locally encrypt system tables that might contain sensitive information, including system.batchlog, system.paxos, hint files, and the Cassandra commit log. If true, system tables that contain sensitive information are encrypted. When you enable system table encryption on a node with existing data, run nodetool upgradesstables -a on the listed tables. Default: false

System traces, which might contain sensitive information, are not affected by this setting. To encrypt traces, configure encryption on tables in the system_traces keyspace. See configuring encryption per table using TDE.

cipher_algorithm 
Default: AES
secret_key_strength 
Default: 128
chunk_length_kb 
Default: 64
key_name
Can be added to specify the name of the keys file that is created to encrypt system tables in the system_key_directory/system/key_name. Comment out when using key_provider: KmipKeyProviderFactory. Default: system_table_keytab
key_provider 
An alternate key provider only for local encryption when using a KMIP host as a key provider. Omit this field if you are not using KmipKeyProviderFactory. Default: KmipKeyProviderFactory
kmip_host 
When key_provider is KmipKeyProviderFactory, specify the kmip_groupname that is defined in the kmip_hosts section of dse.yaml and that describes the KMIP key server or group of KMIP key servers.
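
As a sketch, system table encryption backed by a KMIP key server could look like the following. It assumes a kmip_hosts entry named kmip_groupname exists elsewhere in dse.yaml; the name is a placeholder.

```yaml
system_info_encryption:
  enabled: true
  cipher_algorithm: AES
  secret_key_strength: 128
  chunk_length_kb: 64
  # key_name is commented out because a KMIP key provider is used.
  # key_name: system_table_keytab
  key_provider: KmipKeyProviderFactory
  # Must match a group name defined under kmip_hosts.
  kmip_host: kmip_groupname
```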

Encryption and system key settings 

Settings for encrypting passwords and sensitive system tables.
system_key_directory: /etc/dse/conf
config_encryption_active: false
config_encryption_key_name: system_key
system_key_directory 
The directory where global encryption keys, called system keys, are created and stored. Keys that are used for SSTable encryption must be distributed to all nodes. DataStax Enterprise must be able to read and write to this directory. The directory must have 700 permissions and belong to the dse user. Default: /etc/dse/conf
See Encrypting using off-server encryption keys and Encrypting using local encryption keys.
config_encryption_active 
When set to true (default: false), the following configuration values must be encrypted:
  • dse.yaml
    • ldap_options.search_password
    • ldap_options.truststore_password
  • cassandra.yaml
    • server_encryption_options.keystore_password
    • server_encryption_options.truststore_password
    • client_encryption_options.keystore_password
    • client_encryption_options.truststore_password
config_encryption_key_name 
The name of the system key for encrypting and decrypting stored passwords in the configuration files. To encrypt keyfiles, use dsetool createsystemkey. When config_encryption_active is true, you must provide a valid key with this name for the system_key_directory option. Default: system_key
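
For instance, a node that stores encrypted passwords in its configuration files might use settings like the following. This sketch assumes a key named system_key already exists in the system_key_directory and that the passwords listed above have been replaced with their encrypted values.

```yaml
system_key_directory: /etc/dse/conf
# Passwords in dse.yaml and cassandra.yaml must now be stored encrypted.
config_encryption_active: true
# Name of the key file in system_key_directory used to decrypt them.
config_encryption_key_name: system_key
```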

KMIP encryption options 

Options for KMIP encryption keys and communication between the DataStax Enterprise node and the KMIP key server or key servers. Enables DataStax Enterprise encryption features to use encryption keys that are stored on a server that is not running DataStax Enterprise.
kmip_hosts:  
  kmip_groupname:
    hosts: kmip1.yourdomain.com, kmip2.yourdomain.com 
    keystore_path: pathto/kmip/keystore.jks
    keystore_type: jks
    keystore_password: password
    truststore_path: pathto/kmip/truststore.jks
    truststore_type: jks
    truststore_password: password
kmip_hosts 
Connection settings for key servers that support the KMIP protocol.
kmip_groupname 
A user-defined name for a group of options to configure a KMIP server or servers, key settings, and certificates. Configure options for a kmip_groupname section for each KMIP key server or group of KMIP key servers. Using separate key server configuration settings allows use of different key servers to encrypt table data, and eliminates the need to enter key server configuration information in DDL statements and other configurations. Multiple KMIP hosts are supported.
hosts
A comma-separated list of hosts[:port] for the KMIP key server. There is no load balancing. In failover scenarios, failover occurs in the same order that servers are listed. For example: hosts: kmip1.yourdomain.com, kmip2.yourdomain.com
keystore_path
The path to a Java keystore that identifies the DSE node to the KMIP key server. For example: /path/to/keystore.jks
keystore_type
The type of key store. The default value is jks.
keystore_password
The password to access the key store.
truststore_path
The path to a Java truststore that identifies the KMIP key server to the DataStax Enterprise node. For example: /path/to/truststore.jks
truststore_type
The type of truststore.
truststore_password
The password to access the truststore.
key_cache_millis
Milliseconds to locally cache the encryption keys that are read from the KMIP hosts. The longer the encryption keys are cached, the fewer requests are made to the KMIP key server, but the longer it takes for changes, like revocation, to propagate to the DataStax Enterprise node. DataStax Enterprise uses concurrent encryption, so multiple threads fetch the secret key from the KMIP key server at the same time. Default: 300000. DataStax recommends using the default value.
timeout
Socket timeout in milliseconds. Default: 1000.
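
The sample at the start of this section omits key_cache_millis and timeout. A fuller sketch that sets them explicitly to their documented defaults might look like this; host names, paths, and passwords are placeholders.

```yaml
kmip_hosts:
  kmip_groupname:
    # Failover follows the order in which hosts are listed.
    hosts: kmip1.yourdomain.com, kmip2.yourdomain.com
    keystore_path: /path/to/keystore.jks
    keystore_type: jks
    keystore_password: password
    truststore_path: /path/to/truststore.jks
    truststore_type: jks
    truststore_password: password
    # Cache keys locally for 5 minutes (300000 ms).
    key_cache_millis: 300000
    # Socket timeout in milliseconds.
    timeout: 1000
```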

DSE In-Memory options 

max_memory_to_lock_mb 
To use DSE In-Memory, choose one of these options to specify how much system memory to use for all in-memory tables.
  • max_memory_to_lock_fraction

    Specify a fraction of the system memory. The default value of 0.20 specifies to use up to 20% of system memory.

  • max_memory_to_lock_mb

    Specify a maximum amount of memory in MB.
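
A minimal sketch of the two alternatives follows. Only one of the options is active at a time here, and the 10240 MB figure is an arbitrary example value.

```yaml
# Option 1: use up to 20% of system memory for in-memory tables.
max_memory_to_lock_fraction: 0.20

# Option 2: cap in-memory tables at a fixed amount instead (in MB).
# max_memory_to_lock_mb: 10240
```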

Node health options 

node_health_options:
  refresh_rate_ms: 50000
  uptime_ramp_up_period_seconds: 10800
  dropped_mutation_window_minutes: 30
node_health_options 
Node health options are always enabled.
refresh_rate_ms 
Default: 60000
uptime_ramp_up_period_seconds 
The amount of continuous uptime required for the node's uptime score to advance the node health score from 0 to 1 (full health), assuming there are no recent dropped mutations. The health score is a composite score based on dropped mutations and uptime. Tip: If a node is repairing after a period of downtime, you might want to increase the uptime period to the expected repair time. Default: 10800 (3 hours)
dropped_mutation_window_minutes 
The historic time window over which the rate of dropped mutations affects the node health score. Default: 30
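
Following the tip above, a node whose repairs are expected to take about six hours might lengthen the ramp-up period accordingly. The six-hour figure is an assumption for illustration.

```yaml
node_health_options:
  refresh_rate_ms: 60000
  # 6 hours of continuous uptime (6 * 3600 seconds) to reach full health.
  uptime_ramp_up_period_seconds: 21600
  dropped_mutation_window_minutes: 30
```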

Health-based routing 

enable_health_based_routing: true
enable_health_based_routing

Enable replication selection for distributed Solr queries to consider node health when multiple candidates exist for a particular token range. Health-based routing enables a trade-off between index consistency and query throughput. When the primary concern is performance, do not enable health-based routing. Default: true

Lease metrics 

lease_metrics_options:
    enabled: false
    ttl_seconds: 604800
lease_metrics_options 
Lease holder statistics help monitor the lease subsystem for automatic management of Job Tracker and Spark Master nodes.
enabled
Enables (true) or disables (false) log entries related to lease holders. Most of the time you do not want to enable logging. Default: false
ttl_seconds
Defines the time, in seconds, to persist the log of lease holder changes. Logging of lease holder changes is always on, and has a very low overhead. Default: 604800

Solr index encryption settings 

solr_encryption_options:
    decryption_cache_offheap_allocation: true
    decryption_cache_size_in_mb: 256
solr_encryption_options 
Specify settings to tune encryption of Solr indexes.
decryption_cache_offheap_allocation 
Specify whether to allocate Solr decryption cache off JVM heap. Default: true
decryption_cache_size_in_mb 
Sets the maximum size of shared Solr decryption cache, in MB. Default: 256

Scheduler settings for Solr indexes 

ttl_index_rebuild_options:
    fixed_rate_period: 300
    initial_delay: 20
    max_docs_per_batch: 4096
    thread_pool_size: 1
ttl_index_rebuild_options 
To ensure that records with TTLs are purged from search indexes when they expire, the search indexes are periodically checked for expired documents. The ttl_index_rebuild_options settings control the schedulers in charge of querying for and removing expired records, and the execution of the checks.
fixed_rate_period 
Schedules how often to check for expired data in seconds. Default: 300
initial_delay 
Speeds start-up time by delaying the first TTL checks in seconds. Default: 20
max_docs_per_batch 
Sets the maximum number of documents to check and delete per batch by the TTL rebuild thread. Default: 4096
thread_pool_size 
To manage system resource consumption and prevent many Solr cores from executing simultaneous TTL deletes, define the maximum number of cores that can execute TTL cleanup concurrently. Default: 1

Reindexing of bootstrapped data 

async_bootstrap_reindex: false
async_bootstrap_reindex
For DSE Search, configure whether to asynchronously reindex bootstrapped data. Default: false
  • If enabled, the node joins the ring immediately after bootstrap and reindexing occurs asynchronously. The node does not wait for post-bootstrap reindexing, so it is not marked down.
  • If disabled, the node joins the ring after reindexing the bootstrapped data.

CQL Solr paging  

Options to specify the paging behavior.
cql_solr_query_paging: off
cql_solr_query_paging 
Options to specify the paging behavior. Default: off.
  • off - Paging is off. Ignore driver paging settings for CQL Solr queries and use normal Solr paging unless:
    • The current workload is an analytics workload, including SearchAnalytics. SearchAnalytics nodes always use driver paging settings.
    • The cqlsh query parameter paging is set to driver.

      Even when cql_solr_query_paging: off, paging is dynamically enabled with the "paging":"driver" parameter in JSON queries.

  • driver - Respects driver paging settings. Specifies to use Solr pagination (cursors) only when the driver uses pagination. Enabled automatically for DSE SearchAnalytics workloads.

Solr resource upload limit 

You can configure the maximum resource file size or disable resource upload.
solr_resource_upload_limit_mb: 10
solr_resource_upload_limit_mb 
Sets the maximum Solr resource upload size limit in MB. Set to 0 to disable resource uploading. Default: 10

Solr shard transport options 

The shard_transport_options use netty for inter-node communication between DSE Search nodes. HTTP is deprecated. Also see Shard transport options for DSE Search communications.

shard_transport_options:
    type: netty
    netty_server_port: 8984
    netty_server_acceptor_threads:
    netty_server_worker_threads:
    netty_client_worker_threads:
    netty_client_max_connections:
    netty_client_request_timeout:
    netty_max_frame_length_in_mb:
# Options specific to the "http" transport type are deprecated.
#   http_shard_client_conn_timeout: 0
#   http_shard_client_socket_timeout: 0
shard_transport_options 
When type: netty, define the following netty settings to configure inter-node communication between DSE Search nodes:
type 
  • netty - TCP-based communication that provides lower latency, improved throughput, and reduced resource consumption.
  • http - Deprecated. Uses a standard HTTP-based interface.
netty_server_port 
The TCP listen port. For releases earlier than DataStax Enterprise 5.0, this setting was mandatory to use the netty transport and is used only during the upgrade to 5.0. After all nodes are running 5.0, requests that are coordinated by this node will no longer contact other nodes on this port. For 5.0 and later, requests use Inter-node messaging options. Default: 8984
netty_server_acceptor_threads 
The number of server acceptor threads. Default: number_of_available_processors
netty_server_worker_threads 
The number of server worker threads. Default: number_of_available_processors * 8
netty_client_worker_threads 
The number of client worker threads. Default: number_of_available_processors * 8
netty_client_max_connections 
The maximum number of client connections. Default: 100
netty_client_request_timeout 
The client request timeout is the maximum cumulative time (in milliseconds) that a distributed Solr request will wait idly for shard responses. Default: 60000
netty_max_frame_length_in_mb 
The maximum length of a message frame, in MB. Default: 256
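
The sample at the start of this section leaves most netty values empty. As a sketch, the following fragment sets the client and frame options explicitly to the defaults listed above and leaves the thread counts unset so that they are derived from the number of available processors.

```yaml
shard_transport_options:
    type: netty
    netty_server_port: 8984
    netty_client_max_connections: 100
    # Maximum cumulative idle wait for shard responses, in milliseconds.
    netty_client_request_timeout: 60000
    netty_max_frame_length_in_mb: 256
```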

Deprecated. When type: http, define the following http transport settings to configure http inter-node communication between DSE Search nodes. To avoid blocking operations, DataStax strongly recommends changing these settings to a finite value. These settings are valid across Solr cores:

http_shard_client_conn_timeout 
HTTP shard client timeouts in milliseconds. 0 = no timeout. Default: 0
http_shard_client_socket_timeout 
HTTP shard client socket timeouts in milliseconds. 0 = no timeout. Default: 0

Solr indexing settings 

DSE Search implements multi-threaded indexing to improve performance on multi-core machines. All index updates are internally dispatched to a per-core indexing thread pool and executed asynchronously, which allows for greater concurrency and parallelism. However, index requests can return a response before the indexing operation is executed.
max_solr_concurrency_per_core: 2
# enable_back_pressure_adaptive_nrt_commit: true
# back_pressure_threshold_per_core: 2000
# flush_max_time_per_core: 5
# load_max_time_per_core: 5
# enable_index_disk_failure_policy: false
# solr_data_dir: /MyDir
# solr_field_cache_enabled: false
max_solr_concurrency_per_core 
Configures the maximum number of concurrent asynchronous indexing threads per Solr core. If set to 1, DSE Search uses synchronous indexing behavior in a single thread. For optimal performance, set this value to the number of available CPU cores divided by the number of Solr cores. For example, with 16 CPU cores and 4 Solr cores, the suggested value is 4. Also see Configuring and tuning indexing performance and Increasing indexing throughput. Default: number_of_available_CPU_cores. To prevent writes from overwhelming reads, reduce this value and adjust parallelDeleteTasks in solrconfig.xml.
Note: Dynamic switching to Solr concurrency level at 1 is disallowed.
enable_back_pressure_adaptive_nrt_commit 
Allows the back pressure system to adapt the maximum auto soft commit time (defined per core in the solrconfig.xml file) to the actual load. This setting is respected only for NRT (near real time) cores. When a Solr core uses live indexing with RT (real time) enabled, adaptive commits are disabled regardless of this property value. Default: true
back_pressure_threshold_per_core 
The total number of queued asynchronous indexing requests per Solr core. When this number is exceeded, back pressure prevents excessive resource consumption by throttling new incoming requests. DataStax recommends using 1000 * max_solr_concurrency_per_core. Default is 2000.
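
To make the two sizing rules concrete, the following sketch applies them to the example of a node with 16 CPU cores and 4 Solr cores: 16 / 4 gives a concurrency of 4, and 1000 * 4 gives a back pressure threshold of 4000. The hardware figures are assumptions for illustration.

```yaml
# 16 CPU cores / 4 Solr cores = 4 indexing threads per core.
max_solr_concurrency_per_core: 4
# Recommended: 1000 * max_solr_concurrency_per_core = 4000.
back_pressure_threshold_per_core: 4000
```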
flush_max_time_per_core 

The maximum time, in minutes, to wait for the flushing of asynchronous index updates, which occurs at Solr commit time or at Cassandra flush time. Expert level knowledge is required to change this value. Always set the value reasonably high to ensure that flushing completes successfully. If the configured value is exceeded, index updates are only partially committed, and the Cassandra commit log is not truncated to ensure data durability.

Note: When a timeout occurs, it usually means this node is being overloaded and cannot flush in a timely manner. Live indexing increases the time to flush asynchronous index updates.
Default: 5
load_max_time_per_core 
The maximum time, in minutes, to wait for each Solr core to load on startup or during create/reload operations. Change this advanced option only if exceptions occur during core loading. Default: 1 (if not specified)
enable_index_disk_failure_policy 
DSE Search activates the configured Cassandra disk failure policy if IOExceptions occur during index update operations. Default: false
solr_data_dir 
The directory to store index data. By default, the Solr data is saved in cassandra_data_dir/solr.data, or as specified by the dse.solr.data.dir system property.
solr_field_cache_enabled 
The Lucene field cache is deprecated. Instead, for fields that are sorted, faceted, or grouped by, set docValues="true" on the field in the schema.xml file. Then RELOAD the core and reindex. The default value is false. To override false, set useFieldCache=true in the Solr request.

Solr CQL query options 

Available option for CQL Solr queries.
cql_solr_query_row_timeout: 10000
cql_solr_query_row_timeout 
The maximum time in milliseconds to wait for each row to be read from Cassandra during CQL Solr queries. Default: 10000 milliseconds (10 seconds)

Global Performance Service options 

Available options to configure the thread pool that is used by most plug-ins. A dropped task warning is issued when the performance service requests more tasks than performance_max_threads + performance_queue_capacity. When a task is dropped, collected statistics might not be current. Tuning options include disabling or reconfiguring some services, or increasing the queue size.
performance_core_threads: 4
performance_max_threads: (cassandra.concurrent_writes)
performance_queue_capacity: 32000
performance_core_threads 
Number of background threads used by the performance service. When not set, the default is 4.

Default: commented out (4)

performance_max_threads 
Maximum number of background threads used by the performance service. When not set, the default is the value of concurrent_writes in cassandra.yaml.

Default: commented out (32)

performance_queue_capacity 
The number of queued tasks in the backlog when the number of performance_max_threads are busy, with a minimum value of 0.

Default: commented out (32000)
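
If dropped task warnings appear, one of the tuning options mentioned above is to increase the queue size. The following sketch doubles the default queue capacity while keeping the thread counts at their defaults. The value 64000 is an example, and 32 assumes concurrent_writes in cassandra.yaml is 32.

```yaml
performance_core_threads: 4
performance_max_threads: 32
# Doubled from the default of 32000 to absorb bursts of tasks.
performance_queue_capacity: 64000
```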

Performance Service options 

These settings are used by the Performance Service to configure collection of performance metrics on Cassandra nodes. Performance metrics are stored in the dse_perf keyspace and can be queried with CQL using any CQL-based utility, such as cqlsh, DataStax DevCenter, or any application using a Cassandra CQL driver.
graph_events:
    ttl_seconds: 600
cql_slow_log_options:
    enabled: true
    threshold: 2000.0
    minimum_samples: 100
    ttl_seconds: 259200
cql_system_info_options:
    enabled: false
    refresh_rate_ms: 10000
resource_level_latency_tracking_options:
    enabled: false
    refresh_rate_ms: 10000
db_summary_stats_options:
    enabled: false
    refresh_rate_ms: 10000
cluster_summary_stats_options:
    enabled: false
    refresh_rate_ms: 10000
histogram_data_options:
  enabled: false
  refresh_rate_ms: 10000
  retention_count: 3
user_level_latency_tracking_options:
   enabled: false
   refresh_rate_ms: 10000
   top_stats_limit: 100
   quantiles: false
graph_events 
graph_events:
    ttl_seconds: 600
Graph event information.
ttl_seconds
Defines the TTL in seconds. Default: 600
cql_slow_log_options 
cql_slow_log_options:
    enabled: true
    threshold: 2000.0
    minimum_samples: 100
    ttl_seconds: 259200
Reports CQL queries that take longer than a specified period of time. See Collecting slow queries.
enabled
Enables (true) or disables (false) log entries for slow queries. Default: true
threshold 
Defines the threshold (in milliseconds or as a percentile). Default: 2000
  • A value greater than 1 is expressed in time and will log queries that take longer than the specified number of milliseconds.
  • A value of 0 to 1 is expressed as a percentile and will log queries that exceed this percentile.
minimum_samples 
Defines the initial number of queries before activating the percentile filter. Default: 100
ttl_seconds 
Defines the time, in seconds, to keep the slow query log entries. Default: 259200
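
As a sketch of the percentile form of threshold, the following fragment logs queries that exceed a percentile rather than a fixed time. It assumes that 0.95 is interpreted as the 95th percentile and that the filter becomes active once minimum_samples queries have been observed.

```yaml
cql_slow_log_options:
    enabled: true
    # A value from 0 to 1 is treated as a percentile, not milliseconds.
    threshold: 0.95
    # Number of queries to observe before the percentile filter activates.
    minimum_samples: 100
    # Keep slow query log entries for 3 days (259200 seconds).
    ttl_seconds: 259200
```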
cql_system_info_options 
cql_system_info_options:
    enabled: false
    refresh_rate_ms: 10000
CQL system information tables settings. See Collecting system level diagnostics.
enabled  
Default: false
refresh_rate_ms 
Default: 10000
resource_level_latency_tracking_options 
resource_level_latency_tracking_options:
    enabled: false
    refresh_rate_ms: 10000
Data resource latency tracking settings. See Collecting system level diagnostics.
enabled
Default: false
refresh_rate_ms
Default: 10000
db_summary_stats_options 
db_summary_stats_options:
    enabled: false
    refresh_rate_ms: 10000
Database summary statistics settings. See Collecting database summary diagnostics.
enabled
Default: false
refresh_rate_ms
Default: 10000
cluster_summary_stats_options 
cluster_summary_stats_options:
    enabled: false
    refresh_rate_ms: 10000
Cluster summary statistics settings. See Collecting cluster summary diagnostics.
enabled
Default: false
refresh_rate_ms
Default: 10000
histogram_data_options 
histogram_data_options:
  enabled: false
  refresh_rate_ms: 10000
  retention_count: 3
Column Family Histogram data tables settings. See Collecting table histogram diagnostics.
enabled
Default: false
refresh_rate_ms
Default: 10000
retention_count 
Default: 3
user_level_latency_tracking_options 
user_level_latency_tracking_options:
   enabled: false
   refresh_rate_ms: 10000
   top_stats_limit: 100
   quantiles: false
User-resource latency tracking settings. See Collecting user activity diagnostics.
enabled
Default: false
refresh_rate_ms
Default: 10000
top_stats_limit 
Default: 100
quantiles 
Default: false

Solr Performance Service options 

These settings are used by the Performance Service. See Performance Service.
solr_indexing_error_log_options:
    enabled: false
    ttl_seconds: 604800
    async_writers: 1
solr_slow_sub_query_log_options:
    enabled: false
    ttl_seconds: 604800
    threshold_ms: 3000
    async_writers: 1
solr_update_handler_metrics_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_request_handler_metrics_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_index_stats_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_cache_stats_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_latency_snapshot_options:
    enabled: false
    ttl_seconds: 604800
    refresh_rate_ms: 60000
solr_indexing_error_log_options 
Enable to collect records of errors that occur during Solr document indexing.
enabled
Default: false
ttl_seconds
Default: 604800
async_writers
Defines the number of server threads dedicated to writing in the log. More than one server thread might degrade performance. Default: 1
solr_slow_sub_query_log_options 
See Collecting slow Solr queries.
enabled
Default: false
ttl_seconds 
Default: 604800
async_writers
Defines the number of server threads dedicated to writing in the log. More than one server thread might degrade performance. Default: 1
threshold_ms
Default: 3000
solr_update_handler_metrics_options 
See Collecting handler statistics.
enabled 
Default: false
ttl_seconds
Default: 604800
refresh_rate_ms
Default: 60000
solr_index_stats_options 
See Collecting index statistics.
enabled
Default: false
ttl_seconds
Default: 604800
refresh_rate_ms 
Default: 60000
solr_cache_stats_options 
See Collecting cache statistics.
enabled 
Default: false
ttl_seconds
Default: 604800
refresh_rate_ms
Default: 60000
solr_latency_snapshot_options 
See Collecting Solr performance statistics.
enabled
Default: false
ttl_seconds
Default: 604800
refresh_rate_ms
Default: 60000

DSE Analytics Hive options 

hive_options 
Retry settings that apply when Hive inserts data into a Cassandra table.
insert_max_retries
Maximum number of retries. Default: 6
insert_retry_sleep_period
Period of time in milliseconds between retries. Default: 50
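For reference, the two retry settings might appear in dse.yaml as follows. This is a sketch that assumes the entries nest under hive_options according to the indentation rule in the Syntax section, and it uses the default values listed above.

```yaml
hive_options:
    # Maximum number of retries for a failed insert.
    insert_max_retries: 6
    # Sleep time, in milliseconds, between retries.
    insert_retry_sleep_period: 50
```

With these values, and assuming a fixed sleep between attempts, the retries add at most about 6 * 50 = 300 milliseconds of sleep time to an insert.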

Hive meta store 

hive_meta_store_enabled 
Enables or disables the Hive meta store via Cassandra. Default: true

DSE Analytics options 

Configure Spark memory, Spark encryption, and integrated Hadoop options.
initial_spark_worker_resources: 0.7
spark_shared_secret_bit_length: 256
spark_security_enabled: false
spark_security_encryption_enabled: false

spark_daemon_readiness_assertion_interval: 1000

hadoop_options:
    task_tracker_cores: 2
    task_tracker_memory: 4g

spark_encryption_options:
    enabled: false
    keystore: .keystore
    keystore_password: cassandra
    key_password: cassandra
    truststore: .truststore
    truststore_password: cassandra
initial_spark_worker_resources 
DataStax Enterprise can control the memory and cores offered by particular Spark Workers in semi-automatic fashion. Specify the fraction of system resources that are made available to the Spark Worker.
The available resources are calculated in the following way:
  • Spark Worker memory = initial_spark_worker_resources * (total system memory - memory assigned to Cassandra)
  • Spark Worker cores = initial_spark_worker_resources * total system cores
The lowest value that you can assign to Spark Worker memory is 64 MB. The lowest value that you can assign to Spark Worker cores is 1 core. If the calculated results are lower, no exception is thrown and the values are automatically limited to these minimums. The range of the initial_spark_worker_resources value is 0.01 to 1. If the value is not specified, the default value 0.7 is used.

This mechanism is used by default to set the Spark Worker memory and cores. To override the default, uncomment and edit one or both SPARK_WORKER_MEMORY and SPARK_WORKER_CORES options in the spark-env.sh file.
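The calculation can be illustrated with a worked example. The node size here is hypothetical (64 GB of total system memory, 16 GB assigned to Cassandra, and 16 cores), and 0.5 is an arbitrary value chosen to keep the arithmetic simple.

```yaml
# Hypothetical node: 64 GB total memory, 16 GB assigned to Cassandra, 16 cores.
initial_spark_worker_resources: 0.5
# Spark Worker memory = 0.5 * (64 GB - 16 GB) = 24 GB
# Spark Worker cores  = 0.5 * 16             = 8 cores
```

With the default of 0.7, the same node would offer 0.7 * 48 GB = 33.6 GB of memory and 0.7 * 16 = 11.2 cores before any rounding that DataStax Enterprise applies.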

spark_shared_secret_bit_length 
The length of a shared secret used to authenticate Spark components and encrypt the connections between them. This value is not the strength of the cipher for encrypting connections. Default: 256
spark_security_enabled 
Enables Spark security based on shared secret infrastructure. Enables mutual authentication of the Spark components and optional encryption of communication channels except the web UI. Default: false
spark_security_encryption_enabled 
Enables encryption of Spark connections except the web UI. Uses DIGEST-MD5 SASL-based encryption mechanism. Requires spark_security_enabled: true. Default: false
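Because encryption depends on the shared secret infrastructure, the two security settings are typically changed together. A minimal sketch that turns on both, keeping the default secret length:

```yaml
spark_shared_secret_bit_length: 256
# Mutual authentication of Spark components through a shared secret.
spark_security_enabled: true
# Requires spark_security_enabled: true; does not cover the web UI.
spark_security_encryption_enabled: true
```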
hadoop_options 
Settings to configure the amount of resources that are designated for Hadoop tasks.
task_tracker_cores 
The maximum number of slots that can be allocated by the Task Tracker for running user tasks. By default, this value is calculated automatically. Specify the maximum total number of mappers and reducers that can be simultaneously run by the Task Tracker. The individual number of mappers or reducers will never be greater than the number of physical cores -1. Default: 2
task_tracker_memory 
The maximum amount of memory that can be allocated by the Task Tracker for running user tasks. By default, this value is calculated automatically. Specify the total memory to split among particular slots, including the 128m per single slot Java overhead. The maximum heap size of a single mapper or reducer will be no less than the hard-coded minimum 256m. Specify a suffix to indicate memory sizes: kilobyte (k), megabyte (m), gigabyte (g), and so on. Default: 4g
spark_daemon_readiness_assertion_interval 
Time interval, in milliseconds, between subsequent retries by the Spark plugin for Spark Master and Worker readiness to start. Default: 1000
spark_encryption_options 
Spark encryption can be enabled for Spark client-to-Spark cluster and Spark internode communication. Spark encryption applies only to these communication protocols in Spark:
  • Control messages via Akka
  • File sharing with HTTP or HTTPS
Spark encryption does not apply to RDD data exchange or to Spark web UI. Encryption is used to send all configuration settings and all files which are required by Spark applications, including passwords and tokens. Spark encryption requires truststores to be defined.
enabled
Enable, or disable, Spark encryption for Spark client-to-Spark cluster and Spark internode communication. Default: false
keystore
The keystore for Spark encryption keys. The file path is relative to the base Spark configuration directory that is defined by the SPARK_CONF_DIR environment variable. The default Spark configuration directory is resources/spark/conf. Default: .keystore
keystore_password
The password to access the keystore. Default: cassandra
key_password
Default: cassandra
truststore
The truststore for Spark encryption keys. The file path is relative to the base Spark configuration directory that is defined by the SPARK_CONF_DIR environment variable. The default Spark configuration directory is resources/spark/conf. Default: .truststore
truststore_password
The password to access the truststore. Default: cassandra
protocol
Defines the encryption protocol. Default: TLS
cipher_suites
Defines the cipher suites for Spark encryption. Default: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA]
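A sketch of an enabled Spark encryption section, including the protocol and cipher_suites entries that do not appear in the sample at the start of this section. The password values are hypothetical placeholders; replace the default cassandra passwords with your own.

```yaml
spark_encryption_options:
    enabled: true
    # Paths are resolved against the Spark configuration directory.
    keystore: .keystore
    keystore_password: example_keystore_password      # placeholder
    key_password: example_key_password                # placeholder
    # Spark encryption requires a truststore to be defined.
    truststore: .truststore
    truststore_password: example_truststore_password  # placeholder
    protocol: TLS
    cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_AES_256_CBC_SHA]
```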

DSE File System (DSEFS) options 

Properties to enable and configure the DSE file system (DSEFS).
dsefs_options:
    enabled: false
    keyspace_name: dsefs
    work_dir: /var/lib/dsefs
    public_port: 5598
    private_port: 5599
    data_directories:
      - dir: /var/lib/dsefs/data
        storage_weight: 1.0
        min_free_space: 5368709120
# Advanced properties for DSEFS
# service_startup_timeout_ms: 30000
# service_close_timeout_ms: 600000
# server_close_timeout_ms: 600000
# gossip_options:
  #   round_delay_ms: 5000
  #   startup_delay_ms: 5000
  #   shutdown_delay_ms: 30000
# rest_options:
  #   request_timeout_ms: 330000
  #   connection_open_timeout_ms: 55000
  #   client_close_timeout_ms: 60000
  #   server_request_timeout_ms: 300000
  #   idle_connection_timeout_ms: 0
# transaction_options:
  #   transaction_timeout_ms: 3000
  #   conflict_retry_delay_ms: 200
  #   execution_retry_delay_ms: 1000
  #   execution_retry_count: 3
dsefs_options 
DSE File System (DSEFS) options.
enabled
Enable, or disable, DSE File System. Default: false
keyspace_name 
The keyspace where the DSEFS metadata is stored. You can optionally configure multiple DSEFS file systems within a single datacenter by specifying different keyspace names for each cluster. Default: dsefs
work_dir 
The local directory for storing the local node metadata, including the node identifier. The volume of data stored in this directory is nominal and does not require configuration for throughput, latency, or capacity. This directory must not be shared by DSEFS nodes. Default: /var/lib/dsefs
public_port 
The public port on which DSEFS listens for clients. DataStax recommends that all nodes in the cluster have the same value. Firewalls must open this port to trusted clients. The service on this port is bound to the RPC address. Default: 5598
private_port 
The private port for DSEFS inter-node communication. Do not open this port to firewalls; this private port must not be visible from outside of the cluster. Default: 5599
data_directories 
One or more data locations where the DSEFS data is stored.
- dir 
Mandatory attribute to identify the set of directories. DataStax recommends segregating these data directories on physical devices different than the devices that are used for Cassandra. Using multiple directories on JBOD improves performance and capacity. Default: /var/lib/dsefs/data
storage_weight 
The weighting factor for this location specifies how much data to place in this directory, relative to other directories in the cluster. This soft constraint determines how DSEFS distributes the data. For example, a directory with a value of 3.0 receives about three times more data than a directory with a value of 1.0. Default: 1.0
min_free_space 
The reserved space, in bytes, to not use for storing file data blocks. You can use a unit of measure suffix to specify other size units. For example: terabyte (1tb), gigabyte (10g), and megabyte (5000mb). Default: 5368709120
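The following sketch shows how two data locations with different weights might be declared. The directory paths are hypothetical; the weights reuse the 3.0 and 1.0 values from the storage_weight description, so the first directory would receive about three times more data than the second.

```yaml
dsefs_options:
    enabled: true
    keyspace_name: dsefs
    work_dir: /var/lib/dsefs
    public_port: 5598
    private_port: 5599
    data_directories:
      # Hypothetical larger device: takes roughly three quarters of the data.
      - dir: /mnt/disk1/dsefs/data
        storage_weight: 3.0
        min_free_space: 5368709120
      # Hypothetical smaller device: takes roughly one quarter of the data.
      - dir: /mnt/disk2/dsefs/data
        storage_weight: 1.0
        min_free_space: 5368709120
```

Because storage_weight is a soft constraint, the actual distribution can differ from these proportions.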
Advanced properties for DSEFS
service_startup_timeout_ms 
Wait time, in milliseconds, before the DSEFS server times out while waiting for services to bootstrap. Default: 30000
service_close_timeout_ms
Wait time, in milliseconds, before the DSEFS server times out while waiting for services to close. Default: 600000
server_close_timeout_ms
Wait time, in milliseconds, that the DSEFS server waits during shutdown before closing all pending connections.
gossip_options
Options to configure DSEFS gossip rounds.
round_delay_ms
The delay, in milliseconds, between gossip rounds. Default: 5000
startup_delay_ms
The delay time, in milliseconds, between registering the location and reading back all other locations from Cassandra. Default: 5000
shutdown_delay_ms
The delay time, in milliseconds, between announcing shutdown and shutting down the node. Default: 30000
rest_options 
Options to configure DSEFS REST timeouts.
request_timeout_ms 
The time, in milliseconds, that the client waits for a response that corresponds to a given request. Default: 330000
connection_open_timeout_ms
The time, in milliseconds, that the client waits to establish a new connection. Default: 55000
client_close_timeout_ms 
The time, in milliseconds, that the client waits for pending transfer to complete before closing a connection. Default: 60000
server_request_timeout_ms 
The time, in milliseconds, to wait for the server REST call to complete. Default: 300000
idle_connection_timeout_ms 
The time, in milliseconds, to wait before closing an idle connection. Commenting out or a value of 0 disables the idle connection timeout. Default: commented out (0=disabled)
transaction_options 
Options to configure DSEFS transaction times.
transaction_timeout_ms 
Transaction run time, in milliseconds, before the transaction is considered for timeout and rollback. Default: 3000
conflict_retry_delay_ms 
Wait time, in milliseconds, before retrying a transaction that was ended due to a conflict. Default: 200
execution_retry_delay_ms 
Wait time, in milliseconds, before retrying a failed transaction payload execution. Default: 1000
execution_retry_count 
The number of payload execution retries before signaling the error to the application. Default: 3

Spark Performance Service options 

spark_cluster_info_options:
    enabled: false
    refresh_rate_ms: 10000
spark_application_info_options:
    enabled: false
    refresh_rate_ms: 10000
    driver:
        sink: false
        connectorSource: false
        jvmSource: false
        stateSource: false
    executor:
        sink: false
        connectorSource: false
        jvmSource: false
spark_cluster_info_options 
See Monitoring Spark with Spark Performance Objects.
enabled
Default: false
refresh_rate_ms
Default: 10000
spark_application_info_options
Statistics options.
enabled
Default: false
refresh_rate_ms
Default: 10000 milliseconds
driver
The driver option controls the metrics that are collected by the Spark Driver.

Audit logging options 

audit_logging_options:
  enabled: false
  logger: SLF4JAuditWriter
  retention_time: 0
audit_logging_options 
To get the maximum information from data auditing, turn on data auditing on every node. See Enabling data auditing in DataStax Enterprise and Configuring audit logging to a logback log file.
enabled
Default: false
logger 
Default: SLF4JAuditWriter
  • SLF4JAuditWriter - Logs audit information to the SLF4JAuditWriter logger. Audit logging configuration settings are in the logback.xml file.
    The location of the logback.xml file depends on the type of installation:
    Installer-Services and Package installations: /etc/dse/cassandra/logback.xml
    Installer-No Services and Tarball installations: install_location/resources/cassandra/conf/logback.xml
  • CassandraAuditWriter - Logs audit information to the dse_audit.audit_log Cassandra table. This logger can be run synchronously or asynchronously. See related cassandra_audit_writer_options configuration entries and Configuring audit logging to a Cassandra table.
included_categories or excluded_categories 
The default is to include all categories. Specify either included or excluded categories. Specifying both is an error.

Comma-separated list of audit event categories to include or exclude from the audit log. Categories are: QUERY, DML, DDL, DCL, AUTH, ERROR.

included_categories: comma_separated_list
or
excluded_categories: comma_separated_list
included_keyspaces or excluded_keyspaces
The default is to include all keyspaces. Specify either included or excluded keyspaces. Specifying both is an error.

Use a regular expression to filter keyspaces, or use a comma-separated list of keyspaces to be included or excluded from the audit log.

included_keyspaces: comma_separated_list
or
excluded_keyspaces: comma_separated_list
retention_time 
The amount of time, in hours, that audit events are retained by supporting loggers. Only the CassandraAuditWriter supports retention time. Values of 0 or less retain events forever. Default: 0
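A sketch that combines the filtering and retention options. The keyspace names are hypothetical, and 720 hours (30 days) is an illustrative retention period. Only one option from each included/excluded pair is set, because specifying both is an error.

```yaml
audit_logging_options:
  enabled: true
  # retention_time is honored only by the CassandraAuditWriter.
  logger: CassandraAuditWriter
  # Log only schema changes, permission changes, and login events.
  included_categories: DDL, DCL, AUTH
  # Hypothetical keyspaces to leave out of the audit log.
  excluded_keyspaces: scratch_ks, test_ks
  # 30 days * 24 hours = 720 hours.
  retention_time: 720
```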
cassandra_audit_writer_options 
Configuration options for the CassandraAuditWriter. Logging to a Cassandra table can provide a more centralized auditing view.
cassandra_audit_writer_options:
    mode: sync
    batch_size: 50
    flush_time: 500
    num_writers: 10
    queue_size: 10000
    write_consistency: QUORUM
    dropped_event_log: /var/log/cassandra/dropped_audit_events.log
mode 
Sets the mode the writer runs in. Default: sync
  • sync - A query is not executed until the audit event is successfully written.
  • async - Audit events are queued for writing to the audit table, but are not necessarily logged before the query executes. A pool of writer threads consumes the audit events from the queue, and writes them to the audit table in batch queries. While this substantially improves performance under load, if there is a failure between when a query is executed, and its audit event is written to the table, the audit table might be missing entries for queries that were executed.
batch_size 
Available only when mode: async.

Must be greater than 0. The maximum number of events the writer dequeues before writing them out to the table. If warnings in the logs reveal that batches are too large, decrease this value or increase the value of batch_size_warn_threshold_in_kb in cassandra.yaml. Default: 50

flush_time 
Available only when mode: async.

The maximum amount of time, in milliseconds, that an event waits in the queue before a writer removes it and writes it out. This flush time prevents events from waiting too long before being written to the table when there are not a lot of queries happening. Default: 500

num_writers 
Available only when mode: async.

The number of worker threads asynchronously logging events to the CassandraAuditWriter. Default: 10

queue_size 
The size of the queue feeding the asynchronous audit log writer threads. When there are more events being produced than the writers can write out, the queue fills up, and newer queries are blocked until there is space on the queue. If a value of 0 is used, the queue size is unbounded, which can lead to resource exhaustion under heavy query load. Default: 10000
write_consistency 
The consistency level that is used to write audit events. Default: QUORUM
dropped_event_log 
The directory to store the log file that reports dropped events. Default: /var/log/cassandra/dropped_audit_events.log
day_partition_millis 
To spread audit log information across multiple nodes, specify the interval, in milliseconds, between changing nodes. For example, specify 43200000 milliseconds to change the target node every 12 hours. Default: 3600000 (1 hour)
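For comparison with the sync sample at the start of this section, the following sketch shows an asynchronous writer configuration. All values are the documented defaults except mode and day_partition_millis, which uses the 12-hour example from the description above.

```yaml
cassandra_audit_writer_options:
    # Events are queued and written in batches by a pool of writer threads.
    mode: async
    # The next three options apply only when mode is async.
    batch_size: 50
    flush_time: 500
    num_writers: 10
    # A value of 0 would make the queue unbounded.
    queue_size: 10000
    write_consistency: QUORUM
    dropped_event_log: /var/log/cassandra/dropped_audit_events.log
    # 12 hours * 60 * 60 * 1000 = 43200000 milliseconds.
    day_partition_millis: 43200000
```

Keep in mind the trade-off described under mode: async improves performance under load, but the audit table might miss entries if a failure occurs before a queued event is written.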

DSE Tiered Storage options 

Options to define one or more disk configurations for DSE Tiered Storage. Specify multiple disk configurations as unnamed tiers by a collection of paths that are defined in priority order, with the fastest storage media in the top tier. With heterogeneous storage configurations across the cluster, specify each disk configuration with config_name:config_settings, and reference the configuration name in CREATE TABLE or ALTER TABLE statements.
tiered_storage_options:
  strategy1:
    tiers:
      - paths:
          - /mnt1
          - /mnt2
      - paths:
          - /mnt3
          - /mnt4
      - paths:
          - /mnt5
          - /mnt6
tiered_storage_options
Options to configure the smart movement of data across different types of storage media so that data is matched to the most suitable drive type, according to the performance and cost characteristics it requires.
strategy1
The first disk configuration strategy. Create a strategy2, strategy3, and so on. In this example, strategy1 is the configurable name of the tiered storage configuration strategy.
tiers
The section that defines a storage tier with the paths and file paths that define the priority order.
- paths
The section of file paths that define the data directories for this tier of the disk configuration. The tier that is listed first is the top tier that typically accesses the fastest storage media. These paths are used only to store data that is configured to use tiered storage. These paths are independent of any settings in the cassandra.yaml file.
- /filepath
Specific file paths to define the data directories for this tier of the disk configuration.
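As a sketch of how more than one strategy might be declared, the fragment below defines two named configurations. The mount points are hypothetical, and the assumption is that each additional strategy follows the same layout as strategy1.

```yaml
tiered_storage_options:
  # Two tiers: the first listed tier is the top (fastest) tier.
  strategy1:
    tiers:
      - paths:
          - /mnt/ssd1
          - /mnt/ssd2
      - paths:
          - /mnt/hdd1
  # A second, independent configuration with three tiers.
  strategy2:
    tiers:
      - paths:
          - /mnt/nvme1
      - paths:
          - /mnt/ssd3
      - paths:
          - /mnt/hdd2
```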

DSE Advanced Replication configuration settings 

DSE Advanced Replication configuration options to replicate data from remote clusters to central data hubs.
#advanced_replication_options:
#  enabled: false
#  conf_driver_password_encryption_enabled: false
#  security_base_path: /base/path/to/advrep/security/files/
advanced_replication_options 
Options to enable DSE Advanced Replication.
enabled
Set enabled: true on an edge node to collect data in the replication log. Default: false
conf_driver_password_encryption_enabled 
Enable or disable encryption of driver passwords. When enabled, the stored driver password is expected to be encrypted with the system key. After you create the system key, you must copy the same system key to every node in the cluster. Default: false
security_base_path 
The base path to prepend to paths in the Advanced Replication configuration locations, including locations to SSL keystore, SSL truststore, and so on. Default: /base/path/to/advrep/security/files/

Inter-node messaging options 

Configuration for the internal messaging service used by several components of DataStax Enterprise. For 5.0 and later, all internode messaging requests use this service.
internode_messaging_options:
  port: 8609
  # frame_length_in_mb: 256
  # server_acceptor_threads: 8
  # server_worker_threads: 16
  # client_max_connections: 100
  # client_worker_threads: 16
  # handshake_timeout_seconds: 10
internode_messaging_options 
Configuration options for inter-node messaging.
port 
The mandatory port for the inter-node messaging service. Default: 8609
frame_length_in_mb 
Maximum message frame length. Default: 256
server_acceptor_threads 
The number of server acceptor threads. Default: the number of available processors.
server_worker_threads 
The number of server worker threads. Default: the number of available processors * 8.
client_max_connections 
The maximum number of client connections. Default: 100
client_worker_threads 
The number of client worker threads. Default: the number of available processors * 8.
handshake_timeout_seconds 
Timeout for communication handshake process. Default: 10

DSE Multi-Instance server_id 

server_id 
In DSE Multi-Instance /etc/dse-nodeId/dse.yaml files, the server_id option is generated to uniquely identify the physical server on which multiple instances are running. The server_id default value is the media access control address (MAC address) of the physical server. You can change server_id when the MAC address is not unique, such as a virtualized server where the host’s physical MAC is cloned.

DSE Graph system-level options 

These graph options are system-level configuration options and options that are shared between graph instances. Add an option if it is not present in the provided dse.yaml file.
graph:
  adjacency_cache_clean_rate: 1024
  adjacency_cache_max_entry_size_in_mb: 0
  adjacency_cache_size_in_mb: 128
  analytic_evaluation_timeout_in_minutes: 10080
  gremlin_server_enabled: true
  index_cache_clean_rate: 1024
  index_cache_max_entry_size_in_mb: 0
  index_cache_size_in_mb: 128
  max_query_queue: 10000
  #max_query_threads:
  realtime_evaluation_timeout_in_seconds: 30
  schema_agreement_timeout_in_ms: 10000
  schema_mode: Production
  system_evaluation_timeout_in_seconds: 180
  window_size: 100000
  max_query_params: 256
graph 
These graph options are system-level configuration options and options that are shared between graph instances.
adjacency_cache_clean_rate 
The number of stale rows per second to clean from each graph's adjacency cache. Default: 1024.
adjacency_cache_max_entry_size_in_mb 
The maximum entry size in each graph's adjacency cache. When set to zero, the default is calculated based on the cache size and the number of CPUs. Entries that exceed this size are quietly dropped by the cache without producing an explicit error or log message. Default: 0.
adjacency_cache_size_in_mb 
The amount of RAM to allocate to each graph's adjacency (edge and property) cache. Default: 128.
analytic_evaluation_timeout_in_minutes 
Maximum time to wait for an analytic (Spark) traversal to evaluate. Default: 10080 (7 days).

Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.

gremlin_server_enabled 
Enables or disables Gremlin Server. Default: true.
index_cache_clean_rate 
The number of stale entries per second to clean from the index cache. Default: 1024.
index_cache_max_entry_size_in_mb 
The maximum entry size in the index cache. When set to zero, the default is calculated based on the cache size and the number of CPUs. Entries that exceed this size are quietly dropped by the cache without producing an explicit error or log message. Default: 0.
index_cache_size_in_mb 
The amount of RAM to allocate to the index cache. Default: 128.
max_query_queue 
The maximum number of CQL queries that can be queued as a result of Gremlin requests. Incoming queries are rejected if the queue size exceeds this setting. Default: 10000.
max_query_threads 
The maximum number of threads to use for queries to the database. When this option is not set, the default is:
  • If gremlinPool is present and nonzero:

    10 * the gremlinPool setting

  • If gremlinPool is not present in this file or set to zero:

    The number of available CPU cores

See gremlinPool.
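The default calculation can be illustrated with a worked example. The gremlinPool value of 4 is a hypothetical assumption used only for the arithmetic.

```yaml
graph:
  # With max_query_threads left unset and a hypothetical gremlinPool of 4,
  # the documented default would be 10 * 4 = 40 threads.
  # With gremlinPool absent or set to zero, the default would instead be
  # the number of available CPU cores.
  # Setting the option explicitly fixes the thread count:
  max_query_threads: 40
```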
realtime_evaluation_timeout_in_seconds 
Maximum time to wait for a real-time traversal to evaluate. Default: 30 seconds.

Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.

schema_agreement_timeout_in_ms 
Maximum time, in milliseconds, to wait for Cassandra to agree on schema versions before timing out. Default: 10000

Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.

schema_mode 
Controls the way that the schemas are handled. Valid values:
  • Production = Schema must be created before data insertion. Schema cannot be changed after data is inserted. Full graph scans are disallowed unless the option graph.allow_scan is changed to TRUE.
  • Development = No schema is required to write data to a graph. Schema can be changed after data is inserted. Full graph scans are allowed unless the option graph.allow_scan is changed to FALSE.
system_evaluation_timeout_in_seconds 
Maximum time to wait for a graph-system request to execute. For example, a graph-system request like creating a new graph. Default: 180 (3 minutes).

Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.

window_size 
The number of samples to keep when aggregating log events. Only a small subset of graph log events use this system. Modifying this setting is rarely necessary or helpful. Default: 100000.
max_query_params 
The maximum number of parameters that can be passed on a graph query request for TinkerPop drivers and drivers using Cassandra native protocol. Passing very large numbers of parameters on requests is an anti-pattern, because the script evaluation time increases proportionally. DataStax recommends reducing the number of parameters to speed up script compilation times. Before you increase this value, consider alternate methods for parameterizing scripts, like passing a single map. If the graph query request requires many arguments, pass a list. Default: 256

DSE Graph id assignment and partitioning strategy options 

ids:
    block_renew: 0.8
    community_reuse: 28
    consistency_mode: GLOBAL
    # datacenter_id: integer unique per DC when consistency_mode: DC_LOCAL
    id_hash_modulus: 20
    member_block_size: 512
ids 
DSE Graph configuration options for standard vertex ID assignment and partitioning strategies.
block_renew 
The graph standard vertex ID allocator operates on blocks of contiguous IDs. Each block is allocated using a database lightweight transaction that requires coordination latency. To hide the cost of allocating a standard ID block, the allocator begins asynchronously buffering a replacement block whenever a current block is nearly empty. This block_renew parameter defines "nearly empty" as a floating point number between 0 and 1. The value is how much of a standard ID block can be used before graph starts asynchronously allocating its replacement. This setting has no effect on custom IDs. Value must be between 0 and 1. Default: 0.8.
community_reuse 
For graphs using standard vertex IDs, if a transaction creates multiple vertices, the allocator attempts to assign vertex IDs that colocate vertices on the same database replicas. If an especially large vertex cohort is created, the allocator chunks the vertex creation and assigns a random target location to avoid load hotspotting. This setting controls the vertex chunk size and has no effect on custom IDs. Default: 28.
consistency_mode 
Must be set to DC_LOCAL or GLOBAL.
  • DC_LOCAL - The node uses LOCAL_QUORUM when allocating an ID for a graph vertex. The datacenter_id option must be correctly configured on every node in the cluster.
  • GLOBAL - (Default) The node uses QUORUM when allocating an ID for a graph vertex. The datacenter_id option is ignored.
This option must have the same value on every node in the cluster. Its value can only be changed when the entire cluster is stopped. This setting has no effect on custom IDs.
datacenter_id 
Applies only when consistency_mode is DC_LOCAL. Set to an arbitrary value between 1 and 127, inclusive. This setting has no effect on custom IDs.
Warning: Each datacenter in the cluster must have a unique datacenter_id. Violating this constraint will corrupt the graph database without warning.
Default: no explicit default value.
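A minimal sketch of a DC_LOCAL configuration for a hypothetical cluster with two datacenters. The identifiers 1 and 2 are arbitrary choices within the allowed range of 1 to 127; what matters is that they differ between datacenters.

```yaml
# dse.yaml on every node in the hypothetical first datacenter.
ids:
    # Must be identical on every node in the cluster.
    consistency_mode: DC_LOCAL
    # Must be unique per datacenter.
    datacenter_id: 1
# Nodes in the hypothetical second datacenter use the same consistency_mode
# with a different identifier, for example datacenter_id: 2.
```

Because consistency_mode can be changed only when the entire cluster is stopped, plan the identifiers for all datacenters before restarting.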
id_hash_modulus 
An integer between 1 and 2^24 (both inclusive) that affects maximum ID capacity and the maximum storage space used by ID allocations. Lower values reduce the storage space consumed and the lightweight transaction overhead imposed at startup. Lower values also reduce the total number of IDs that can be allocated over the life of a graph, because this parameter is proportional to the allocatable ID space. However, the proportion coefficient is Long.MAX_VALUE (2^63-1), so ID headroom should be sufficient, practically speaking, even if this is set to 1. This setting has no effect on custom IDs. Default: 20.
member_block_size 
The graph standard vertex ID allocator claims uniformly-sized blocks of contiguous IDs using lightweight transactions on the database. This setting controls the size of each block. This setting has no effect on custom IDs. Default: 512.
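The interaction between block_renew and member_block_size can be shown with the default values. This is a sketch of the arithmetic only; the exact rounding that the allocator applies is not stated here.

```yaml
ids:
    block_renew: 0.8
    member_block_size: 512
    # 0.8 * 512 = 409.6, so the allocator would begin buffering a replacement
    # block after roughly 410 of the 512 IDs in the current block are used.
```

A lower block_renew value starts the replacement allocation earlier, leaving more unused IDs in the current block as a buffer while the lightweight transaction completes.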

DSE Graph listener options 

listener:
  listener_name: string   
  black_types:  # This list is empty by default   
  interval_in_seconds: 3600
  type: slf4j
  white_types:  # This list is empty by default
listener 
Options that contain all registered state listeners identified by their name.
listener_name 
Replace listener_name with a string that identifies the listener. The string must begin with a lowercase letter and can be composed of lowercase letters, numbers, and underscores.
*.black_types 
The names of state types to ignore. All state types except those listed are listened to. Default: (empty).
*.interval_in_seconds 
The interval, in seconds, at which state values are logged. Default: 3600

Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.

*.type 
The type of the state listener. Must be one of the following values: slf4j. Default: slf4j.
*.white_types 
The names of state types to listen to. Only those state types are listened to; all others are ignored. Default: (empty).
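
A minimal sketch of a single listener definition follows. The listener name and the state type names are hypothetical placeholders, and nesting the options under the listener name is an assumption inferred from the *. prefix on the option names.

```yaml
# Sketch only: names below are hypothetical placeholders.
listener:
  my_state_listener:           # replaces listener_name
    type: slf4j                # the only listed type
    interval_in_seconds: 600   # log state values every 10 minutes
    white_types:               # listen only to these state types
      - state_type_a
      - state_type_b
```

Because white_types is set here, black_types is left empty.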

DSE Graph messaging options 

msg:
  graph_msg_timeout_in_ms: 5000
msg 
Options to configure the DSE Graph internal query and lightweight messaging system.
graph_msg_timeout_in_ms 
The interval, in milliseconds, within which graph messages must be acknowledged. A message that is not acknowledged in time is assumed to be dropped or failed. DSE Graph then retries the message or, if the retry limit is exceeded, fails the responsible request. Default: 5000

Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.

DSE Graph event observer options 

observer:
  observer_name: string
  black_types:  # This list is empty by default  
  observed_graphs:  # This list is empty by default
  slow_threshold_in_ms: 300000
  type: slf4j
  white_types:  # This list is empty by default
observer 
Options to configure all registered event observers identified by their name.
observer_name 
Replace observer_name with a string that identifies the event observer. The string must begin with a lowercase letter and can be composed of lowercase letters, numbers, and underscores.
*.black_types 
The names of event types to ignore. All event types except those listed are observed. Value: YAML-formatted list of strings. Default: (empty).
*.observed_graphs 
The names of the graphs for which events are observed. Value: YAML-formatted list of strings. Default: (empty).
*.slow_tx_graphs 
The names of the graphs for which slow transactions are monitored. Default: (empty).
*.slow_threshold_in_ms 
The threshold, in milliseconds, at which slow queries are reported. Default: 300000

Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.

*.type 
The type of the event observer. Must be one of the following values: slf4j, slow_request. Default: slf4j.
*.white_types 
The names of event types to observe. Only those event types are observed; all others are ignored. Value: YAML-formatted list of strings. Default: (empty).
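
A minimal sketch of an observer that reports slow queries for one graph follows. The observer name and graph name are hypothetical placeholders, and nesting the options under the observer name is an assumption inferred from the *. prefix on the option names.

```yaml
# Sketch only: observer and graph names are hypothetical placeholders.
observer:
  slow_query_observer:           # replaces observer_name
    type: slow_request           # one of: slf4j, slow_request
    slow_threshold_in_ms: 60000  # lower than the 300000 default
    observed_graphs:             # YAML-formatted list of strings
      - my_graph
```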

DSE Graph shared data options 

shared_data:
  refresh_interval_in_ms: 60000
shared_data 
Options for shared data in DSE Graph.
refresh_interval_in_ms 
The interval, in milliseconds, at which the graph schema is reread from the Cassandra tables. The schema is also updated immediately when schema changes occur, so this periodic refresh is only a failsafe. Default: 60000

Option names and values expressed in ISO 8601 format used in earlier DSE 5.0 releases are still valid. The ISO 8601 format is deprecated.

DSE Graph Gremlin Server options 

The Gremlin Server is configured using Apache TinkerPop specifications.
gremlin_server:
    # port: 8182
    # threadPoolWorker: 2
    # gremlinPool: 0
#        scriptEngines:
#            gremlin-groovy:
#                config:
#                   sandbox_enabled: false
#                   sandbox_rules:
#                        whitelist_packages:
#                            - package.name
#                        whitelist_types:
#                            - fully.qualified.type.name
#                        whitelist_supers:
#                            - fully.qualified.class.name
#                        blacklist_packages:
#                            - package.name
#                        blacklist_supers:
#                            - fully.qualified.class.name
gremlin_server 
The top-level configurations in Gremlin Server.
port 
The port value identifies the available communications port for Gremlin Server. Default: 8182
threadPoolWorker 
The number of worker threads that handle requests and responses on the Gremlin Server channel, including routing requests to the right server operations, handling scheduled jobs on the server, and writing serialized responses back to the client. Default: 2
gremlinPool 
The number of Gremlin threads available to execute actual scripts in a ScriptEngine. This pool represents the workers available to handle blocking operations in Gremlin Server. Default: 8
scriptEngines 
Section to configure Gremlin Server scripts.
gremlin-groovy 
Section for gremlin-groovy scripts.
sandbox_enabled 
The sandbox is enabled by default. To disable the gremlin-groovy sandbox entirely, set to false.
sandbox_rules 
Section for sandbox rules.
whitelist_packages 
List of packages, one package per line, to whitelist.
- package.name 
Retain the hyphen before the fully qualified package name.
whitelist_types
List of types, one type per line, to whitelist.
- fully.qualified.type.name
Retain the hyphen before the fully qualified type name.
whitelist_supers 
List of superclasses, one class per line, to whitelist.
- fully.qualified.class.name 
Retain the hyphen before the fully qualified class name.
blacklist_packages
List of packages, one package per line, to blacklist.
- package.name
Retain the hyphen before the fully qualified package name.
blacklist_supers
List of superclasses, one class per line, to blacklist.
- fully.qualified.class.name
Retain the hyphen before the fully qualified class name.
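
For illustration, the commented-out template above might look like the following once uncommented with the sandbox enabled. This is a sketch, not a recommended configuration: the package and class names are hypothetical placeholders, and placing scriptEngines directly under gremlin_server is an assumption based on the option descriptions.

```yaml
# Sketch only: package and class names are hypothetical placeholders.
gremlin_server:
  port: 8182
  threadPoolWorker: 2
  scriptEngines:
    gremlin-groovy:
      config:
        sandbox_enabled: true
        sandbox_rules:
          whitelist_packages:
            - com.example.graphutil               # one package per line
          whitelist_types:
            - com.example.model.Account           # one type per line
          blacklist_supers:
            - com.example.graphutil.UnsafeBase    # one class per line
```

Each list entry keeps the hyphen before the fully qualified name, as described above.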
See also Configuring the Gremlin console for Gremlin Server in the remote.yaml file.
The location of the cassandra.yaml file depends on the type of installation:
Installer-Services       /etc/dse/cassandra/cassandra.yaml
Package installations    /etc/dse/cassandra/cassandra.yaml
Installer-No Services    install_location/resources/cassandra/conf/cassandra.yaml
Tarball installations    install_location/resources/cassandra/conf/cassandra.yaml
The location of the dse.yaml file depends on the type of installation:
Installer-Services       /etc/dse/dse.yaml
Package installations    /etc/dse/dse.yaml
Installer-No Services    install_location/resources/dse/conf/dse.yaml
Tarball installations    install_location/resources/dse/conf/dse.yaml