DataStax Enterprise configuration file (dse.yaml)

The configuration file for Kerberos authentication, purging expired data from Solr indexes, configuring Solr inter-node communication, adjusting disk health intervals, and enabling the Performance Service.

Default dse.yaml locations:
  • Installer-Services and Package installations: /etc/dse/dse.yaml
  • Installer-No Services and Tarball installations: install_location/resources/dse/conf/dse.yaml

Snitch settings 

The delegated_snitch property sets the snitch that DSE delegates to, for example, the DseSimpleSnitch.

  • delegated_snitch

    Default: com.datastax.bdp.snitch.DseSimpleSnitch - Sets which snitch is used.

  • DseSimpleSnitch

    The DseSimpleSnitch places Cassandra, Hadoop, and Solr nodes into separate data centers. See DseSimpleSnitch.
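
As a minimal sketch, the snitch setting in dse.yaml looks like this when it is left at the documented default. The key name and value come from the list above; whether the line is commented out in your shipped file may vary by release.

```yaml
# Snitch that DSE delegates to; this is the documented default value.
delegated_snitch: com.datastax.bdp.snitch.DseSimpleSnitch
```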

For more information, see Snitches in the Cassandra documentation.

Kerberos support 

The kerberos_options set the QOP (Quality of Protection) and encryption options.

kerberos_options:
   keytab: path_to_keytab/dse.keytab
   service_principal: dse_user/_HOST@REALM
   http_principal: HTTP/_HOST@REALM
   qop: auth

  • keytab: resources/dse/conf/dse.keytab

    The keytab file must contain the credentials for both of the fully resolved principal names, which replace _HOST with the FQDN of the host in the service_principal and http_principal settings. The UNIX user running DSE must also have read permissions on the keytab.

  • service_principal: dse_user/_HOST@REALM

    The service_principal that the Cassandra and Hadoop processes run under must use the form dse_user/_HOST@REALM, where dse_user is:

    • Installer-Services and Package installations: cassandra
    • Installer-No Services and Tarball installations: the name of the UNIX user that starts the service

    Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase. The service_principal must be consistent everywhere: in dse.yaml, in the keytab, and in the cqlshrc file (where service_principal is split into service and hostname).

  • http_principal: HTTP/_HOST@REALM

    The http_principal is used by the Tomcat application container that runs DSE Search/Solr. The web server uses SPNEGO, a GSS-API pseudo-mechanism, to negotiate Kerberos as the GSS-API security mechanism. Set REALM to the name of your Kerberos realm. In the Kerberos principal, REALM must be uppercase.

  • qop: auth

    A comma-delimited list of Quality of Protection values that clients and servers can use for each connection. The valid values are:
    • auth - Default: Authentication only.
    • auth-int - Authentication plus integrity protection for all transmitted data.
    • auth-conf - Authentication plus integrity protection and encryption of all transmitted data.

      Encryption with auth-conf is separate from, and completely independent of, SSL encryption. If both auth-conf and SSL are enabled, the transmitted data is encrypted twice. DataStax recommends choosing only one method and using it for both encryption and authentication.
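
The following sketch shows how these options might fit together for a tarball installation. It is illustrative only: the UNIX user "dseadmin" and the realm "EXAMPLE.COM" are hypothetical placeholders, and qop is set to auth-conf on the assumption that SSL is not also enabled, in line with the recommendation above.

```yaml
# Hypothetical tarball installation started by the UNIX user "dseadmin"
# in the Kerberos realm EXAMPLE.COM (both names are placeholders).
kerberos_options:
    # Documented default keytab location; the DSE user needs read access.
    keytab: resources/dse/conf/dse.keytab
    # _HOST stays literal here; it resolves to each node's FQDN.
    service_principal: dseadmin/_HOST@EXAMPLE.COM
    http_principal: HTTP/_HOST@EXAMPLE.COM
    # Authentication, integrity, and encryption; assumes SSL is not also enabled.
    qop: auth-conf
```

With this configuration, a node whose FQDN is node1.example.com needs a keytab that contains both dseadmin/node1.example.com@EXAMPLE.COM and HTTP/node1.example.com@EXAMPLE.COM.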

Scheduler settings for Solr indexes 

These settings control the schedulers in charge of querying for and removing expired data.

ttl_index_rebuild_options
  • fix_rate_period - Default: 300 seconds. Schedules how often to check for expired data.
  • initial_delay - Default: 20 seconds. Speeds up start-up by delaying the first TTL checks.
  • max_docs_per_batch - Default: 200. The maximum number of documents deleted per batch by the TTL rebuild thread.
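
As a minimal sketch, the scheduler block with the documented defaults written out explicitly could look like this. The nesting under ttl_index_rebuild_options follows the listing above.

```yaml
ttl_index_rebuild_options:
    fix_rate_period: 300      # check for expired data every 300 seconds
    initial_delay: 20         # wait 20 seconds after start-up before the first check
    max_docs_per_batch: 200   # delete at most 200 documents per batch
```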

Solr shard transport options 

For inter-node communication between Solr nodes.

  • shard_transport_options
    • type - Default: netty. Starting in DSE 4.5.0, netty is used for TCP-based communication. It provides lower latency, higher throughput, and lower resource consumption than the http transport, which uses a standard HTTP-based interface for communication.
    • netty_server_port - Default: 8984. The TCP listen port. This setting is mandatory if you want to use the netty transport now or plan to migrate to it later. To use the http transport, either comment out this setting or set it to -1.
    • netty_server_acceptor_threads - Default: number of available processors. The number of server acceptor threads.
    • netty_server_worker_threads - Default: number of available processors * 8. The number of server worker threads.
    • netty_client_worker_threads - Default: number of available processors * 8. The number of client worker threads.
    • netty_client_max_connections - Default: 100. The maximum number of client connections.
    • netty_client_request_timeout - Default: 60000. The client request timeout, in milliseconds.
  • HTTP transport settings

    The defaults are the same as in Solr: 0, meaning no timeout at all. To avoid blocking operations, DataStax strongly recommends changing these settings to finite values. These settings apply to all Solr cores.

    • http_shard_client_conn_timeout - Default: 0. The HTTP shard client connection timeout, in milliseconds.
    • http_shard_client_socket_timeout - Default: 0. The HTTP shard client socket timeout, in milliseconds.
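
The sketch below shows one way to combine these options. It keeps the netty transport and replaces the unbounded HTTP defaults with finite timeouts. The two HTTP timeout values are illustrative assumptions, not DataStax recommendations. Settings that are left out are assumed to fall back to the defaults listed above.

```yaml
shard_transport_options:
    type: netty
    netty_server_port: 8984               # TCP listen port; required for netty
    netty_client_max_connections: 100     # documented default
    netty_client_request_timeout: 60000   # documented default, in milliseconds
    # Finite HTTP shard timeouts instead of 0 (no timeout); values are illustrative.
    http_shard_client_conn_timeout: 5000      # 5 seconds
    http_shard_client_socket_timeout: 60000   # 60 seconds
```

To use the http transport instead, comment out netty_server_port or set it to -1, as described above.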

Solr indexing 

DSE Search provides a multi-threaded indexing implementation to improve performance on multi-core machines. All index updates are internally dispatched to a per-core indexing thread pool and executed asynchronously, which allows for greater concurrency and parallelism. As a result, an index request can return a response before the indexing operation is executed.

  • max_solr_concurrency_per_core - Default: number of available Solr cores * 2. Configures the maximum number of concurrent asynchronous indexing threads per Solr core. If set to 1, DSE Search reverts to synchronous indexing.
  • back_pressure_threshold_per_core - Default: 500. The total number of queued asynchronous indexing requests per Solr core, computed at Solr commit time. When this threshold is exceeded, back pressure prevents excessive resource consumption by throttling new incoming requests.
  • flush_max_time_per_core - Default: 5 minutes. The maximum time to wait before flushing asynchronous index updates, which occurs at either Solr commit time or Cassandra flush time. To fully synchronize Solr indexes with Cassandra data, ensure that flushing completes successfully by setting this value reasonably high.
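
As a minimal sketch, these settings could be written as top-level dse.yaml keys, which matches how this section lists them. The concurrency value is a hypothetical example rather than the default. The unit for flush_max_time_per_core is assumed to be minutes, matching the 5-minute default above.

```yaml
# Hypothetical value; setting this to 1 restores synchronous indexing.
max_solr_concurrency_per_core: 4
# Documented default: throttle new requests above 500 queued per core.
back_pressure_threshold_per_core: 500
# Documented default of 5; assumed to be expressed in minutes.
flush_max_time_per_core: 5
```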

DSE Performance Service options 

These settings are used by the Performance Service to configure how it collects performance metrics on Cassandra nodes.