DataStax Agent configuration

Configure DataStax agents with options in the address.yaml file.

The address.yaml configuration file

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
  • Package installations: /etc/dse/cassandra/cassandra.yaml
  • Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml

cluster_name.conf

The location of the cluster_name.conf file depends on the type of installation:
  • Package installations: /etc/opscenter/clusters/cluster_name.conf
  • Tarball installations: install_location/conf/clusters/cluster_name.conf

address.yaml

The location of the address.yaml file depends on the type of installation:
  • Package installations: /var/lib/datastax-agent/conf/address.yaml
  • Tarball installations: install_location/conf/address.yaml

cassandra-env.sh

The location of the cassandra-env.sh file depends on the type of installation:
  • Package installations: /etc/dse/cassandra/cassandra-env.sh
  • Tarball installations: installation_location/resources/cassandra/conf/cassandra-env.sh

The address.yaml file contains configuration options for the DataStax Agent.

Most of these properties can be set in the [agent_config] section of cluster_name.conf on the opscenterd machine, which automatically propagates them to all agents. Some properties, and some environments, require setting the values directly in address.yaml on the applicable agents. When installing DataStax Agents manually, stomp_interface is the only property that needs to be explicitly configured in most environments. When installing DataStax Agents automatically, stomp_interface is configured for you.
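As a minimal sketch of the manual-installation case described above, an address.yaml that sets only stomp_interface might look like the following. The IP address is a placeholder chosen for illustration, not a value from this document; substitute the reachable address of your own opscenterd machine.

```yaml
# address.yaml on an agent node (manual installation).
# 10.0.0.5 is a hypothetical placeholder for the opscenterd machine's reachable IP.
stomp_interface: 10.0.0.5
```

Everything else is left unset here, on the assumption that the remaining properties are either defaulted or propagated from cluster_name.conf.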

For more information about viewing agent status and troubleshooting agent issues, see Agents View.

Configuration options

use_ssl
Whether or not to use SSL communication between the agent and opscenterd. Affects both the STOMP connection and agent HTTP server. Corresponds to [agents].use_ssl in opscenterd.conf. Setting this option to true turns on SSL connections. Example: use_ssl: true
stomp_port
The STOMP port used by opscenterd. Example: stomp_port: 61620
stomp_interface
Reachable IP address of the opscenterd machine. The connection made will be on stomp_port. Example: stomp_interface: 127.0.0.1
local_interface
The IP used to identify the node. If broadcast_address is set in cassandra.yaml, this should be the same as that; otherwise, it is typically the same as listen_address in cassandra.yaml. A good check is to confirm that this address is the same as the address that nodetool ring outputs. Example: local_interface: 172.10.0.2
agent_rpc_interface
The IP that the agent HTTP server listens on. In a multiple region deployment, this is typically a private IP. Default: Matches rpc_interface (native_transport_interface) from cassandra.yaml. Example: agent_rpc_interface: 172.10.0.2
agent_rpc_broadcast_address
The IP that the central OpsCenter process uses to connect to the DataStax agent. Default: First available resolvable address in this order: broadcast_rpc_address (native_transport_broadcast_address), rpc_address (native_transport_address), and listen_address from cassandra.yaml. Example: agent_rpc_broadcast_address: 172.10.0.2
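To illustrate how the three address options above relate in a multiple region deployment, here is a hedged sketch. The private address reuses the example value from this page; 203.0.113.10 is a documentation-only placeholder assumed here to stand for an address that opscenterd can reach.

```yaml
# Identifies the node; should match what nodetool ring reports for it.
local_interface: 172.10.0.2
# Address the agent HTTP server listens on (typically a private IP).
agent_rpc_interface: 172.10.0.2
# Address opscenterd uses to connect to the agent.
# 203.0.113.10 is a hypothetical placeholder, not a value from this page.
agent_rpc_broadcast_address: 203.0.113.10
```

In a single-region deployment these options can usually be left unset so that the defaults derived from cassandra.yaml apply.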
swagger_enabled
Enables or disables the swagger UI for the agent API. Example: swagger_enabled: true
opscenter_ssl_keystore
On target nodes where DataStax Agents are running, the path to the SSL keystore file that the Agents use to connect to opscenterd. Example: opscenter_ssl_keystore: /usr/share/opscenter/ssl/agentKeystore
opscenter_ssl_keystore_password
The SSL keystore password that the agents use to connect to opscenterd. Example: opscenter_ssl_keystore_password: keystore-pass [This field may be encrypted for additional security.]
opscenter_ssl_truststore
The path to the truststore file that the agents use to connect to opscenterd. Example: opscenter_ssl_truststore: /usr/share/opscenter/ssl/trustStore
opscenter_ssl_truststore_password
The SSL truststore password that the agents use to connect to opscenterd. Default: Uses the keystore password if an SSL truststore password is not specified. Example: opscenter_ssl_truststore_password: trust-pass [This field may be encrypted for additional security.]
opscenter_ssl_strict_subject_validation
Instructs the agent to reject certificates from opscenterd when the certificate subject does not match the server's IP address. This option is false by default, which means the agent attempts subject validation first. If that fails, the agent logs a warning and retries the connection without subject validation. In a later version of OpsCenter, the default will change to true. Example: opscenter_ssl_strict_subject_validation: true
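The SSL options above are typically set together. The following sketch combines the example values from the individual entries into one fragment; the paths and passwords are the illustrative values used on this page, not recommendations, and which options your environment actually requires is an assumption to verify.

```yaml
# Agent-to-opscenterd SSL; must correspond to [agents].use_ssl in opscenterd.conf.
use_ssl: true
opscenter_ssl_keystore: /usr/share/opscenter/ssl/agentKeystore
opscenter_ssl_keystore_password: keystore-pass
opscenter_ssl_truststore: /usr/share/opscenter/ssl/trustStore
# If omitted, the keystore password is used for the truststore.
opscenter_ssl_truststore_password: trust-pass
# Reject opscenterd certificates whose subject does not match, with no fallback retry.
opscenter_ssl_strict_subject_validation: true
```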
poll_period
The length of time, specified in seconds, between attempts to poll metrics. Example: poll_period: 60
disk_usage_update_period
The length of time, in seconds, to wait between attempts to poll the disk for usage. Example: disk_usage_update_period: 60
rollup_state_ttl
The time-to-live (TTL), in seconds, for data points stored in rollup states before the data is collected in the rollups60 table. Default: 259200 (3 days, that is, 3 * 86400). Example: rollup_state_ttl: 259200
rollups60_ttl
Sets time-to-live (TTL) for rollups60 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 604800 Example: rollups60_ttl: 604800
rollups300_ttl
Sets time-to-live (TTL) for rollups300 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 2419200 Example: rollups300_ttl: 2419200
rollups7200_ttl
Sets time-to-live (TTL) for rollups7200 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 31536000 Example: rollups7200_ttl: 31536000
rollups86400_ttl
Sets time-to-live (TTL) for rollups86400 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 0 Example: rollups86400_ttl: 0
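Because the TTL values are raw seconds, it can help to annotate them with the durations they represent. The sketch below uses the example values from the entries above, with the equivalent number of days worked out in comments. The final line is a hypothetical variation showing the documented '-1' setting, and it is commented out so that it does not override the line above it.

```yaml
rollup_state_ttl: 259200     # 3 days   (3 * 86400)
rollups60_ttl: 604800        # 7 days   (7 * 86400)
rollups300_ttl: 2419200      # 28 days  (28 * 86400)
rollups7200_ttl: 31536000    # 365 days (365 * 86400)
rollups86400_ttl: 0          # 0 = data never expires
# Hypothetical variation: disable the daily rollup entirely.
# rollups86400_ttl: -1
```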
rollup_rate
Maximum number of metrics that can be saved to Cassandra over the rollup_rate_unit period of time. This should be at least ([#tables] * 40) + 200 per minute. Default: 200 (200 per second with the default rollup_rate_unit). Example: rollup_rate: 200
rollup_rate_unit
Unit of time for rollup_rate. Choose from microsecond, millisecond, second, minute, hour, day, or month. Default: second Example: rollup_rate_unit: second
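As a worked example of the sizing guideline for rollup_rate, assume a hypothetical cluster with 50 tables. The guideline gives a floor of (50 * 40) + 200 = 2200 metrics per minute. One way to express that floor, sketched here under that assumption, is to switch the unit to minute:

```yaml
# Hypothetical cluster with 50 tables:
#   (50 * 40) + 200 = 2200 metrics per minute (minimum suggested rate)
rollup_rate: 2200
rollup_rate_unit: minute
```

For comparison, the default of 200 per second already allows 12000 per minute, which is above this floor. By the same formula, an explicit setting only becomes necessary above 295 tables, where (295 * 40) + 200 = 12000.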
bypass_dse_metrics_storage
Enable or disable storing metrics in a separate storage DataStax Enterprise (DSE) cluster. Metrics are stored in an OpsCenter keyspace on the same DSE cluster being monitored by default. Note: storing metrics on a separate DSE cluster will place the entire OpsCenter keyspace on that cluster. Default: false. Example: bypass_dse_metrics_storage: false
jmx_host
Host used to connect to local JMX server. The default setting is localhost. This information will be sent by opscenterd for convenience, but can be configured locally as needed. Example: jmx_host: 127.0.0.1
jmx_port
Port used to connect to local JMX server. The default setting is 7199. This information will be sent by opscenterd for convenience, but can be configured locally as needed. Example: jmx_port: 7199
jmx_user
The username used to connect to the local JMX server. Example: jmx_user: jmx-username
jmx_pass
The password used to connect to the local JMX server. Example: jmx_pass: jmx-password [This field may be encrypted for additional security.]
jmx_queue_poll_timeout
The number of seconds to wait for an available JMX connection before timing out. Default: 10. Example: jmx_queue_poll_timeout: 10
status_reporting_interval
The length of time, in seconds, between sending agent health information. Example: status_reporting_interval: 20
disk_wait
The amount of time in milliseconds to wait for disk operations when collecting metrics. Default: 5000. Example: disk_wait: 5000
ec2_metadata_api_host
The EC2 metadata API host, used to determine information about this node if it is running on EC2. Example: ec2_metadata_api_host: 169.254.169.254
metrics_enabled
Whether or not to collect and store metrics for the local node. Setting this option to false turns off metrics collection. Default: true. Example: metrics_enabled: true
jmx_metrics_threadpool_size
The size of the threadpool used for collecting metrics over JMX. Example: jmx_metrics_threadpool_size: 6
metrics_ignored_keyspaces
A comma-separated list of keyspaces ignored by metrics collection. Example: metrics_ignored_keyspaces: ks1, ks2, ks3
metrics_ignored_column_families
A comma-separated list of tables (formerly referred to as column families) ignored by metrics collection. Example: metrics_ignored_column_families: ks1.cf1, ks1.cf2, ks2.cf1
metrics_ignored_solr_cores
A comma-separated list of Solr cores ignored by metrics collection. Example: metrics_ignored_solr_cores: ks1.cf1, ks1.cf2, ks2.cf1
hosts
The DataStax Enterprise node or nodes responsible for storing OpsCenter data. By default, this will be the local node, but may be configured to store data on a separate cluster. The hosts option accepts an array of strings specifying the IP addresses of the node or nodes. For example, ["1.2.3.4"] or ["1.2.3.4", "1.2.3.5"]. Example: hosts: ["127.0.0.1"]
cassandra_port
Port used to connect to the storage cassandra node. The native transport port. Example: cassandra_port: 9042
cassandra_user
The username used to connect to storage cassandra when authentication is enabled. Example: cassandra_user: cassandra
cassandra_pass
The password used to connect to storage cassandra when authentication is enabled. Example: cassandra_pass: cassandra [This field may be encrypted for additional security.]
max_reconnect_time
The maximum time in milliseconds that the agent will wait between cassandra reconnect attempts. Example: max_reconnect_time: 15000
max_pending_repairs
The maximum number of repairs that may be pending. Exceeding this number blocks new repairs. Example: max_pending_repairs: 5
ssl_keystore
The SSL keystore location for the storage cluster that agents use to connect to CQL. Example: ssl_keystore: /etc/dse/conf/.keystore
ssl_keystore_password
The SSL keystore password for the storage cluster that agents use to connect to CQL. Example: ssl_keystore_password: keystore-pass [This field may be encrypted for additional security.]
ssl_truststore
The SSL truststore location for the storage cluster that agents use to connect to CQL. Example: ssl_truststore: /etc/dse/conf/.truststore
ssl_truststore_password
The SSL truststore password for the storage cluster that agents use to connect to CQL. Example: ssl_truststore_password: truststore-pass [This field may be encrypted for additional security.]
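The storage-cluster connection options above can be grouped as in the following sketch, which points the agent at a separate storage cluster with authentication and SSL enabled. The two host IPs are hypothetical placeholders; the remaining values are the illustrative ones from the entries above. This fragment only covers the agent-side connection options and does not show any related opscenterd configuration.

```yaml
# Nodes that store OpsCenter data (placeholder IPs, not values from this page).
hosts: ["10.1.0.11", "10.1.0.12"]
cassandra_port: 9042            # native transport port
cassandra_user: cassandra
cassandra_pass: cassandra
# SSL material the agent uses for its CQL connection to the storage cluster.
ssl_keystore: /etc/dse/conf/.keystore
ssl_keystore_password: keystore-pass
ssl_truststore: /etc/dse/conf/.truststore
ssl_truststore_password: truststore-pass
```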
monitored_cassandra_port
Port used to connect to the monitored cassandra node. The native transport port. Example: monitored_cassandra_port: 9042
monitored_cassandra_user
The username used to connect to monitored cassandra when authentication is enabled. Example: monitored_cassandra_user: cassandra
monitored_cassandra_pass
The password used to connect to monitored cassandra when authentication is enabled. Example: monitored_cassandra_pass: cassandra-pass [This field may be encrypted for additional security.]
monitored_ssl_keystore
The SSL keystore location for the monitored cluster that agents use to connect to CQL. Example: monitored_ssl_keystore: /etc/dse/conf/.keystore
monitored_ssl_keystore_password
The SSL keystore password for the monitored cluster that agents use to connect to CQL. Example: monitored_ssl_keystore_password: keystore-pass [This field may be encrypted for additional security.]
monitored_ssl_truststore
The SSL truststore location for the monitored cluster that agents use to connect to CQL. Example: monitored_ssl_truststore: /etc/dse/conf/.truststore
monitored_ssl_truststore_password
The SSL truststore password for the monitored cluster that agents use to connect to CQL. Example: monitored_ssl_truststore_password: truststore-pass [This field may be encrypted for additional security.]
kerberos_service
The Kerberos service name to use when using Kerberos authentication within DSE. Example: kerberos_service: cassandra-kerberos
kerberos_keytab_location
The Kerberos keytab location when using Kerberos authentication within DSE. Example: kerberos_keytab_location: /path/to/keytab.keytab
kerberos_client_principal
The Kerberos client principal to use when using Kerberos authentication within DSE. Example: kerberos_client_principal: cassandra@hostname
storage_keyspace
The keyspace that the agent will use to store data. Example: storage_keyspace: OpsCenter
alias
Provides an alias for the agent to use when sending node details to OpsCenter. The alias is useful when the agent is unable to get the localhost name from InetAddress.getLocalHost(). Example: alias: MyNodeOne
storage_dse_connection_timeout
The maximum time in seconds that the agent waits while attempting to connect to the DSE cluster. Default: 30. Example: storage_dse_connection_timeout: 30
storage_dse_host_read_timeout
The maximum time in milliseconds that the agent waits for a storage node to return a response from a read request before considering said node unresponsive. Should be set higher than read_request_timeout_in_ms in cassandra.yaml. Example: storage_dse_host_read_timeout: 10000
monitored_dse_connection_timeout
The maximum time in seconds that the agent waits while attempting to connect to the DSE cluster. Default: 30. Example: monitored_dse_connection_timeout: 30
monitored_dse_host_read_timeout
The maximum time in milliseconds that the agent waits for a monitored node to return a response from a read request before considering said node unresponsive. Should be set higher than read_request_timeout_in_ms in cassandra.yaml. Example: monitored_dse_host_read_timeout: 10000
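Note that the connection timeouts are expressed in seconds while the host read timeouts are expressed in milliseconds. The sketch below places the four options side by side. It assumes, for illustration only, that read_request_timeout_in_ms in cassandra.yaml is 5000, so the read timeouts of 10000 satisfy the "set higher than" guidance.

```yaml
# Assumption for this sketch: read_request_timeout_in_ms is 5000 in cassandra.yaml.
storage_dse_connection_timeout: 30        # seconds
storage_dse_host_read_timeout: 10000      # milliseconds; > 5000
monitored_dse_connection_timeout: 30      # seconds
monitored_dse_host_read_timeout: 10000    # milliseconds; > 5000
```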
cassandra_install_location
The base directory where DataStax Enterprise or Cassandra is installed. When not set, the agent attempts to auto-detect the location but cannot do so in all cases. Example: cassandra_install_location: /usr/share/dse
cassandra_log_location
The directory in which DSE logs reside. This is only used for the diagnostics tarball, and should only be set if these logs are in a location other than the default. Example: cassandra_log_location: /var/log/cassandra
cassandra_binary_location
The directory that contains the Cassandra binaries (cqlsh, nodetool, and sstableloader). When not set, the agent attempts to auto-detect the location. Example: cassandra_binary_location: /usr/bin
cassandra_conf_location
The directory that contains the Cassandra configuration files (cassandra.yaml and cassandra-env.sh). When not set, the agent attempts to auto-detect the location. Example: cassandra_conf_location: /etc/dse/cassandra
dse_env_location
The location of the directory that holds dse-env.sh. When not set, the agent attempts to auto-detect the location. Example: dse_env_location: /etc/dse
dse_binary_location
The location of the directory that holds dsetool. When not set, the agent attempts to auto-detect the location. Example: dse_binary_location: /usr/bin
dse_conf_location
The location of the directory that holds dse.yaml. When not set, the agent attempts to auto-detect the location. Example: dse_conf_location: /etc/dse
spark_conf_location
The location of the directory that holds spark-env.sh. When not set, the agent attempts to auto-detect the location. Example: spark_conf_location: /etc/dse/spark
spark_log_location
The location of the directory that holds Spark logs. When not set, the agent attempts to auto-detect the location. Example: spark_log_location: /var/log/spark
solr_log_location
The location of the directory that holds Solr logs. When not set, the agent attempts to auto-detect the location. Example: solr_log_location: /var/log/cassandra
agent_log_location
The path to the OpsCenter agent.log file and additional log files for the DataStax Agent. Example: agent_log_location: nodes/logs/opsagent
cassandra_rpc_interface
When unspecified, the agent will attempt to determine cassandra rpc_address by reading cassandra.yaml for rpc_address (native_transport_address). When specified, this agent lookup is skipped and the specified value is used instead. Example: cassandra_rpc_interface: 172.10.0.2
api_port
The port used for the http api endpoint. Example: api_port: 61621
runs_sudo
Sets whether the DataStax Agent will be run using sudo or not. Setting this option to false means the agent will not use sudo, and the agent user will not run using elevated privileges. Setting this option to true means the agent will run using sudo, and elevated privileges. Default is true. Example: runs_sudo: true
s3_proxy_host
The optional proxy host the client will connect through. Example: s3_proxy_host: localhost
s3_proxy_port
The optional proxy port the client will connect through. Example: s3_proxy_port: 80
restore_req_update_period
The frequency in seconds with which status updates are sent to opscenterd during Restore operations in the Backup Service. Default: 60. Example: restore_req_update_period: 60
backup_staging_dir
The directory where commitlogs are copied after they are written to disk from DSE. The DataStax Agents monitor this directory and move commitlogs to the configured destinations. After all destinations receive the relevant commitlogs, the logs are moved to the backup_storage_dir. The default location is /var/lib/datastax-agent/commitlogs/. Example: backup_staging_dir: /var/lib/datastax-agent/commitlogs/
backup_storage_dir
The directory where On Server commitlog backups are stored after being copied to all configured destinations. The directory will be cleaned based on a configured retention policy for an On Server location. The directory should be large enough to hold commitlogs for the length of the retention policy. The default location is /var/lib/datastax-agent/backups/. Example: backup_storage_dir: /var/lib/datastax-agent/backups/
tmp_dir
The directory used to temporarily stage files when restoring. The default location is /var/lib/datastax-agent/tmp. Example: tmp_dir: /var/lib/datastax-agent/tmp/
remote_backup_retries
The number of attempts to make when file download fails during a restore. Default: 3. Example: remote_backup_retries: 3
remote_backup_timeout
The timeout in milliseconds for the connection used to push backups to remote destinations. Default: 1000. Example: remote_backup_timeout: 1000
use_s3_cli
Enable using the AWS CLI instead of the AWS SDK when bulk loading backups to Amazon S3 locations. Default: false. Example: use_s3_cli: true
remote_verify_initial_delay
Initial delay in milliseconds to wait before checking if a file was successfully uploaded during a backup operation. This configuration option works in conjunction with the remote_verify_max option to distinguish between broken versus tardy backups when cleaning up SSTables. The remote_verify_initial_delay value doubles each time a file transfer validation failure occurs until the value exceeds the remote_verify_max value. Default: 1000 (1 second). Example: remote_verify_initial_delay: 1000
remote_verify_max
The maximum time period, in milliseconds, to wait after a file upload has completed but the file is still unreadable from the remote destination. When this delay is exceeded, the transfer is considered failed. This configuration option works in conjunction with the remote_verify_initial_delay option to distinguish between broken versus tardy backups when cleaning up SSTables. Default: 30000 (30 seconds). Example: remote_verify_max: 300000
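To make the interaction between remote_verify_initial_delay and remote_verify_max concrete, the following sketch uses the documented default values and traces the doubling in comments. How the agent treats the final step at the boundary is not specified here, so the trace should be read as an approximation.

```yaml
remote_verify_initial_delay: 1000   # first check after 1 second
remote_verify_max: 30000            # give up once the delay exceeds 30 seconds
# Delay doubles after each failed verification:
#   1000 -> 2000 -> 4000 -> 8000 -> 16000
# The next doubling (32000) exceeds remote_verify_max, so the transfer
# is then considered failed.
```

Raising remote_verify_max allows further doublings before a slow upload is treated as broken.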
restore_on_transfer_failure
When set to true, a failed file transfer from the remote destination will not halt the restore process. A future restore attempt uses any successfully transferred files. Default: false. Example: restore_on_transfer_failure: false
remote_backup_region
The AWS region to use for remote backup transfers. Default: us-west-1. Example: remote_backup_region: us-west-1
max_file_transfer_attempts
The maximum number of attempts to upload a file or create a remote destination. Default: 3. Example: max_file_transfer_attempts: 30
sstableloader_max_heap_size
The maximum heap size used by the sstableloader during restore operations. Default: 256M. Example: sstableloader_max_heap_size: 256M
trace_delay
The time in milliseconds to wait between issuing a query to trace and fetching trace events in the Performance Service Slow Query panel. Default: 300. Example: trace_delay: 300
support_shell_timeout
The number of seconds to wait for a shell process such as nodetool to run before timing out. This setting is only used for generating a diagnostic tarball. Default: 30. Example: support_shell_timeout: 30
graphite_host
Setting graphite_host enables the forwarding of metrics to a graphite server at the given address. Leaving the graphite_host blank disables forwarding metrics to the graphite server. Example: graphite_host: graphite.myhost.com
graphite_port
Port for graphite's plaintext protocol. Example: graphite_port: 2003
graphite_prefix
A prefix to insert metrics under. Example: graphite_prefix: opscenter
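The three Graphite options above are normally configured as a group. A minimal sketch, using the example values from this page (the host name is illustrative):

```yaml
# Setting graphite_host enables forwarding; leave it blank to disable.
graphite_host: graphite.myhost.com
graphite_port: 2003          # Graphite plaintext protocol port
graphite_prefix: opscenter   # metrics are inserted under this prefix
```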
slow_query_past
How far into the past, in milliseconds, to look for slow queries. Default: 3600000 (1 hour). Example: slow_query_past: 3600000
slow_query_refresh
Time in seconds between slow query refreshes. Default: 5. Example: slow_query_refresh: 5
slow_query_fetch_size
The limit to how many slow queries are fetched. Default: 500. Example: slow_query_fetch_size: 500
slow_query_ignore
A list of keyspaces that the Performance Service slow query log will ignore. Default: ["OpsCenter", "dse_perf"] Example: slow_query_ignore: ["OpsCenter", "dse_perf"]
config_encryption_active
Specifies whether opscenter should attempt to decrypt sensitive config values. Default: False
config_encryption_key_name
Filename to use for the encryption key. If a custom name is not specified, opsc_system_key is used by default. Example: config_encryption_key_name: opsc_system_key
config_encryption_key_path
Path where the encryption key should be located. If unspecified, the directory of address.yaml is used by default. Example: config_encryption_key_path: /var/lib/datastax-agent/conf/
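As a sketch of how the three encryption options fit together, the fragment below turns on decryption and spells out the key name and path explicitly, using the defaults and example values stated above. Generating the key and producing the encrypted values for fields such as jmx_pass is outside the scope of this sketch.

```yaml
# Tell the agent to decrypt sensitive values (fields marked on this page
# as "may be encrypted").
config_encryption_active: true
# Explicitly stating the default key file name.
config_encryption_key_name: opsc_system_key
# Directory holding the key; defaults to the directory of address.yaml.
config_encryption_key_path: /var/lib/datastax-agent/conf/
```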
running-request-cache-size
Size of the running requests cache. Example: running-request-cache-size: 500
finished-request-cache-size
Size of the finished requests cache. Example: finished-request-cache-size: 100
tcp_response_timeout
The TCP response timeout used for JMX, specified in milliseconds. This value may need to be set very high for some operations to complete on nodes with large amounts of data. Set to 0 for no timeout. Default: 240000. Example: tcp_response_timeout: 120000
pong_timeout_ms
The number of milliseconds to wait for a pong reply from opscenterd over STOMP before timing out the ping. Example: pong_timeout_ms: 5000
destination_pretest_timeout
The maximum amount of time in seconds to verify a destination can be written to and read from. Default: 60. Example: destination_pretest_timeout: 60