DataStax Agent configuration
Configure DataStax agents with options in the address.yaml file.
The address.yaml configuration file
cluster_name.conf
The location of the cluster_name.conf file depends on the type of installation:
- Package installations: /etc/opscenter/clusters/cluster_name.conf
- Tarball installations: install_location/conf/clusters/cluster_name.conf
cassandra-env.sh
The location of the cassandra-env.sh file depends on the type of installation:
- Package installations: /etc/dse/cassandra/cassandra-env.sh
- Tarball installations: installation_location/resources/cassandra/conf/cassandra-env.sh
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:
- Package installations: /etc/dse/cassandra/cassandra.yaml
- Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml
address.yaml
The location of the address.yaml file depends on the type of installation:
- Package installations: /var/lib/datastax-agent/conf/address.yaml
- Tarball installations: install_location/conf/address.yaml
The address.yaml file contains configuration options for the DataStax Agent. Most of these properties can be set in the [agent_config] section of cluster_name.conf on the opscenterd machine, which automatically propagates them to all agents. Some properties, or some environments, might require setting values directly in address.yaml on the applicable agents. When manually installing agents, stomp_interface is the only property that needs to be explicitly configured in most environments. When automatically installing agents, stomp_interface is configured for you.
For more information about viewing agent status and troubleshooting agent issues, see Agents View.
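As a minimal sketch of the manual-installation case described above, an address.yaml might contain nothing but stomp_interface. The IP address below is a placeholder assumption, not a value from this page; substitute the reachable address of your own opscenterd machine.

```yaml
# Minimal address.yaml for a manually installed agent (illustrative sketch).
# 10.0.0.5 is a hypothetical opscenterd address; replace it with your own.
stomp_interface: 10.0.0.5
```

Any other option would then be added only when an environment needs a value that differs from what opscenterd propagates from the [agent_config] section of cluster_name.conf.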
Configuration options
- use_ssl
- Whether or not to use SSL communication between the agent and opscenterd. Affects both the STOMP connection and agent HTTP server. Corresponds to [agents].use_ssl in opscenterd.conf. Setting this option to true turns on SSL connections. Example:
use_ssl: true
- stomp_port
- The STOMP port used by opscenterd. Example:
stomp_port: 61620
- stomp_interface
- Reachable IP address of the opscenterd machine. The connection made will be on stomp_port. Example:
stomp_interface: 127.0.0.1
- local_interface
- The IP used to identify the node. If broadcast_address is set in cassandra.yaml, this should be the same as that; otherwise, it is typically the same as listen_address in cassandra.yaml. A good check is to confirm that this address is the same as the address that nodetool ring outputs. Example:
local_interface: 172.10.0.2
- agent_rpc_interface
- The IP that the agent HTTP server listens on. In a multiple region deployment, this is typically a private IP. Default: Matches rpc_interface (native_transport_interface) from cassandra.yaml. Example:
agent_rpc_interface: 172.10.0.2
- agent_rpc_broadcast_address
- The IP that the central OpsCenter process uses to connect to the DataStax agent. Default: First available resolvable address in this order: broadcast_rpc_address (native_transport_broadcast_address), rpc_address (native_transport_address), and listen_address from cassandra.yaml. Example:
agent_rpc_broadcast_address: 172.10.0.2
- swagger_enabled
- Enables or disables the Swagger UI for the agent API. Example:
swagger_enabled: true
- opscenter_ssl_keystore
- On target nodes where DataStax Agents are running, the path to the SSL keystore file that the Agents use to connect to opscenterd. Example:
opscenter_ssl_keystore: /usr/share/opscenter/ssl/agentKeystore
- opscenter_ssl_keystore_password
- The SSL keystore password that the agents use to connect to opscenterd. Example:
opscenter_ssl_keystore_password: keystore-pass
[This field may be encrypted for additional security.]
- opscenter_ssl_truststore
- The path to the truststore file that the agents use to connect to opscenterd. Example:
opscenter_ssl_truststore: /usr/share/opscenter/ssl/trustStore
- opscenter_ssl_truststore_password
- The SSL truststore password that the agents use to connect to opscenterd. Default: Uses the keystore password if an SSL truststore password is not specified. Example:
opscenter_ssl_truststore_password: trust-pass
[This field may be encrypted for additional security.]
- opscenter_ssl_strict_subject_validation
- Instructs the agent to reject certificates from opscenterd when the certificate subject does not match the server's IP address. This option is false by default, which means the agent attempts subject validation first. If that fails, the agent logs a warning and retries the connection without subject validation. In a later version of OpsCenter, the default will change to true. Example:
opscenter_ssl_strict_subject_validation: true
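The SSL options above are typically set together. The following is a sketch only: it reuses the placeholder paths and passwords from the individual examples and assumes that [agents].use_ssl is also enabled in opscenterd.conf.

```yaml
# Agent-to-opscenterd SSL settings (illustrative sketch; paths and passwords
# are the placeholder values from the option examples, not real credentials).
use_ssl: true
opscenter_ssl_keystore: /usr/share/opscenter/ssl/agentKeystore
opscenter_ssl_keystore_password: keystore-pass
opscenter_ssl_truststore: /usr/share/opscenter/ssl/trustStore
# Omit the next line to fall back to the keystore password for the truststore.
opscenter_ssl_truststore_password: trust-pass
# Reject opscenterd certificates whose subject does not match the server IP
# instead of retrying without subject validation.
opscenter_ssl_strict_subject_validation: true
```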
- poll_period
- The length of time, specified in seconds, between attempts to poll metrics. Example:
poll_period: 60
- disk_usage_update_period
- The length of time, in seconds, to wait between attempts to poll the disk for usage. Example:
disk_usage_update_period: 60
- rollup_state_ttl
- The time-to-live (TTL) for data points stored in rollup states before the data is collected in the rollups60 table. Default: 259200 (86400 * 3, or 3 days in seconds). Example:
rollup_state_ttl: 259200
- rollups60_ttl
- Sets time-to-live (TTL) for rollups60 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 604800 Example:
rollups60_ttl: 604800
- rollups300_ttl
- Sets time-to-live (TTL) for rollups300 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 2419200 Example:
rollups300_ttl: 2419200
- rollups7200_ttl
- Sets time-to-live (TTL) for rollups7200 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 31536000 Example:
rollups7200_ttl: 31536000
- rollups86400_ttl
- Sets time-to-live (TTL) for rollups86400 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 0 Example:
rollups86400_ttl: 0
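Because the TTL options are expressed in seconds, it can help to see them next to the equivalent number of days. The sketch below uses the example values from the option descriptions and assumes that rollup_state_ttl is also measured in seconds.

```yaml
# Rollup TTLs in seconds, with the equivalent duration in days
# (days = seconds / 86400).
rollup_state_ttl: 259200     # 3 days (assumed to be in seconds)
rollups60_ttl: 604800        # 7 days
rollups300_ttl: 2419200      # 28 days
rollups7200_ttl: 31536000    # 365 days
rollups86400_ttl: 0          # 0 = data never expires
```

Replacing any of the rollupsN_ttl values with -1 would disable that rollup entirely, as described in the option text.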
- rollup_rate
- Maximum number of metrics that can be saved to Cassandra per rollup_rate_unit period of time. This should be at least ([number of tables] * 40) + 200 per minute. Default: 200 (200 per second with the default rollup_rate_unit). Example:
rollup_rate: 200
- rollup_rate_unit
- Unit of time for rollup_rate. Choose from microsecond, millisecond, second, minute, hour, day, or month. Default: second Example:
rollup_rate_unit: second
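The sizing rule for rollup_rate can be worked through for a concrete case. The table count below is a hypothetical figure chosen for illustration, so treat the result as a sketch rather than a recommendation.

```yaml
# Worked example of the sizing rule: ([number of tables] * 40) + 200 per minute.
# Assumption: the monitored cluster has 500 tables (hypothetical figure).
#   (500 * 40) + 200 = 20200 metrics per minute
# The default of 200 per second equals 12000 per minute, which is below 20200,
# so the rate is raised and expressed per minute here.
rollup_rate: 20200
rollup_rate_unit: minute
```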
- bypass_dse_metrics_storage
- Enable or disable storing metrics in a separate storage DataStax Enterprise (DSE) cluster. Metrics are stored in an OpsCenter keyspace on the same DSE cluster being monitored by default. Note: storing metrics on a separate DSE cluster will place the entire OpsCenter keyspace on that cluster. Default: false. Example:
bypass_dse_metrics_storage: false
- jmx_host
- Host used to connect to local JMX server. The default setting is localhost. This information will be sent by opscenterd for convenience, but can be configured locally as needed. Example:
jmx_host: 127.0.0.1
- jmx_port
- Port used to connect to local JMX server. The default setting is 7199. This information will be sent by opscenterd for convenience, but can be configured locally as needed. Example:
jmx_port: 7199
- jmx_user
- The username used to connect to the local JMX server. Example:
jmx_user: jmx-username
- jmx_pass
- The password used to connect to the local JMX server. Example:
jmx_pass: jmx-password
[This field may be encrypted for additional security.]
- jmx_queue_poll_timeout
- The number of seconds to wait for an available JMX connection before timing out. Default: 10. Example:
jmx_queue_poll_timeout: 10
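For a node where JMX authentication is enabled, the JMX options above might be combined as in the following sketch. The credentials are the placeholder values from the option examples.

```yaml
# JMX connection settings for a node with JMX authentication enabled
# (illustrative sketch; the credentials are placeholders).
jmx_host: 127.0.0.1
jmx_port: 7199
jmx_user: jmx-username
jmx_pass: jmx-password
jmx_queue_poll_timeout: 10   # seconds to wait for an available JMX connection
```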
- status_reporting_interval
- The length of time, in seconds, between sending agent health information. Example:
status_reporting_interval: 20
- disk_wait
- The amount of time in milliseconds to wait for disk operations when collecting metrics. Default: 5000. Example:
disk_wait: 5000
- ec2_metadata_api_host
- The EC2 metadata API host, used to determine information about this node if it is running on EC2. Example:
ec2_metadata_api_host: 169.254.169.254
- metrics_enabled
- Whether or not to collect and store metrics for the local node. Setting this option to false turns off metrics collection. Default: true. Example:
metrics_enabled: true
- jmx_metrics_threadpool_size
- The size of the threadpool used for collecting metrics over JMX. Example:
jmx_metrics_threadpool_size: 6
- metrics_ignored_keyspaces
- A comma-separated list of keyspaces ignored by metrics collection. Example:
metrics_ignored_keyspaces: ks1, ks2, ks3
- metrics_ignored_column_families
- A comma-separated list of tables (formerly referred to as column families) ignored by metrics collection. Example:
metrics_ignored_column_families: ks1.cf1, ks1.cf2, ks2.cf1
- metrics_ignored_solr_cores
- A comma-separated list of Solr cores ignored by metrics collection. Example:
metrics_ignored_solr_cores: ks1.cf1, ks1.cf2, ks2.cf1
- hosts
- The DataStax Enterprise node or nodes responsible for storing OpsCenter data. By default, this is the local node, but it can be configured to store data on a separate cluster. The hosts option accepts an array of strings specifying the IP addresses of the node or nodes, for example ["1.2.3.4"] or ["1.2.3.4", "1.2.3.5"]. Example:
hosts: ["127.0.0.1"]
- cassandra_port
- Port used to connect to the storage cassandra node. The native transport port. Example:
cassandra_port: 9042
- cassandra_user
- The username used to connect to the storage Cassandra cluster when authentication is enabled. Example:
cassandra_user: cassandra
- cassandra_pass
- The password used to connect to storage cassandra when authentication is enabled. Example:
cassandra_pass: cassandra
[This field may be encrypted for additional security.]
- max_reconnect_time
- The maximum time in ms that the agent will wait between cassandra reconnect attempts. Example:
max_reconnect_time: 15000
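When OpsCenter data is stored on a separate cluster, the storage options above (hosts, cassandra_port, cassandra_user, cassandra_pass) are set together. This is a sketch under assumptions: the two node addresses are hypothetical, and the credentials are the placeholder values from the option examples.

```yaml
# Storing OpsCenter data on a separate storage cluster (illustrative sketch).
# The two addresses are hypothetical storage-cluster nodes.
hosts: ["10.1.0.11", "10.1.0.12"]
cassandra_port: 9042          # native transport port of the storage nodes
cassandra_user: cassandra     # only needed when authentication is enabled
cassandra_pass: cassandra
max_reconnect_time: 15000     # maximum wait in ms between reconnect attempts
```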
- max_pending_repairs
- The maximum number of repairs that may be pending. Exceeding this number blocks new repairs. Example:
max_pending_repairs: 5
- ssl_keystore
- The SSL keystore location for the storage cluster that agents use to connect to CQL. Example:
ssl_keystore: /etc/dse/conf/.keystore
- ssl_keystore_password
- The SSL keystore password for the storage cluster that agents use to connect to CQL. Example:
ssl_keystore_password: keystore-pass
[This field may be encrypted for additional security.]
- ssl_truststore
- The SSL truststore location for the storage cluster that agents use to connect to CQL. Example:
ssl_truststore: /etc/dse/conf/.truststore
- ssl_truststore_password
- The SSL truststore password for the storage cluster that agents use to connect to CQL. Example:
ssl_truststore_password: truststore-pass
[This field may be encrypted for additional security.]
- monitored_cassandra_port
- Port used to connect to the monitored cassandra node. The native transport port. Example:
monitored_cassandra_port: 9042
- monitored_cassandra_user
- The username used to connect to the monitored Cassandra cluster when authentication is enabled. Example:
monitored_cassandra_user: cassandra
- monitored_cassandra_pass
- The password used to connect to monitored cassandra when authentication is enabled. Example:
monitored_cassandra_pass: cassandra-pass
[This field may be encrypted for additional security.]
- monitored_ssl_keystore
- The SSL keystore location for the monitored cluster that agents use to connect to CQL. Example:
monitored_ssl_keystore: /etc/dse/conf/.keystore
- monitored_ssl_keystore_password
- The SSL keystore password for the monitored cluster that agents use to connect to CQL. Example:
monitored_ssl_keystore_password: keystore-pass
[This field may be encrypted for additional security.]
- monitored_ssl_truststore
- The SSL truststore location for the monitored cluster that agents use to connect to CQL. Example:
monitored_ssl_truststore: /etc/dse/conf/.truststore
- monitored_ssl_truststore_password
- The SSL truststore password for the monitored cluster that agents use to connect to CQL. Example:
monitored_ssl_truststore_password: truststore-pass
[This field may be encrypted for additional security.]
- kerberos_service
- The Kerberos service name to use when using Kerberos authentication within DSE. Example:
kerberos_service: cassandra-kerberos
- kerberos_keytab_location
- The Kerberos keytab location when using Kerberos authentication within DSE. Example:
kerberos_keytab_location: /path/to/keytab.keytab
- kerberos_client_principal
- The Kerberos client principal to use when using Kerberos authentication within DSE. Example:
kerberos_client_principal: cassandra@hostname
- storage_keyspace
- The keyspace that the agent will use to store data. Example:
storage_keyspace: OpsCenter
- alias
- Provides an alias for the agent to use when sending node details to OpsCenter. The alias is useful when the agent is unable to get the localhost name from InetAddress.getLocalHost(). Example:
alias: MyNodeOne
- storage_dse_connection_timeout
- The maximum time in seconds that the agent waits while attempting to connect to the DSE cluster. Default: 30. Example:
storage_dse_connection_timeout: 30
- storage_dse_host_read_timeout
- The maximum time in milliseconds that the agent waits for a storage node to return a response from a read request before considering said node unresponsive. Should be set higher than read_request_timeout_in_ms in cassandra.yaml. Example:
storage_dse_host_read_timeout: 10000
- monitored_dse_connection_timeout
- The maximum time in seconds that the agent waits while attempting to connect to the DSE cluster. Default: 30. Example:
monitored_dse_connection_timeout: 30
- monitored_dse_host_read_timeout
- The maximum time in milliseconds that the agent waits for a monitored node to return a response from a read request before considering said node unresponsive. Should be set higher than read_request_timeout_in_ms in cassandra.yaml. Example:
monitored_dse_host_read_timeout: 10000
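The four timeout options above mix units, which is easy to overlook. The sketch below restates them side by side and assumes that read_request_timeout_in_ms is set to 5000 in cassandra.yaml; check the value in your own file before relying on it.

```yaml
# Timeout settings (illustrative sketch). Note the mixed units:
# connection timeouts are in seconds, read timeouts in milliseconds.
# Assumption: read_request_timeout_in_ms is 5000 in cassandra.yaml,
# so a read timeout of 10000 ms stays above it as recommended.
storage_dse_connection_timeout: 30       # seconds
storage_dse_host_read_timeout: 10000     # milliseconds
monitored_dse_connection_timeout: 30     # seconds
monitored_dse_host_read_timeout: 10000   # milliseconds
```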
- cassandra_install_location
- The base directory where DataStax Enterprise or Cassandra is installed. When not set, the agent attempts to auto-detect the location but cannot do so in all cases. Example:
cassandra_install_location: /usr/share/dse
- cassandra_log_location
- The directory in which DSE logs reside. This is only used for the diagnostics tarball, and should only be set if these logs are in a location other than the default. Example:
cassandra_log_location: /var/log/cassandra
- cassandra_binary_location
- The directory that holds the Cassandra binaries (cqlsh, nodetool, and sstableloader). When not set, the agent attempts to auto-detect the location. Example:
cassandra_binary_location: /usr/bin
- cassandra_conf_location
- The directory that holds the Cassandra configuration files (cassandra.yaml and cassandra-env.sh). When not set, the agent attempts to auto-detect the location. Example:
cassandra_conf_location: /etc/dse/cassandra
- dse_env_location
- The location of the directory that holds dse-env.sh. When not set, the agent attempts to auto-detect the location. Example:
dse_env_location: /etc/dse
- dse_binary_location
- The location of the directory that holds dsetool. When not set, the agent attempts to auto-detect the location. Example:
dse_binary_location: /usr/bin
- dse_conf_location
- The location of the directory that holds dse.yaml. When not set, the agent attempts to auto-detect the location. Example:
dse_conf_location: /etc/dse
- spark_conf_location
- The location of the directory that holds spark-env.sh. When not set, the agent attempts to auto-detect the location. Example:
spark_conf_location: /etc/dse/spark
- spark_log_location
- The location of the directory that holds Spark logs. When not set, the agent attempts to auto-detect the location. Example:
spark_log_location: /var/log/spark
- solr_log_location
- The location of the directory that holds Solr logs. When not set, the agent attempts to auto-detect the location. Example:
solr_log_location: /var/log/cassandra
- agent_log_location
- The path to the OpsCenter agent.log file and additional log files for the DataStax Agent. Example:
agent_log_location: nodes/logs/opsagent
- cassandra_rpc_interface
- When unspecified, the agent will attempt to determine cassandra rpc_address by reading cassandra.yaml for rpc_address (native_transport_address). When specified, this agent lookup is skipped and the specified value is used instead. Example:
cassandra_rpc_interface: 172.10.0.2
- api_port
- The port used for the HTTP API endpoint. Example:
api_port: 61621
- runs_sudo
- Sets whether the DataStax Agent uses sudo. When set to false, the agent does not use sudo and the agent user does not run with elevated privileges. When set to true, the agent uses sudo and runs with elevated privileges. Default: true. Example:
runs_sudo: true
- s3_proxy_host
- The optional proxy host that the S3 client connects through. Example:
s3_proxy_host: localhost
- s3_proxy_port
- The optional proxy port that the S3 client connects through. Example:
s3_proxy_port: 80
- destination-transfer-pool-size
- The size of the thread pool to allocate for destination processes. Example:
destination-transfer-pool-size: 10
- destination-transfer-pool-keepalive
- The number of minutes that threads in the destination processes thread pool can be idle before being shut down. Threads that are shut down are created again as the demand on the thread pool increases. Example:
destination-transfer-pool-keepalive: 2
- restore_req_update_period
- The frequency in seconds with which status updates are sent to opscenterd during Restore operations in the Backup Service. Default: 60. Example:
restore_req_update_period: 60
- restore_parallel_factor
- Determines how many concurrent threads to use when transferring files from a destination to a local node during a restore. To maintain semantics with prior versions of OpsCenter, the default value is 1. Example:
restore_parallel_factor: 3
- backup_staging_dir
- The directory where commitlogs are copied after they are written to disk from DSE. The DataStax Agents monitor this directory and move commitlogs to the configured destinations. After all destinations receive the relevant commit logs, the logs are moved to the backup_storage_dir. The default location is /var/lib/datastax-agent/commitlogs/. Example:
backup_staging_dir: /var/lib/datastax-agent/commitlogs/
- backup_storage_dir
- The directory where On Server commitlog backups are stored after being copied to all configured destinations. The directory will be cleaned based on a configured retention policy for an On Server location. The directory should be large enough to hold commitlogs for the length of the retention policy. The default location is /var/lib/datastax-agent/backups/. Example:
backup_storage_dir: /var/lib/datastax-agent/backups/
- tmp_dir
- The directory used to temporarily stage files when restoring. The default location is /var/lib/datastax-agent/tmp. Example:
tmp_dir: /var/lib/datastax-agent/tmp/
- remote_backup_retries
- The number of attempts to make when file download fails during a restore. Default: 3. Example:
remote_backup_retries: 3
- remote_backup_timeout
- The timeout in milliseconds for the connection used to push backups to remote destinations. Default: 1000. Example:
remote_backup_timeout: 1000
- use_s3_cli
- Enable using the AWS CLI instead of the AWS SDK when bulk loading backups to Amazon S3 locations. Default: false. Example:
use_s3_cli: true
- use_swift_cli
- Labs feature. When set to true, OpenStack Swift destinations are enabled. Default: false. Example:
use_swift_cli: true
- swift_cli_sync_status_delay_seconds
- Labs feature. Controls how long to wait before checking entity sync status to allow for eventual consistency. Default: 10. Example:
swift_cli_sync_status_delay_seconds: 10
- swift_cli_skip_diff_after_upload
- Labs feature. Controls whether to skip the diff check using file listing post upload for swift destinations. Default: false. Example:
swift_cli_skip_diff_after_upload: true
- remote_verify_initial_delay
- Initial delay in milliseconds to wait before checking if a file was successfully uploaded during a backup operation. This configuration option works in conjunction with the remote_verify_max option to distinguish between broken versus tardy backups when cleaning up SSTables. The remote_verify_initial_delay value doubles each time a file transfer validation failure occurs until the value exceeds the remote_verify_max value. Default: 1000 (1 second). Example:
remote_verify_initial_delay: 1000
- remote_verify_max
- The maximum time period to wait after a file upload completed but is still unreadable from the remote destination. When this delay is exceeded, the transfer is considered failed. This configuration option works in conjunction with the remote_verify_initial_delay option to distinguish between broken versus tardy backups when cleaning up SSTables. Default: 30000 (30 seconds). Example:
remote_verify_max: 300000
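The doubling behavior shared by remote_verify_initial_delay and remote_verify_max can be traced with the documented defaults. This is a sketch of the arithmetic only; it assumes both values are in milliseconds and does not claim to describe how the agent handles the final step.

```yaml
# Verification backoff with the documented defaults (illustrative sketch).
# The delay starts at remote_verify_initial_delay and doubles after each
# failed check:
#   1000 -> 2000 -> 4000 -> 8000 -> 16000 -> 32000 (exceeds 30000)
# Exactly how the final step is handled is not specified on this page.
remote_verify_initial_delay: 1000   # milliseconds
remote_verify_max: 30000            # milliseconds (assumed unit)
```

Raising remote_verify_max allows more doublings before a slow but healthy upload is treated as failed.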
- restore_on_transfer_failure
- When set to true, a failed file transfer from the remote destination will not halt the restore process. A future restore attempt uses any successfully transferred files. Default: false. Example:
restore_on_transfer_failure: false
- remote_backup_region
- The AWS region to use for remote backup transfers. Default: us-west-1. Example:
remote_backup_region: us-west-1
- max_file_transfer_attempts
- The maximum number of attempts to upload a file or create a remote destination. Default: 3. Example:
max_file_transfer_attempts: 30
- sstableloader_max_heap_size
- The maximum heap size used by the sstableloader during restore operations. Default: 256M. Example:
sstableloader_max_heap_size: 256M
- trace_delay
- The time in milliseconds to wait between issuing a query to trace and fetching trace events in the Performance Service Slow Query panel. Default: 300. Example:
trace_delay: 300
- support_shell_timeout
- The number of seconds to wait for a shell process such as nodetool to run before timing out. This setting is only used for generating a diagnostic tarball. Default: 30. Example:
support_shell_timeout: 30
- graphite_host
- Setting graphite_host enables the forwarding of metrics to a graphite server at the given address. Leaving the graphite_host blank disables forwarding metrics to the graphite server. Example:
graphite_host: graphite.myhost.com
- graphite_port
- Port for graphite's plaintext protocol. Example:
graphite_port: 2003
- graphite_prefix
- A prefix to insert metrics under. Example:
graphite_prefix: opscenter
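The three Graphite options above work as a group. The following sketch simply combines the example values from the option descriptions; the host name is a placeholder.

```yaml
# Forward metrics to a Graphite server (illustrative sketch using the
# example values from the option descriptions).
graphite_host: graphite.myhost.com   # leave blank to disable forwarding
graphite_port: 2003                  # Graphite plaintext protocol port
graphite_prefix: opscenter           # metrics are inserted under this prefix
```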
- slow_query_past
- How far into the past in milliseconds to look for slow queries. Default: 3600000 (1 hour). Example:
slow_query_past: 3600000
- slow_query_refresh
- Time in seconds between slow query refreshes. Default: 5. Example:
slow_query_refresh: 5
- slow_query_fetch_size
- The limit to how many slow queries are fetched. Default: 500. Example:
slow_query_fetch_size: 500
- slow_query_ignore
- A list of keyspaces that the Performance Service slow query log ignores. Default: ["OpsCenter", "dse_perf"]. Example:
slow_query_ignore: ["OpsCenter", "dse_perf"]
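The slow query options use different time units, so a combined view can help avoid mistakes. The sketch below restates the documented defaults with the unit conversion spelled out.

```yaml
# Slow query settings with the documented defaults (illustrative sketch).
# Note the mixed units: slow_query_past is in milliseconds,
# slow_query_refresh is in seconds.
slow_query_past: 3600000      # 3600000 ms = 3600 s = 1 hour
slow_query_refresh: 5         # seconds
slow_query_fetch_size: 500
slow_query_ignore: ["OpsCenter", "dse_perf"]
```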
- config_encryption_active
- Specifies whether OpsCenter should attempt to decrypt sensitive configuration values. Default: false.
- config_encryption_key_name
- Filename to use for the encryption key. If a custom name is not specified, opsc_system_key is used by default. Example:
config_encryption_key_name: opsc_system_key
- config_encryption_key_path
- Path where the encryption key should be located. If unspecified, the directory of address.yaml is used by default. Example:
config_encryption_key_path: /var/lib/datastax-agent/conf/
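The three config_encryption options above apply to the fields marked as encryptable elsewhere in this list. The sketch below shows them enabled together; it assumes the key file is resolved by joining the key path and the key name, which this page does not state explicitly.

```yaml
# Enabling decryption of encrypted values (illustrative sketch).
# Assumption: the key is read from the path joined with the key name, that is
# /var/lib/datastax-agent/conf/opsc_system_key.
config_encryption_active: true
config_encryption_key_name: opsc_system_key
config_encryption_key_path: /var/lib/datastax-agent/conf/
```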
- running-request-cache-size
- The size of the running requests cache. Example:
running-request-cache-size: 500
- finished-request-cache-size
- The size of the finished requests cache. Example:
finished-request-cache-size: 100
- tcp_response_timeout
- The TCP response timeout used for JMX, specified in milliseconds. This value may need to be set very high for some operations to complete on nodes with large amounts of data. Set to 0 for no timeout. Default: 240000. Example:
tcp_response_timeout: 120000
- pong_timeout_ms
- The number of milliseconds to wait for a pong reply from opscenterd over STOMP before timing out the ping. Example:
pong_timeout_ms: 5000
- destination_pretest_timeout
- The maximum amount of time in seconds to verify a destination can be written to and read from. Default: 60. Example:
destination_pretest_timeout: 60