DataStax Agent configuration

Configure DataStax agents with options in the address.yaml file.

The address.yaml configuration file

The address.yaml file contains configuration options for the DataStax Agent.

Table 1. Configuration files

Filename

Location dependent on the type of installation

address.yaml

  • Package installations: /var/lib/datastax-agent/conf/address.yaml

  • Tarball installations: install_location/conf/address.yaml

cassandra-env.sh

  • Package installations: /etc/dse/cassandra/cassandra-env.sh

  • Tarball installations: installation_location/resources/cassandra/conf/cassandra-env.sh

cluster_name.conf

  • Package installations: /etc/opscenter/clusters/cluster_name.conf

  • Tarball installations: install_location/conf/clusters/cluster_name.conf

cassandra.yaml

  • Package installations: /etc/dse/cassandra/cassandra.yaml

  • Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml

Most of these properties can be set in the [agent_config] section of cluster_name.conf on the opscenterd machine, which automatically propagates the properties to all agents.

Some properties, or some environments, might require setting the property directly in address.yaml on the applicable agents.

When manually installing agents, stomp_interface is the only property in most environments that needs to be explicitly configured. When automatically installing agents, stomp_interface is configured for you.
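For a manually installed agent, a minimal address.yaml can therefore be very short. The sketch below reuses the example value shown for stomp_interface on this page; the IP address is a placeholder and must be replaced with an address of the opscenterd machine that the agent can reach.

```yaml
# address.yaml (minimal sketch for a manually installed agent)
# Placeholder address: replace with the reachable IP of the opscenterd machine.
stomp_interface: 127.0.0.1
```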

For more information about viewing agent status and troubleshooting agent issues, see Agents View.

Configuration options

  • use_ssl

    Whether or not to use SSL communication between the agent and opscenterd. Affects both the STOMP connection and agent HTTP server. Corresponds to [agents].use_ssl in opscenterd.conf. Setting this option to true turns on SSL connections. Example: use_ssl: true

  • stomp_port

    The STOMP port used by opscenterd. Example: stomp_port: 61620

  • stomp_interface

    Reachable IP address of the opscenterd machine. The connection made will be on stomp_port. Example: stomp_interface: 127.0.0.1

  • local_interface

    The IP used to identify the node. If broadcast_address is set in cassandra.yaml, this should be the same as that; otherwise, it is typically the same as listen_address in cassandra.yaml. A good check is to confirm that this address is the same as the address that nodetool ring outputs. Example: local_interface: 172.10.0.2

  • agent_rpc_interface

    The IP that the agent HTTP server listens on. In a multiple region deployment, this is typically a private IP. Default: Matches rpc_interface (native_transport_interface) from cassandra.yaml. Example: agent_rpc_interface: 172.10.0.2

  • agent_rpc_broadcast_address

    The IP that the central OpsCenter process uses to connect to the DataStax agent. Default: First available resolvable address in this order: broadcast_rpc_address (native_transport_broadcast_address), rpc_address (native_transport_address), and listen_address from cassandra.yaml. Example: agent_rpc_broadcast_address: 172.10.0.2
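As an illustration of how the addressing options above fit together, here is a sketch for a single agent. The 172.10.0.2 and 127.0.0.1 values are the example values from this page; 203.0.113.10 is a hypothetical address, used only to show that the broadcast address can differ from the address the agent listens on.

```yaml
# address.yaml (sketch): agent addressing. All addresses are placeholders.
stomp_interface: 127.0.0.1                 # reachable IP of the opscenterd machine
stomp_port: 61620                          # STOMP port used by opscenterd
local_interface: 172.10.0.2                # should match the address that nodetool ring outputs
agent_rpc_interface: 172.10.0.2            # IP the agent HTTP server listens on
agent_rpc_broadcast_address: 203.0.113.10  # hypothetical IP that opscenterd uses to reach the agent
```

If agent_rpc_interface and agent_rpc_broadcast_address are omitted, the defaults described above are derived from cassandra.yaml.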

  • longtime_interval

    The length of time, in seconds, between polling attempts to capture information that changes infrequently. Default value: 300

  • realtime_interval

    The length of time, in seconds, between polling attempts to capture rapidly changing realtime information. Default value: 5

  • shorttime_interval

    The length of time, in seconds, between polling attempts to capture information that changes frequently. Default value: 10

  • swagger_enabled

    Enables or disables the swagger UI for the agent API. Example: swagger_enabled: true

  • opscenter_ssl_keystore

    The SSL keystore location that the DataStax Agents use to connect to opscenterd. Example: opscenter_ssl_keystore: /etc/opscenter/conf/.keystore

  • opscenter_ssl_keystore_password

    The SSL keystore password that the agents use to connect to opscenterd. Example: opscenter_ssl_keystore_password: keystore-pass [This field may be encrypted for additional security.]

  • opscenter_ssl_truststore

    The path to the truststore file that the agents use to connect to opscenterd. Example: opscenter_ssl_truststore: /etc/opscenter/conf/.truststore

  • opscenter_ssl_truststore_password

    The SSL truststore password that the agents use to connect to opscenterd. Default: Uses the keystore password if an SSL truststore password is not specified. Example: opscenter_ssl_truststore_password: trust-pass [This field may be encrypted for additional security.]

  • opscenter_ssl_strict_subject_validation

    Instructs the agent to reject certificates from opscenterd when the certificate subject does not match the server’s IP address. This option is false by default, which means the agent attempts subject validation first. If that fails, the agent logs a warning and retries the connection without subject validation. In a later version of OpsCenter, the default will change to true. Example: opscenter_ssl_strict_subject_validation: true
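The SSL options for the connection to opscenterd are usually set together. The following sketch combines the example values from the entries above; the paths and passwords are placeholders, and it assumes that [agents].use_ssl is also enabled in opscenterd.conf.

```yaml
# address.yaml (sketch): SSL between the agent and opscenterd
use_ssl: true
opscenter_ssl_keystore: /etc/opscenter/conf/.keystore
opscenter_ssl_keystore_password: keystore-pass    # placeholder; may be encrypted
opscenter_ssl_truststore: /etc/opscenter/conf/.truststore
opscenter_ssl_truststore_password: trust-pass     # placeholder; keystore password is used if omitted
opscenter_ssl_strict_subject_validation: true     # reject certificates whose subject does not match
```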

  • poll_period

    The length of time, specified in seconds, between attempts to poll metrics. Example: poll_period: 60

  • disk_usage_update_period

    The length of time, in seconds, to wait between attempts to poll the disk for usage. Example: disk_usage_update_period: 60

  • rollup_state_ttl

    The time-to-live (TTL) for data points stored in rollup states before data is collected in the rollups60 table. Default: 259200 (3 days, computed as 86400 * 3). Example: rollup_state_ttl: 259200

  • rollups60_ttl

    Sets time-to-live (TTL) for rollups60 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 604800 Example: rollups60_ttl: 604800

  • rollups300_ttl

    Sets time-to-live (TTL) for rollups300 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 604800 Example: rollups300_ttl: 2419200

  • rollups7200_ttl

    Sets time-to-live (TTL) for rollups7200 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 31536000 Example: rollups7200_ttl: 31536000

  • rollups86400_ttl

    Sets time-to-live (TTL) for rollups86400 in seconds. A value of '0' prevents the data from expiring. Setting '-1' disables this rollup and prevents storing any data for it which might help reduce used disk space. Default: 0 Example: rollups86400_ttl: 0
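Because the rollup TTL values are expressed in seconds, it can help to convert them to days when tuning retention. The sketch below repeats the example values from the entries above, with the conversion (seconds / 86400) shown in comments. It is an illustration of the arithmetic, not a recommended retention policy.

```yaml
# address.yaml (sketch): rollup retention, with seconds converted to days
rollup_state_ttl: 259200     # 259200 / 86400 = 3 days
rollups60_ttl: 604800        # 604800 / 86400 = 7 days
rollups300_ttl: 2419200      # 2419200 / 86400 = 28 days
rollups7200_ttl: 31536000    # 31536000 / 86400 = 365 days
rollups86400_ttl: 0          # 0 prevents the data from expiring
```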

  • rollup_rate

    Maximum number of metrics that can be saved to Cassandra over the rollup_rate_unit period of time. This should be at least ([#tables] * 40) + 200 per minute. Default: 200 (so 200/sec with the default rollup_rate_unit). Example: rollup_rate: 200

  • rollup_rate_unit

    Unit of time for rollup_rate. Choose from microsecond, millisecond, second, minute, hour, day, or month. Default: second Example: rollup_rate_unit: second
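As a worked example of the sizing guideline for rollup_rate, consider a hypothetical cluster with 400 tables. The guideline gives (400 * 40) + 200 = 16200 metrics per minute. The default of 200 per second corresponds to 200 * 60 = 12000 per minute, which by the same formula covers up to (12000 - 200) / 40 = 295 tables, so this hypothetical cluster would need a higher setting. The table count is an assumption for illustration only.

```yaml
# address.yaml (sketch): rollup_rate for a hypothetical cluster with 400 tables
# Minimum per minute = (400 * 40) + 200 = 16200
rollup_rate: 16200
rollup_rate_unit: minute
```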

  • bypass_dse_metrics_storage

    Enable or disable storing metrics in a monitored or a separate storage DSE cluster. Metrics are stored in a DSE monitored or storage cluster by default. Default: false. Example: bypass_dse_metrics_storage: true

  • jmx_host

    Host used to connect to local JMX server. The default setting is localhost. This information will be sent by opscenterd for convenience, but can be configured locally as needed. Example: jmx_host: 127.0.0.1

  • jmx_port

    Port used to connect to local JMX server. The default setting is 7199. This information will be sent by opscenterd for convenience, but can be configured locally as needed. Example: jmx_port: 7199

  • jmx_user

    The username used to connect to the local JMX server. Example: jmx_user: jmx-username

  • jmx_pass

    The password used to connect to the local JMX server. Example: jmx_pass: jmx-password [This field may be encrypted for additional security.]

  • jmx_queue_poll_timeout

    The number of seconds to wait for an available JMX connection before timing out. Default: 10. Example: jmx_queue_poll_timeout: 10

  • status_reporting_interval

    The length of time, in seconds, between sending agent health information. Example: status_reporting_interval: 20

  • disk_wait

    The amount of time in milliseconds to wait for disk operations when collecting metrics. Default: 5000. Example: disk_wait: 5000

  • back_track_device_lookup_path

    Whether or not to backtrack the path if device lookup fails. This approach is known to resolve device lookup issues for users running Vormetric disk encryption. Default: false.

  • ec2_metadata_api_host

    The EC2 metadata API host, used to determine information about this node if it runs on EC2. Example: ec2_metadata_api_host: 169.254.169.254

  • metrics_enabled

    Whether or not to collect and store metrics for the local node. Setting this option to false turns off metrics collection. Default: true. Example: metrics_enabled: true

  • jmx_metrics_threadpool_size

    The size of the threadpool used for collecting metrics over JMX. Example: jmx_metrics_threadpool_size: 6

  • metrics_ignored_keyspaces

    A comma-separated list of keyspaces ignored by metrics collection. Example: metrics_ignored_keyspaces: ks1, ks2, ks3

  • metrics_ignored_column_families

    A comma-separated list of tables (formerly referred to as column families) ignored by metrics collection. Example: metrics_ignored_column_families: ks1.cf1, ks1.cf2, ks2.cf1

  • metrics_ignored_solr_cores

    A comma-separated list of Solr cores ignored by metrics collection. Example: metrics_ignored_solr_cores: ks1.cf1, ks1.cf2, ks2.cf1

  • hosts

    The DataStax Enterprise node or nodes responsible for storing OpsCenter data. By default, this will be the local node, but may be configured to store data on a separate cluster. The hosts option accepts an array of strings specifying the IP addresses of the node or nodes. For example, ["1.2.3.4"] or ["1.2.3.4", "1.2.3.5"]. Example: hosts: ["127.0.0.1"]

  • cassandra_port

    The native transport port used to connect to the storage Cassandra node. Example: cassandra_port: 9042

  • cassandra_user

    The username used to connect to the storage Cassandra node when authentication is enabled. Example: cassandra_user: cassandra

  • cassandra_pass

    The password used to connect to the storage Cassandra node when authentication is enabled. Example: cassandra_pass: cassandra [This field may be encrypted for additional security.]

  • max_reconnect_time

    The maximum time, in milliseconds, that the agent waits between Cassandra reconnect attempts. Example: max_reconnect_time: 15000

  • max_pending_repairs

    The maximum number of repairs that may be pending. Exceeding this number blocks new repairs. Example: max_pending_repairs: 5

  • ssl_keystore

    The SSL keystore location for the storage cluster that agents use to connect to CQL. Example: ssl_keystore: /etc/dse/conf/.keystore

  • ssl_keystore_password

    The SSL keystore password for the storage cluster that agents use to connect to CQL. Example: ssl_keystore_password: keystore-pass [This field may be encrypted for additional security.]

  • ssl_truststore

    The SSL truststore location for the storage cluster that agents use to connect to CQL. Example: ssl_truststore: /etc/dse/conf/.truststore

  • ssl_truststore_password

    The SSL truststore password for the storage cluster that agents use to connect to CQL. Example: ssl_truststore_password: truststore-pass [This field may be encrypted for additional security.]

  • monitored_cassandra_port

    The native transport port used to connect to the monitored Cassandra node. Example: monitored_cassandra_port: 9042

  • monitored_cassandra_user

    The username used to connect to the monitored Cassandra node when authentication is enabled. Example: monitored_cassandra_user: cassandra

  • monitored_cassandra_pass

    The password used to connect to the monitored Cassandra node when authentication is enabled. Example: monitored_cassandra_pass: cassandra-pass [This field may be encrypted for additional security.]

  • monitored_ssl_keystore

    The SSL keystore location for the monitored cluster that agents use to connect to CQL. Example: monitored_ssl_keystore: /etc/dse/conf/.keystore

  • monitored_ssl_keystore_password

    The SSL keystore password for the monitored cluster that agents use to connect to CQL. Example: monitored_ssl_keystore_password: keystore-pass [This field may be encrypted for additional security.]

  • monitored_ssl_truststore

    The SSL truststore location for the monitored cluster that agents use to connect to CQL. Example: monitored_ssl_truststore: /etc/dse/conf/.truststore

  • monitored_ssl_truststore_password

    The SSL truststore password for the monitored cluster that agents use to connect to CQL. Example: monitored_ssl_truststore_password: truststore-pass [This field may be encrypted for additional security.]
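The storage options (hosts, cassandra_*, ssl_*) and the monitored_* options are two parallel sets of connection settings. The sketch below shows them side by side for a setup where OpsCenter data is stored on a separate cluster. All addresses, paths, and credentials are the placeholder values from the entries above and are not meant to be used as-is.

```yaml
# address.yaml (sketch): separate storage cluster and monitored cluster

# Storage cluster: where OpsCenter data is written
hosts: ["1.2.3.4", "1.2.3.5"]
cassandra_port: 9042
cassandra_user: cassandra
cassandra_pass: cassandra                          # placeholder; may be encrypted
ssl_keystore: /etc/dse/conf/.keystore
ssl_keystore_password: keystore-pass               # placeholder; may be encrypted
ssl_truststore: /etc/dse/conf/.truststore
ssl_truststore_password: truststore-pass           # placeholder; may be encrypted

# Monitored cluster: the local node that the agent observes
monitored_cassandra_port: 9042
monitored_cassandra_user: cassandra
monitored_cassandra_pass: cassandra-pass           # placeholder; may be encrypted
monitored_ssl_keystore: /etc/dse/conf/.keystore
monitored_ssl_keystore_password: keystore-pass     # placeholder; may be encrypted
monitored_ssl_truststore: /etc/dse/conf/.truststore
monitored_ssl_truststore_password: truststore-pass # placeholder; may be encrypted
```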

  • kerberos_service

    The Kerberos service name to use when using Kerberos authentication within DSE. Example: kerberos_service: cassandra-kerberos

  • kerberos_keytab_location

    The Kerberos keytab location when using Kerberos authentication within DSE. Example: kerberos_keytab_location: /path/to/keytab.keytab

  • kerberos_client_principal

    The Kerberos client principal to use when using Kerberos authentication within DSE. Example: kerberos_client_principal: cassandra@hostname

  • storage_keyspace

    The keyspace that the agent will use to store data. Example: storage_keyspace: OpsCenter

  • alias

    Provides an alias for the agent to use when sending node details to OpsCenter. The alias is useful when the agent is unable to get the localhost name from InetAddress.getLocalHost(). Example: alias: MyNodeOne

  • storage_dse_connection_timeout

    The maximum time in seconds that the agent waits while attempting to connect to the DSE cluster. Default: 30. Example: storage_dse_connection_timeout: 30

  • storage_dse_host_read_timeout

    The maximum time in milliseconds that the agent waits for a storage node to return a response from a read request before considering said node unresponsive. Should be set higher than read_request_timeout_in_ms in cassandra.yaml. Example: storage_dse_host_read_timeout: 10000

  • monitored_dse_connection_timeout

    The maximum time in seconds that the agent waits while attempting to connect to the DSE cluster. Default: 30. Example: monitored_dse_connection_timeout: 30

  • monitored_dse_host_read_timeout

    The maximum time in milliseconds that the agent waits for a monitored node to return a response from a read request before considering said node unresponsive. Should be set higher than read_request_timeout_in_ms in cassandra.yaml. Example: monitored_dse_host_read_timeout: 10000
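Note that the connection timeouts above are given in seconds while the host read timeouts are given in milliseconds. As a sketch, assuming read_request_timeout_in_ms is 5000 in cassandra.yaml (an assumed value; check the actual setting on your nodes), the agent-side values could look like this:

```yaml
# address.yaml (sketch): connection and read timeouts
# Assumes read_request_timeout_in_ms: 5000 in cassandra.yaml (verify on your nodes).
storage_dse_connection_timeout: 30        # seconds
storage_dse_host_read_timeout: 10000      # milliseconds, higher than the assumed 5000
monitored_dse_connection_timeout: 30      # seconds
monitored_dse_host_read_timeout: 10000    # milliseconds, higher than the assumed 5000
```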

  • cassandra_install_location

    The base directory where DataStax Enterprise or Cassandra is installed. When not set, the agent attempts to auto-detect the location but cannot do so in all cases. Example: cassandra_install_location: /usr/share/dse

  • cassandra_log_location

    The directory in which DSE logs reside. This is only used for the diagnostics tarball, and should only be set if these logs are in a location other than the default. Example: cassandra_log_location: /var/log/cassandra

  • cassandra_binary_location

    The directory that contains the Cassandra binaries (cqlsh, nodetool, and sstableloader). When not set, the agent attempts to auto-detect the location. Example: cassandra_binary_location: /usr/bin

  • cassandra_conf_location

    The directory that contains the Cassandra configuration files (cassandra.yaml, cassandra-env.sh). When not set, the agent attempts to auto-detect the location. Example: cassandra_conf_location: /etc/dse/cassandra

  • dse_env_location

    The location of the directory that holds dse-env.sh. When not set, the agent attempts to auto-detect the location. Example: dse_env_location: /etc/dse

  • dse_binary_location

    The location of the directory that holds dsetool. When not set, the agent attempts to auto-detect the location. Example: dse_binary_location: /usr/bin

  • dse_conf_location

    The location of the directory that holds dse.yaml. When not set, the agent attempts to auto-detect the location. Example: dse_conf_location: /etc/dse

  • spark_conf_location

    The location of the directory that holds spark-env.sh. When not set, the agent attempts to auto-detect the location. Example: spark_conf_location: /etc/dse/spark

  • spark_log_location

    The location of the directory that holds Spark logs. When not set, the agent attempts to auto-detect the location. Example: spark_log_location: /var/log/spark

  • solr_log_location

    The location of the directory that holds Solr logs. When not set, the agent attempts to auto-detect the location. Example: solr_log_location: /var/log/cassandra

  • agent_log_location

    The path to the OpsCenter agent.log file and additional log files for the DataStax Agent. Example: agent_log_location: nodes/logs/opsagent

  • cassandra_rpc_interface

    When unspecified, the agent will attempt to determine cassandra rpc_address by reading cassandra.yaml for rpc_address (native_transport_address). When specified, this agent lookup is skipped and the specified value is used instead. Example: cassandra_rpc_interface: 172.10.0.2

  • api_port

    The port used for the HTTP API endpoint. Example: api_port: 61621

  • runs_sudo

    Whether the DataStax Agent runs using sudo. Setting this option to false means the agent does not use sudo, and the agent user does not run with elevated privileges. Setting this option to true means the agent runs using sudo, with elevated privileges. Default: true. Example: runs_sudo: true

  • s3_proxy_host

    The optional proxy host the client will connect through. Example: s3_proxy_host: localhost

  • s3_proxy_port

    The optional proxy port the client will connect through. Example: s3_proxy_port: 80

  • destination-transfer-pool-size

    The size of the thread pool to allocate for destination processes. Example: destination-transfer-pool-size: 10

  • destination-transfer-pool-keepalive

    The number of minutes that threads in the destination processes thread pool can be idle before being shut down. Threads that are shut down are created again as the demand on the thread pool increases. Example: destination-transfer-pool-keepalive: 2

  • restore_req_update_period

    The frequency in seconds with which status updates are sent to opscenterd during Restore operations in the Backup Service. Default: 60. Example: restore_req_update_period: 60

  • restore_parallel_factor

    Determines how many concurrent threads to use when transferring files from a destination to a local node during a restore. To maintain semantics with prior versions of OpsCenter, the default value is 1. Example: restore_parallel_factor: 3

  • backup_staging_dir

    The directory where commitlogs are copied after they are written to disk from DSE. The DataStax Agents monitor this directory and move commitlogs to the configured destinations. After all destinations receive the relevant commit logs, the logs are moved to the backup_storage_dir. The default location is /var/lib/datastax-agent/commitlogs/. Example: backup_staging_dir: /var/lib/datastax-agent/commitlogs/

  • backup_storage_dir

    The directory where On Server commitlog backups are stored after being copied to all configured destinations. The directory will be cleaned based on a configured retention policy for an On Server location. The directory should be large enough to hold commitlogs for the length of the retention policy. The default location is /var/lib/datastax-agent/backups/. Example: backup_storage_dir: /var/lib/datastax-agent/backups/

  • tmp_dir

    The directory used to temporarily stage files when restoring. The default location is /var/lib/datastax-agent/tmp. Example: tmp_dir: /var/lib/datastax-agent/tmp/

  • remote_backup_retries

    The number of attempts to make when file download fails during a restore. Default: 3. Example: remote_backup_retries: 3

  • remote_backup_timeout

    The timeout in milliseconds for the connection used to push backups to remote destinations. Default: 1000. Example: remote_backup_timeout: 1000

  • use_s3_cli

    Enable using the AWS CLI instead of the AWS SDK when bulk loading backups to Amazon S3 locations. Default: false. Example: use_s3_cli: true

  • azure_parallel_level

    Value to pass as parallel-level into azcopy for Azure backup destinations. If set to 0, the parameter is not used. Default: 0. Example: azure_parallel_level: 36

  • use_swift_cli

    Labs feature. When set to true, OpenStack Swift destinations are enabled. Default: false. Example: use_swift_cli: true

  • swift_cli_sync_status_delay_seconds

    Labs feature. Controls how long to wait before checking entity sync status to allow for eventual consistency. Default: 10. Example: swift_cli_sync_status_delay_seconds: 10

  • swift_cli_skip_diff_after_upload

    Labs feature. Controls whether to skip the file difference check after uploading files to a Swift destination. Default: false. Example: swift_cli_skip_diff_after_upload: true

  • skip_verification_after_upload

    Controls whether to skip the verification check using file listing post upload for backup destinations. Default: false. Example: skip_verification_after_upload: true

  • remote_verify_initial_delay

    Initial delay in milliseconds to wait before checking if a file was successfully uploaded during a backup operation. This configuration option works in conjunction with the remote_verify_max option to distinguish between broken versus tardy backups when cleaning up SSTables. The remote_verify_initial_delay value doubles each time a file transfer validation failure occurs until the value exceeds the remote_verify_max value. Default: 1000 (1 second). Example: remote_verify_initial_delay: 1000

  • remote_verify_max

    The maximum time, in milliseconds, to wait after a file upload has completed but the file is still unreadable from the remote destination. When this delay is exceeded, the transfer is considered failed. This configuration option works in conjunction with the remote_verify_initial_delay option to distinguish between broken versus tardy backups when cleaning up SSTables. Default: 30000 (30 seconds). Example: remote_verify_max: 300000
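To make the interaction between remote_verify_initial_delay and remote_verify_max concrete, the sketch below uses the stated defaults (1000 and 30000, which correspond to 1 second and 30 seconds) and traces the doubling rule in comments. The progression is a reading of the descriptions above, not a statement of the exact retry implementation. In particular, whether a wait occurs at the final doubled value is not specified here.

```yaml
# address.yaml (sketch): upload verification backoff using the stated defaults
remote_verify_initial_delay: 1000   # first wait: 1000 ms
remote_verify_max: 30000            # ceiling: 30000 ms

# Delay after each failed validation, doubling each time:
#   1000 -> 2000 -> 4000 -> 8000 -> 16000 -> 32000
# 32000 exceeds remote_verify_max (30000), so at that point the
# transfer is considered failed.
```

Raising remote_verify_max allows more doublings before a slow upload is treated as broken.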

  • restore_on_transfer_failure

    When set to true, a failed file transfer from the remote destination will not halt the restore process. A future restore attempt uses any successfully transferred files. Default: false. Example: restore_on_transfer_failure: false

  • remote_backup_region

    The AWS region to use for remote backup transfers. Default: us-west-1. Example: remote_backup_region: us-west-1

  • max_file_transfer_attempts

    The maximum number of attempts to upload a file or create a remote destination. Default: 3. Example: max_file_transfer_attempts: 30

  • sstableloader_max_heap_size

    The maximum heap size used by the sstableloader during restore operations. Default: 256M. Example: sstableloader_max_heap_size: 256M

  • trace_delay

    The time in milliseconds to wait between issuing a query to trace and fetching trace events in the Performance Service Slow Query panel. Default: 300. Example: trace_delay: 300

  • support_shell_timeout

    The number of seconds to wait for a shell process such as nodetool to run before timing out. This setting is only used for generating a diagnostic tarball. Default: 30. Example: support_shell_timeout: 30

  • graphite_host

    Setting graphite_host enables the forwarding of metrics to a graphite server at the given address. Leaving the graphite_host blank disables forwarding metrics to the graphite server. Example: graphite_host: graphite.myhost.com

  • graphite_port

    Port for graphite’s plaintext protocol. Example: graphite_port: 2003

  • graphite_prefix

    A prefix to insert metrics under. Example: graphite_prefix: opscenter

  • slow_query_past

    How far into the past, in milliseconds, to look for slow queries. Default: 3600000 (1 hour). Example: slow_query_past: 3600000

  • slow_query_refresh

    Time in seconds between slow query refreshes. Default: 5. Example: slow_query_refresh: 5

  • slow_query_fetch_size

    The limit to how many slow queries are fetched. Default: 500. Example: slow_query_fetch_size: 500

  • slow_query_ignore

    A list of keyspaces that the performance service slow query log will ignore. Default: ["OpsCenter" "dse_perf"] Example: slow_query_ignore: ["OpsCenter" "dse_perf"]

  • config_encryption_active

    Specifies whether OpsCenter should attempt to decrypt sensitive config values. Default: false.

  • config_encryption_key_name

    Filename to use for the encryption key. If a custom name is not specified, opsc_system_key is used by default. Example: config_encryption_key_name: opsc_system_key

  • config_encryption_key_path

    Path where the encryption key should be located. If unspecified, the directory of address.yaml is used by default. Example: config_encryption_key_path: /var/lib/datastax-agent/conf/
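The three encryption options above work as a group. A sketch of enabling them, using the default key name and the package-installation location of address.yaml from this page, might look as follows. It assumes that the key file already exists at that path and that the fields marked "may be encrypted" hold values encrypted with that key. Generating the key and the encrypted values is outside the scope of this sketch.

```yaml
# address.yaml (sketch): enable decryption of sensitive config values
config_encryption_active: true
config_encryption_key_name: opsc_system_key                 # default key filename
config_encryption_key_path: /var/lib/datastax-agent/conf/   # defaults to the directory of address.yaml
```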

  • running-request-cache-size

    Size of the running requests cache. Example: running-request-cache-size: 500

  • finished-request-cache-size

    Size of the finished requests cache. Example: finished-request-cache-size: 100

  • tcp_response_timeout

    The TCP response timeout used for JMX, specified in milliseconds. This value may need to be set very high for some operations to complete on nodes with large amounts of data. Set to 0 for no timeout. Default: 240000. Example: tcp_response_timeout: 120000

  • pong_timeout_ms

    The number of milliseconds to wait for a pong reply from opscenterd over STOMP before timing out the ping. Example: pong_timeout_ms: 5000

  • destination_pretest_timeout

    The maximum amount of time in seconds to verify a destination can be written to and read from. Default: 60. Example: destination_pretest_timeout: 60