DataStax Agent configuration

Configure DataStax agents with options in the address.yaml file. Fix and troubleshoot agent connections.

Agent auto-connect 

If you are adding an existing cluster to OpsCenter, and the nodes do not have the agent installed, OpsCenter displays a Fix Agents link next to the number of nodes. Clicking Fix Agents causes OpsCenter to attempt to install and start the agent on any nodes that do not have a running agent, as described in Automatic installation of DataStax agents and Installing DataStax agents.

If clicking Fix Agents does not work, several things can prevent OpsCenter from installing and starting the agent on the nodes. For example, if the agent was previously installed on the node with an incorrect configuration, OpsCenter cannot connect to the agent. Check the following if the agent fails to connect:
  • Verify the agent is able to start up without errors on the node. Look through the agent logs for errors.
  • Check that the stomp_interface and stomp_port options for each agent point to the OpsCenter host and port, respectively.
  • Test that the OpsCenter host and port are accessible from the node. A firewall might be blocking access to the OpsCenter port, for example.
  • Verify that agents have the correct configuration settings for SSL communication if OpsCenter is configured to use SSL communication between itself and the agents.
  • Test that the SSH credentials you have entered in OpsCenter work on each node.
  • Verify JMX connectivity is enabled on the node.
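The reachability check in the list above can be sketched from the node with a plain TCP connection attempt. This is a minimal illustration, not an OpsCenter tool: the function name can_connect and its timeout default are assumptions, and a successful TCP handshake only shows that the port is open, not that the agent configuration or SSL settings are correct.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling can_connect with the stomp_interface address and port 61620 (the default stomp_port) tests the path from the node to opscenterd, and calling it with 127.0.0.1 and 7199 (the default jmx_host and jmx_port) tests whether the local JMX port accepts connections.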

In most environments, stomp_interface is the only property that needs to be explicitly configured, and this might happen automatically as previously mentioned. You can set most of these properties in the [agent_config] section of cluster_name.conf on the opscenterd machine, and the properties propagate automatically to all agents. Some properties, and some environments, require setting the values directly in address.yaml on the applicable agents.
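As a rough sketch of what a minimal address.yaml might contain, the following helper renders a flat set of options as YAML text. The helper name render_address_yaml and the IP address 10.0.0.5 are placeholders, not part of OpsCenter. The sketch relies on the fact that JSON scalars and arrays are also valid YAML, so string values come out quoted.

```python
import json


def render_address_yaml(options: dict) -> str:
    """Render a flat dict of agent options as address.yaml text."""
    lines = []
    for key, value in options.items():
        # JSON scalars and arrays are also valid YAML flow syntax,
        # so json.dumps gives safe quoting without a YAML library.
        lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


# Placeholder values: replace 10.0.0.5 with the reachable IP of opscenterd.
minimal = {"stomp_interface": "10.0.0.5", "stomp_port": 61620}
print(render_address_yaml(minimal), end="")
# prints:
# stomp_interface: "10.0.0.5"
# stomp_port: 61620
```

Only stomp_interface is required here; stomp_port is shown with its default value to illustrate a numeric option.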

The address.yaml configuration file 

The address.yaml file contains configuration options for the DataStax Agent.

Note: As of OpsCenter version 5.1, the hosts option in address.yaml determines which nodes the agent connects to. For further information on configuration changes and migration paths, see the Upgrade Guide.

address.yaml 

The location of the address.yaml file depends on the type of installation:

  • Installer-Services or package installations: /var/lib/datastax-agent/conf/address.yaml
  • Installer-No Services or tarball installations: install_location/conf/address.yaml
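For scripts that need to locate the file, the two cases above can be expressed as a small lookup. This is an illustrative sketch: the function name and the "package" and "tarball" labels are hypothetical shorthand for the two installation types, and install_location stands for wherever the agent was unpacked.

```python
from pathlib import PurePosixPath


def address_yaml_path(install_type: str, install_location: str = "") -> PurePosixPath:
    """Return the expected address.yaml path for an installation type."""
    if install_type == "package":
        # Installer-Services or package installations use a fixed path.
        return PurePosixPath("/var/lib/datastax-agent/conf/address.yaml")
    if install_type == "tarball":
        if not install_location:
            raise ValueError("tarball installations need install_location")
        # Installer-No Services or tarball installations are relative
        # to the directory where the agent was installed.
        return PurePosixPath(install_location) / "conf" / "address.yaml"
    raise ValueError(f"unknown install_type: {install_type!r}")
```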

opscenterd.conf 

The location of the opscenterd.conf file depends on the type of installation:

  • Installer-Services or package installations: /etc/opscenter/opscenterd.conf
  • Installer-No Services or tarball installations: install_location/conf/opscenterd.conf
  • Windows installations: Program Files (x86)\DataStax Community\opscenter\conf\opscenterd.conf

cluster_name.conf 

The location of the cluster_name.conf file depends on the type of installation:

  • Installer-Services or package installations: /etc/opscenter/clusters/cluster_name.conf
  • Installer-No Services or tarball installations: install_location/conf/clusters/cluster_name.conf
  • Windows installations: Program Files (x86)\DataStax Community\opscenter\conf\clusters\cluster_name.conf

Configuration options 

stomp_interface
(Required) Reachable IP address of the opscenterd machine. The connection is made on stomp_port.
stomp_port
The STOMP port used by opscenterd. Default: 61620.
use_ssl
Whether to use SSL communication between the agent and opscenterd. Affects both the STOMP connection and agent HTTP server. Corresponds to [agents] use_ssl in opscenterd.conf. Setting this option to 1 turns on SSL connections. Default: 0.
local_interface
The IP used to identify the node in opscenterd. If broadcast_address is set in cassandra.yaml, this should be the same as that; otherwise, it is typically the same as listen_address in cassandra.yaml. A good check is to confirm that this address is the same as the address that nodetool ring outputs. If not set, the agent attempts to determine the proper IP address via JMX.
agent_rpc_interface
The IP that the agent HTTP server listens on. In a multiple region deployment, this is typically a private IP. Default: Matches rpc_interface from cassandra.yaml.
agent_rpc_broadcast_address
The IP that the central OpsCenter process uses to connect to the DataStax agent. Default: Matches agent_rpc_interface.
api_port
The port used for the agent's HTTP endpoint. Default: 61621.
hosts
The DataStax Enterprise node or nodes responsible for storing OpsCenter data. By default, this will be the local node, but may be configured to store data on a separate cluster. The hosts option accepts an array of strings specifying the IP addresses of the node or nodes. For example, ["1.2.3.4"] or ["1.2.3.4", "1.2.3.5"].
cassandra_port
Port used to connect to the storage cassandra node. The native transport port.
thrift_port
Port used to connect to storage thrift server. The default setting is 9160. This information will be sent by opscenterd for convenience, but can be configured locally as needed.
cassandra_user
The username used to connect to storage cassandra when authentication is enabled.
cassandra_pass
The password used to connect to storage cassandra when authentication is enabled.
jmx_host
Host used to connect to local JMX server. Default: 127.0.0.1.
jmx_port
Port used to connect to local JMX server. Default: 7199.
jmx_user
The username used to connect to the local JMX server if JMX authentication is enabled on the node.
jmx_pass
The password used to connect to the local JMX server if JMX authentication is enabled on the node.
cassandra_conf
The agent attempts to auto-detect the location of the cassandra.yaml file via JMX, but if it cannot, this option must be set to the full path of cassandra.yaml. The default location is /etc/cassandra/cassandra.yaml on Installer-Services or package installations, or /path/to/install/conf/cassandra.yaml on Installer-No Services or tarball installations.
cassandra_install_location
The base directory where DataStax Enterprise or Cassandra is installed. When not set, the agent attempts to auto-detect the location but cannot do so in all cases.
cassandra_log_location
The location of the Cassandra system.log file. This option is only used for the diagnostics tarball, and should only be set if system.log is in a non-standard location.
cassandra_rpc_interface
When unspecified, the agent attempts to determine the Cassandra rpc_address by reading it from cassandra.yaml. When specified, this lookup is skipped and the specified value is used instead.
poll_period
The length of time in seconds between attempts to collect metrics. Default: 60.
disk_usage_update_period
The length of time in seconds to wait between attempts to poll the disk for usage. Default: 60.
jmx_thread_pool_size
The size of the thread pool used for long-running JMX connections. Default: 5.
jmx_operations_pool_size
The size of the JMX connection pool used for JMX operations. Default: 4.
jmx_retry_timeout
The number of retries to attempt while establishing the cassandra host id. Default: 30.
nodedetails_threadpool_size
The size of the thread pool used to obtain node details. Default: 3.
realtime_interval
The length of time in seconds between polling attempts to capture rapidly changing real-time information. Default: 5.
shorttime_interval
The length of time in seconds between polling attempts to capture information that changes frequently (e.g., OS load, data size, running tasks). Default: 10.
longtime_interval
The length of time in seconds between polling attempts to capture information that changes infrequently. Default: 300.
ec2_metadata_api_host
The EC2 metadata api host used to determine information about the node if it is on EC2. Default: 169.254.169.254.
metrics_enabled
Whether to collect and store metrics for the local node. Setting this option to 0 turns off metrics collection. Default: 1.
jmx_metrics_threadpool_size
The size of the thread pool used for collecting metrics over JMX. Default: 4.
metrics_ignored_keyspaces
A comma-separated list of keyspaces ignored by metrics collection. Example: "ks1, ks2, ks3".
metrics_ignored_column_families
A comma-separated list of tables (formerly referred to as column families) ignored by metrics collection. Example: "ks1.cf1, ks1.cf2, ks2.cf1".
metrics_ignored_solr_cores
A comma-separated list of solr cores ignored by metrics collection. Example: "ks1.cf1, ks1.cf2, ks2.cf1".
async_queue_size
The maximum number of queued cassandra operations. If your cluster experiences bursty cassandra stress, increasing this queue size might help. Default: 5000.
async_pool_size
The pool size to use for async operations to cassandra. Default when using local storage: 2. Default when using remote storage: 4.
max_pending_repairs
The maximum number of repairs that might be pending. Exceeding this number blocks new repairs. Default: 5.
ssl_keystore
The SSL keystore location for the storage cluster that agents use to connect to CQL.
ssl_keystore_password
The SSL keystore password for the storage cluster that agents use to connect to CQL.
monitored_cassandra_port
Port used to connect to the monitored cassandra node. The native transport port.
monitored_thrift_port
Port used to connect to monitored thrift server. The default setting is 9160. This information will be sent by opscenterd for convenience, but can be configured locally as needed.
monitored_cassandra_user
The username used to connect to monitored cassandra when authentication is enabled.
monitored_cassandra_pass
The password used to connect to monitored cassandra when authentication is enabled.
monitored_ssl_keystore
The SSL keystore location for the monitored cluster that agents use to connect to CQL.
monitored_ssl_keystore_password
The SSL keystore password for the monitored cluster that agents use to connect to CQL.
kerberos_service
The Kerberos service name to use when using Kerberos authentication within DSE.
storage_keyspace
The keyspace that the agent uses to store data.
runs_sudo
Sets whether the DataStax Agent runs using sudo. Setting this option to false means the agent does not use sudo, and the agent user does not run using elevated privileges. Setting this option to true means the agent runs with elevated privileges using sudo.
restore_req_update_period
The frequency in seconds with which status updates are sent to opscenterd during Restore operations in the Backup Service. Default: 60.
backup_staging_dir
The directory used for staging commitlogs to be backed up.
tmp_dir
The location of the Backup Service staging directory for backups. The default location is /var/lib/datastax-agent/tmp.
remote_backup_retries
The number of attempts to make when file download fails during a restore. Default: 3.
remote_backup_timeout
The timeout in milliseconds for the connection used to push backups to remote destinations. Default: 1000.
remote_backup_retry_delay
The delay in milliseconds between remote backup retries. Default: 5000.
remote_verify_initial_delay
Initial delay in milliseconds to wait before checking if a file was successfully uploaded during a backup operation. This configuration option works in conjunction with the remote_verify_max option to distinguish between broken versus tardy backups when cleaning up SSTables. The remote_verify_initial_delay value doubles each time a file transfer validation failure occurs until the value exceeds the remote_verify_max value. Default: 1000 (1 second).
remote_verify_max
The maximum time period in milliseconds to wait after a file upload has completed but is still unreadable from the remote destination. When this delay is exceeded, the file transfer is considered failed. This configuration option works in conjunction with the remote_verify_initial_delay option to distinguish between broken versus tardy backups when cleaning up SSTables. Default: 30000 (30 seconds).
restore_on_transfer_failure
When set to true, a failed file transfer from the remote destination does not halt the restore process. A future restore attempt uses any successfully transferred files. Default: false.
backup_file_queue_max
The maximum number of files that can be queued for an upload to a remote destination. Increasing this number consumes more memory. Default: 10000.
max_file_transfer_attempts
The maximum number of attempts to upload a file or create a remote destination. Default: 30.
trace_delay
The time in milliseconds to wait between issuing a query to trace and fetching trace events in the Performance Service Slow Query panel. Default: 300.
multipart-chunk-size
The chunk size used for ec2 s3 file transfers in bytes.
max-seconds-to-sleep
When stream throttling is configured in Backup Service transfers to or from a remote destination, this setting acts as a cap on how long to sleep when throttling. The cap prevents prematurely closing connections due to inactivity. Default: 25 (seconds).
seconds-to-read-kill-channel
Delay in seconds used to verify that the Backup Service transfer should stop. Default: 0.005.
read-buffer-size
The buffer size to read off the disk. Increasing this number might improve transfer speed but will consume more memory.
write-buffer-size
The buffer size to write to the remote destination. Increasing this number might improve transfer speed, but will limit the ability of the throttler to slow transfers.
unthrottled-default
A very large number used for bytes per second if no throttle is selected. Default: 10000000000.
slow_query_past
How far into the past in milliseconds to look for slow queries. Default: 3600000 (1 hour).
slow_query_refresh
Time in seconds between slow query refreshes. Default: 5.
slow_query_fetch_size
The limit to how many slow queries are fetched. Default: 2000.
slow_query_ignore
A list of keyspaces to ignore in the slow query log of the Performance Service. Default: ["OpsCenter" "dse_perf"].
config_encryption_active
Specifies whether opscenter should attempt to decrypt sensitive config values. Default: False.
config_encryption_key_name
Filename to use for the encryption key. If a custom name is not specified, opsc_system_key is used by default.
config_encryption_key_path
Path where the encryption key should be located. If unspecified, the directory of address.yaml is used by default.
max_reconnect_time
Maximum delay in milliseconds for an agent to attempt reconnecting to Cassandra. Default: 15000 (15 seconds).
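To make the interaction between remote_verify_initial_delay and remote_verify_max more concrete, the doubling schedule described for those options can be sketched as follows. This is an interpretation of the descriptions above, not the agent's actual code: the function name verify_delays is hypothetical, and the sketch assumes that the transfer is considered failed as soon as the doubled delay exceeds remote_verify_max.

```python
def verify_delays(initial_ms: int = 1000, max_ms: int = 30000) -> list:
    """List the successive wait times (ms) before a transfer is considered failed."""
    if initial_ms <= 0:
        raise ValueError("initial_ms must be positive")
    delays = []
    delay = initial_ms
    # Keep doubling until the next delay would exceed the maximum.
    while delay <= max_ms:
        delays.append(delay)
        delay *= 2
    return delays


print(verify_delays())  # [1000, 2000, 4000, 8000, 16000]
```

Under this assumption, the default settings allow five verification waits totaling 31 seconds before an upload is treated as broken rather than tardy.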