DataStax Agent configuration
Configure DataStax agents with options in the address.yaml file. Fix and troubleshoot agent connections.
Agent auto-connect
If you are adding an existing cluster to OpsCenter, and the nodes do not have the agent installed, OpsCenter displays a Fix Agents link next to the number of nodes. Clicking Fix Agents causes OpsCenter to attempt to install and start the agent on any nodes that do not have a running agent, as described in Automatic installation of DataStax agents and Installing DataStax agents.
- Verify the agent is able to start up without errors on the node. Look through the agent logs for errors.
- Check the settings of the stomp_interface and stomp_port options for each agent to make sure they match and are pointing to the OpsCenter host and port, respectively.
- Test that the OpsCenter host and port are accessible from the node. A firewall might be blocking access to the OpsCenter port, for example.
- Verify that agents have the correct configuration settings for SSL communication if OpsCenter is configured to use SSL communication between itself and the agents.
- Test that the SSH credentials you have entered in OpsCenter work on each node.
- Verify JMX connectivity is enabled on the node.
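One of the checks above is testing that the OpsCenter host and port are accessible from the node. The following is a minimal Python sketch of such a test, assuming that a plain TCP connection attempt is a sufficient probe. The function name can_reach is hypothetical and is not part of OpsCenter or the agent.

```python
import socket


def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP socket;
        # the with block closes the socket as soon as the check is done.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, running can_reach("10.0.0.5", 61620) on a node checks the default stomp_port against a hypothetical OpsCenter address of 10.0.0.5. A False result only shows that the TCP connection failed; it does not say whether a firewall, a wrong address, or a stopped opscenterd process is the cause.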
In most environments, stomp_interface is the only property that will need to be explicitly configured, and this might happen automatically as previously mentioned.
You can set most of these properties in the [agent_config] section of cluster_name.conf on the opscenterd machine, and the properties propagate automatically to all agents. Some properties, and some situations, require setting the values directly in address.yaml on the applicable agents.
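As an illustration of the [agent_config] section, the sketch below parses a hypothetical cluster_name.conf fragment with Python's standard configparser module and lists the options found in that section. It assumes the file uses INI-style "key = value" lines under bracketed section names, and that thrift_port and metrics_enabled are among the options that propagate; the values and the helper name agent_overrides are examples only.

```python
import configparser

# Hypothetical contents of a cluster_name.conf file (example values only).
EXAMPLE_CONF = """
[agent_config]
thrift_port = 9160
metrics_enabled = 1
"""


def agent_overrides(conf_text):
    """Return the options in the [agent_config] section as a dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("agent_config"):
        return {}
    return dict(parser.items("agent_config"))


print(agent_overrides(EXAMPLE_CONF))
```

A check like this can help confirm which agent options are set centrally before looking for overrides in address.yaml on individual nodes.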
The address.yaml configuration file
The address.yaml file contains configuration options for the DataStax Agent.
The hosts option in address.yaml now determines which nodes the agent connects to. For further information on configuration changes and migration paths, see the Upgrade Guide.
address.yaml
The location of the address.yaml file depends on the type of installation:
- Installer-Services or package installations: /var/lib/datastax-agent/conf/address.yaml
- Installer-No Services or tarball installations: install_location/conf/address.yaml
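To give a sense of what the file contains, here is a minimal Python sketch that renders a small set of agent options as address.yaml text. The addresses are hypothetical, and the helper name render_address_yaml is an illustration, not an OpsCenter tool. It relies on the fact that JSON scalars and arrays are also valid YAML flow syntax.

```python
import json


def render_address_yaml(options):
    """Render a flat dict of agent options as YAML text, one option per line."""
    # json.dumps yields quoted strings, bare numbers, and ["a", "b"] style
    # lists, all of which are valid YAML values.
    return "".join(f"{key}: {json.dumps(value)}\n" for key, value in options.items())


# Hypothetical values for one node; stomp_interface is the required option.
example = {
    "stomp_interface": "10.0.0.5",
    "stomp_port": 61620,
    "use_ssl": 0,
    "hosts": ["1.2.3.4", "1.2.3.5"],
}

print(render_address_yaml(example), end="")
```

The sketch handles only flat options with string, number, or list values, which covers the options shown here; it is not a general YAML writer.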
opscenterd.conf
The location of the opscenterd.conf file depends on the type of installation:
- Installer-Services or package installations: /etc/opscenter/opscenterd.conf
- Installer-No Services or tarball installations: install_location/conf/opscenterd.conf
- Windows installations: Program Files (x86)\DataStax Community\opscenter\conf\opscenterd.conf
cluster_name.conf
The location of the cluster_name.conf file depends on the type of installation:
- Installer-Services or package installations: /etc/opscenter/clusters/cluster_name.conf
- Installer-No Services or tarball installations: install_location/conf/clusters/cluster_name.conf
- Windows installations: Program Files (x86)\DataStax Community\opscenter\conf\clusters\cluster_name.conf
Configuration options
- stomp_interface
- (Required) Reachable IP address of the opscenterd machine. The connection is made on stomp_port.
- stomp_port
- The port that opscenterd uses for STOMP connections from agents. Default: 61620.
- use_ssl
- Whether to use SSL communication between the agent and opscenterd. Affects both the STOMP connection and agent HTTP server. Corresponds to [agents] use_ssl in opscenterd.conf. Setting this option to 1 turns on SSL connections. Default: 0.
- local_interface
- The IP used to identify the node in opscenterd. If broadcast_address is set in cassandra.yaml, this should be the same as that; otherwise, it is typically the same as listen_address in cassandra.yaml. A good check is to confirm that this address is the same as the address that nodetool ring outputs. If not set, the agent attempts to determine the proper IP address via JMX.
- agent_rpc_interface
- The IP that the agent HTTP server listens on. In a multiple region deployment, this is typically a private IP. Default: Matches rpc_interface from cassandra.yaml.
- agent_rpc_broadcast_address
- The IP that the central OpsCenter process uses to connect to the DataStax agent. Default: Matches agent_rpc_interface.
- api_port
- The port used for the agent's HTTP endpoint. Default: 61621.
- hosts
- The DataStax Enterprise node or nodes responsible for storing OpsCenter data. By default, this is the local node, but the option can be set so that data is stored on a separate cluster. The hosts option accepts an array of strings specifying the IP addresses of the node or nodes. For example, ["1.2.3.4"] or ["1.2.3.4", "1.2.3.5"].
- cassandra_port
- Port used to connect to the storage cassandra node. The native transport port.
- thrift_port
- Port used to connect to storage thrift server. The default setting is 9160. This information will be sent by opscenterd for convenience, but can be configured locally as needed.
- cassandra_user
- The username used to connect to storage cassandra when authentication is enabled.
- cassandra_pass
- The password used to connect to storage cassandra when authentication is enabled.
- jmx_host
- Host used to connect to local JMX server. Default: 127.0.0.1.
- jmx_port
- Port used to connect to local JMX server. Default: 7199.
- jmx_user
- The username used to connect to the local JMX server if JMX authentication is enabled on the node.
- jmx_pass
- The password used to connect to the local JMX server if JMX authentication is enabled on the node.
- cassandra_conf
- The agent attempts to auto-detect the location of the cassandra.yaml file via JMX, but if it cannot, this option must be set to the full path of cassandra.yaml. The default location is /etc/cassandra/cassandra.yaml on Installer-Services or package installations, or /path/to/install/conf/cassandra.yaml on Installer-No Services or tarball installations.
- cassandra_install_location
- The base directory where DataStax Enterprise or Cassandra is installed. When not set, the agent attempts to auto-detect the location but cannot do so in all cases.
- cassandra_log_location
- The location of the Cassandra system.log file. This option is only used for the diagnostics tarball, and should only be set if system.log is in a non-standard location.
- cassandra_rpc_interface
- When unspecified, the agent will attempt to determine cassandra rpc_address by reading cassandra.yaml for rpc_address. When specified, this agent lookup is skipped and the specified value is used instead.
- poll_period
- The length of time in seconds between attempts to collect metrics. Default: 60.
- disk_usage_update_period
- The length of time in seconds to wait between attempts to poll the disk for usage. Default: 60.
- jmx_thread_pool_size
- The size of the thread pool used for long-running JMX connections. Default: 5.
- jmx_operations_pool_size
- The size of the JMX connection pool used for JMX operations. Default: 4.
- jmx_retry_timeout
- The number of retries to attempt while establishing the cassandra host id. Default: 30.
- nodedetails_threadpool_size
- The size of the thread pool used to obtain node details. Default: 3.
- realtime_interval
- The length of time in seconds between polling attempts to capture rapidly changing real-time information. Default: 5.
- shorttime_interval
- The length of time in seconds between polling attempts to capture information that changes frequently (e.g., OS load, data size, running tasks). Default: 10.
- longtime_interval
- The length of time in seconds between polling attempts to capture information that changes infrequently. Default: 300.
- ec2_metadata_api_host
- The EC2 metadata api host used to determine information about the node if it is on EC2. Default: 169.254.169.254.
- metrics_enabled
- Whether to collect and store metrics for the local node. Setting this option to 0 turns off metrics collection. Default: 1.
- jmx_metrics_threadpool_size
- The size of the thread pool used for collecting metrics over JMX. Default: 4.
- metrics_ignored_keyspaces
- A comma-separated list of keyspaces ignored by metrics collection. Example: "ks1, ks2, ks3".
- metrics_ignored_column_families
- A comma-separated list of tables (formerly referred to as column families) ignored by metrics collection. Example: "ks1.cf1, ks1.cf2, ks2.cf1".
- metrics_ignored_solr_cores
- A comma-separated list of solr cores ignored by metrics collection. Example: "ks1.cf1, ks1.cf2, ks2.cf1".
- async_queue_size
- The maximum number of queued cassandra operations. If your cluster experiences bursty cassandra stress, increasing this queue size might help. Default: 5000.
- async_pool_size
- The pool size to use for async operations to cassandra. Default when using local storage: 2. Default when using remote storage: 4.
- max_pending_repairs
- The maximum number of repairs that might be pending. Exceeding this number blocks new repairs. Default: 5.
- ssl_keystore
- The SSL keystore location for the storage cluster that agents use to connect to CQL.
- ssl_keystore_password
- The SSL keystore password for the storage cluster that agents use to connect to CQL.
- monitored_cassandra_port
- Port used to connect to the monitored cassandra node. The native transport port.
- monitored_thrift_port
- Port used to connect to monitored thrift server. The default setting is 9160. This information will be sent by opscenterd for convenience, but can be configured locally as needed.
- monitored_cassandra_user
- The username used to connect to monitored cassandra when authentication is enabled.
- monitored_cassandra_pass
- The password used to connect to monitored cassandra when authentication is enabled.
- monitored_ssl_keystore
- The SSL keystore location for the monitored cluster that agents use to connect to CQL.
- monitored_ssl_keystore_password
- The SSL keystore password for the monitored cluster that agents use to connect to CQL.
- kerberos_service
- The Kerberos service name to use when using Kerberos authentication within DSE.
- storage_keyspace
- The keyspace that the agent uses to store data.
- runs_sudo
- Sets whether the DataStax Agent runs using sudo. Setting this option to false means the agent does not use sudo, and the agent user does not run using elevated privileges. Setting this option to true means the agent runs with elevated privileges using sudo.
- restore_req_update_period
- The frequency in seconds with which status updates are sent to opscenterd during Restore operations in the Backup Service. Default: 60.
- backup_staging_dir
- The directory used for staging commitlogs to be backed up.
- tmp_dir
- The location of the Backup Service staging directory for backups. The default location is /var/lib/datastax-agent/tmp.
- remote_backup_retries
- The number of attempts to make when file download fails during a restore. Default: 3.
- remote_backup_timeout
- The timeout in milliseconds for the connection used to push backups to remote destinations. Default: 1000.
- remote_backup_retry_delay
- The delay in milliseconds between remote backup retries. Default: 5000.
- remote_verify_initial_delay
- Initial delay in milliseconds to wait before checking if a file was successfully uploaded during a backup operation. This configuration option works in conjunction with the remote_verify_max option to distinguish between broken and tardy backups when cleaning up SSTables. The remote_verify_initial_delay value doubles each time a file transfer validation failure occurs until the value exceeds the remote_verify_max value. Default: 1000 (1 second).
- remote_verify_max
- The maximum time period in milliseconds to wait after a file upload has completed but is still unreadable from the remote destination. When this delay is exceeded, the file transfer is considered failed. This configuration option works in conjunction with the remote_verify_initial_delay option to distinguish between broken and tardy backups when cleaning up SSTables. Default: 30000 (30 seconds).
- restore_on_transfer_failure
- When set to true, a failed file transfer from the remote destination does not halt the restore process. A future restore attempt uses any successfully transferred files. Default: false.
- backup_file_queue_max
- The maximum number of files that can be queued for an upload to a remote destination. Increasing this number consumes more memory. Default: 10000.
- max_file_transfer_attempts
- The maximum number of attempts to upload a file or create a remote destination. Default: 30.
- trace_delay
- The time in milliseconds to wait between issuing a query to trace and fetching trace events in the Performance Service Slow Query panel. Default: 300.
- multipart-chunk-size
- The chunk size used for ec2 s3 file transfers in bytes.
- max-seconds-to-sleep
- When stream throttling is configured in Backup Service transfers to or from a remote destination, this setting acts as a cap on how long to sleep when throttling. The cap prevents prematurely closing connections due to inactivity. Default: 25 (seconds).
- seconds-to-read-kill-channel
- Delay in seconds used to verify that the Backup Service transfer should stop. Default: 0.005.
- read-buffer-size
- The buffer size to read off the disk. Increasing this number might improve transfer speed but will consume more memory.
- write-buffer-size
- The buffer size to write to the remote destination. Increasing this number might improve transfer speed, but will limit the ability of the throttler to slow transfers.
- unthrottled-default
- A very large number used for bytes per second if no throttle is selected. Default: 10000000000.
- slow_query_past
- How far into the past in milliseconds to look for slow queries. Default: 3600000 (1 hour).
- slow_query_refresh
- Time in seconds between slow query refreshes. Default: 5.
- slow_query_fetch_size
- The limit to how many slow queries are fetched. Default: 2000.
- slow_query_ignore
- A list of keyspaces to ignore in the slow query log of the Performance Service. Default: ["OpsCenter" "dse_perf"].
- config_encryption_active
- Specifies whether opscenter should attempt to decrypt sensitive config values. Default: False.
- config_encryption_key_name
- Filename to use for the encryption key. If a custom name is not specified, opsc_system_key is used by default.
- config_encryption_key_path
- Path where the encryption key should be located. If unspecified, the directory of address.yaml is used by default.
- max_reconnect_time
- Maximum delay in ms for an agent to attempt reconnecting to Cassandra. The default is 15000 ms (15 s).
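The interaction between remote_verify_initial_delay and remote_verify_max can be easier to follow with numbers. The Python sketch below lists the successive delays under one reading of the descriptions above: the delay starts at the initial value and doubles after each failed validation, and verification stops once the next delay would exceed the maximum. Whether the agent compares each individual delay or the cumulative wait against remote_verify_max is an assumption here, and the function name verify_delays is hypothetical.

```python
def verify_delays(initial_delay_ms=1000, max_ms=30000):
    """List the successive wait times, doubling until the delay exceeds max_ms."""
    if initial_delay_ms <= 0:
        raise ValueError("initial_delay_ms must be positive")
    delays = []
    delay = initial_delay_ms
    # Assumption: each individual delay is compared against max_ms.
    while delay <= max_ms:
        delays.append(delay)
        delay *= 2
    return delays


print(verify_delays())
```

With the documented defaults, this reading gives waits of 1000, 2000, 4000, 8000, and 16000 milliseconds; the next doubling (32000) exceeds the 30000 maximum, at which point the transfer would be treated as failed.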