Configuring for very large clusters
OpsCenter can manage very large clusters that contain hundreds of nodes. When working with very large clusters, the performance of OpsCenter decreases with the default settings. To improve performance, adjust the cluster settings to increase the time period between polls of a cluster’s nodes and token lists.
OpsCenter performs best when monitoring 200 nodes or fewer. DataStax recommends a dedicated OpsCenter instance for each cluster with more than 200 nodes.
Mission Control is the successor to OpsCenter and was designed with enterprise scale in mind. Consider evaluating Mission Control if you are hitting the limits of OpsCenter. For information about Mission Control, see Intro to Mission Control.
Lifecycle Manager can provision and manage nodes per cluster within its UI. See Supported capabilities for more details.
Change default settings to improve the performance of very large clusters
- Locate the cluster_name.conf file. The location of this file depends on the type of installation:
  - Package installations: /etc/opscenter/clusters/cluster_name.conf
  - Tarball installations: install_location/conf/clusters/cluster_name.conf
- Open cluster_name.conf for editing.
  - Increase the node list poll period to 30 minutes by setting the nodelist_poll_period option to 1800 under [collection]:
    [collection] nodelist_poll_period
    The interval in seconds OpsCenter waits to poll the nodes in a cluster. The default value is 30.
      [collection]
      nodelist_poll_period = 1800
  - If an agent is overloaded, increase http_timeout from its default value:
    [agents] http_timeout
    The timeout, in seconds, for an HTTP call to the agent. The default value is 10.
      [agents]
      http_timeout = 20
  - Set the following properties in the [agent_config] section of cluster_name.conf on the opscenterd machine, and the properties propagate automatically to all DataStax Agents. Conversely, setting these options in address.yaml must be done for every node. Increasing the [agent_config] values reduces the speed at which the UI displays updates of node status, such as the Ring view.
    - [agent_config] realtime_interval
      The length of time, in seconds, between polling attempts to capture rapidly changing realtime information. Default value: 5
        [agent_config]
        realtime_interval = 40
    - [agent_config] shorttime_interval
      The length of time, in seconds, between polling attempts to capture information that changes frequently. Default value: 10
        [agent_config]
        shorttime_interval = 80
    - [agent_config] longtime_interval
      The length of time, in seconds, between polling attempts to capture information that changes infrequently. Default value: 300
        [agent_config]
        longtime_interval = 2400
    - [agent_config] status_reporting_interval
      The length of time, in seconds, between sending agent health information. Default value: 20
        [agent_config]
        status_reporting_interval = 160
    - [agent_config] disk_usage_update_period
      The length of time, in seconds, to wait between attempts to poll the disk for usage. Agents send storage capacity data to opscenterd based on the disk_usage_update_period value in address.yaml or in cluster_name.conf. Default value: 30
        [agent_config]
        disk_usage_update_period = 240
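Each [agent_config] example value above is eight times its default. As a minimal sketch, assuming a single uniform scaling factor suits your cluster, the following Python computes the scaled intervals and renders an [agent_config] section that could be pasted into cluster_name.conf. The helper names and the idea of a common factor are illustrative assumptions; they are not part of OpsCenter.

```python
import configparser
import io

# Default values (in seconds) as listed in the option descriptions above.
AGENT_CONFIG_DEFAULTS = {
    "realtime_interval": 5,
    "shorttime_interval": 10,
    "longtime_interval": 300,
    "status_reporting_interval": 20,
    "disk_usage_update_period": 30,
}


def scaled_agent_config(factor):
    """Return every [agent_config] default multiplied by factor.

    Hypothetical helper for illustration; OpsCenter does not ship it.
    """
    return {name: default * factor for name, default in AGENT_CONFIG_DEFAULTS.items()}


def render_section(options):
    """Render the options as INI text for an [agent_config] section."""
    parser = configparser.ConfigParser()
    parser["agent_config"] = {name: str(value) for name, value in options.items()}
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # A factor of 8 reproduces the example values shown above.
    print(render_section(scaled_agent_config(8)))
```

A smaller factor trades some polling load for faster UI updates, so it may be worth starting low and increasing the factor only if performance remains poor.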
- Locate the opscenterd.conf file. The location of this file depends on the type of installation:
  - Package installations: /etc/opscenter/opscenterd.conf
  - Tarball installations: install_location/conf/opscenterd.conf
- Open opscenterd.conf for editing and adjust the following settings:
  - [agents] not_seen_threshold
    The maximum time, in seconds, since the last agent status about a specific connection, such as stomp, was sent before that agent connection is considered down. This threshold also affects how long OpsCenter waits before marking node health as unknown. Default value: 180 seconds.
  - [agents] http_poll_period
    The interval, in seconds, between attempts to poll agent HTTP health. Default value: 60 seconds.
  - [ui] default_api_timeout
    The default timeout value, in seconds, for an API call from the OpsCenter UI to the OpsCenter API. Default value: 30 seconds. Some API calls require a timeout longer than 30 seconds. In those cases, the API call timeouts are scaled relative to the default_api_timeout (for example, 6 * default_api_timeout). Changing the default_api_timeout affects those timeouts accordingly.
  For example:
    [agents]
    not_seen_threshold = 620
    http_poll_period = 500

    [ui]
    default_api_timeout = 60
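To make the scaling of API timeouts concrete, here is a small sketch of the arithmetic in Python. It parses the example opscenterd.conf text and multiplies default_api_timeout by a multiplier. The multiplier 6 is taken from the example in the description; the function name is hypothetical, and the multipliers OpsCenter applies to individual API calls are not listed in this topic.

```python
import configparser

# The example settings from this step, written as they would appear in the file.
EXAMPLE_OPSCENTERD_CONF = """
[agents]
not_seen_threshold = 620
http_poll_period = 500

[ui]
default_api_timeout = 60
"""


def scaled_api_timeout(conf_text, multiplier):
    """Return default_api_timeout from conf_text multiplied by multiplier.

    Hypothetical helper: falls back to the documented default of 30 seconds
    when [ui] default_api_timeout is not set.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    base = parser.getint("ui", "default_api_timeout", fallback=30)
    return base * multiplier


if __name__ == "__main__":
    print(scaled_api_timeout(EXAMPLE_OPSCENTERD_CONF, 6))  # example value of 60
    print(scaled_api_timeout("", 6))                       # default of 30
```

With the example value of 60, a call scaled by 6 would wait 360 seconds rather than 180, so raising default_api_timeout lengthens every scaled timeout in proportion.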
- Locate the address.yaml file. The location of this file depends on the type of installation:
  - Package installations: /var/lib/datastax-agent/conf/address.yaml
  - Tarball installations: install_location/conf/address.yaml
- Open address.yaml for editing, and adjust the following setting to override the default of 30 seconds:
  - stomp-setup-timeout
    The maximum time, in seconds, before the agents try to reconnect via stomp. Increasing this timeout to 120 seconds better allows agents to reconnect after restarting OpsCenter in a large cluster.
      stomp-setup-timeout: 120
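Because address.yaml exists on every node, the edit is a natural candidate for scripting. The following is a minimal sketch, assuming address.yaml contains only simple top-level key: value entries. The function name set_yaml_scalar is hypothetical, and the approach is deliberately line-based (standard library only) rather than a full YAML parser, so it leaves all other lines untouched.

```python
import re


def set_yaml_scalar(text, key, value):
    """Return text with the top-level `key: value` line set.

    Replaces the first existing line for key, or appends a new line if the
    key is absent. Handles only simple top-level scalar entries.
    """
    pattern = re.compile(rf"^{re.escape(key)}\s*:.*$", re.MULTILINE)
    line = f"{key}: {value}"
    if pattern.search(text):
        # A function replacement avoids backslash interpretation in `line`.
        return pattern.sub(lambda _match: line, text, count=1)
    separator = "" if text == "" or text.endswith("\n") else "\n"
    return f"{text}{separator}{line}\n"


if __name__ == "__main__":
    # Illustrative file content; a real address.yaml will differ.
    original = "stomp_interface: 10.0.0.1\n"
    print(set_yaml_scalar(original, "stomp-setup-timeout", 120), end="")
```

Reading the file, passing its contents through the function, and writing the result back would need to be repeated on each node, for example through whatever remote execution tooling you already use.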
- Use the environment variable OPSC_JVM_OPTS to override the default parameters for the OpsCenter JVM. The following command doubles the heap size to 4096m (4 GB).
    export OPSC_JVM_OPTS=-Xmx4096m
  See Configuring the OpsCenter JVM for additional information.
- If you continually receive OutOfMemory errors, consider Configuring the DataStax Agent JVM.