Configure very large clusters
OpsCenter can manage very large clusters that contain hundreds of nodes. When working with very large clusters, the performance of OpsCenter decreases with the default settings. To improve performance, adjust the cluster settings to increase the time period between polls of a cluster’s nodes and token lists.
OpsCenter performs best when monitoring 200 nodes or fewer. DataStax recommends a dedicated OpsCenter instance for each cluster with more than 200 nodes.
Mission Control is the successor to OpsCenter and was designed with enterprise scale in mind. Consider evaluating Mission Control if you are hitting the limits of OpsCenter. For information about Mission Control, see Intro to Mission Control.
Lifecycle Manager can provision and manage nodes per cluster within its UI. See Supported capabilities for more details.
To change default settings to improve the performance of very large clusters, do the following:
- Edit the `cluster_name.conf` file. The location of this file depends on the type of installation:
  - Package installations: `/etc/opscenter/clusters/cluster_name.conf`
  - Tarball installations: `INSTALL_DIRECTORY/conf/clusters/cluster_name.conf`
- Increase the node list poll period to 30 minutes by setting the `nodelist_poll_period` option to 1800 under `[collection]`. This parameter sets the interval in seconds that OpsCenter waits to poll the nodes in a cluster. The default value is 30 seconds.

      [collection]
      nodelist_poll_period = 1800

- If an agent is overloaded, increase the default `http_timeout` if necessary. This parameter sets the timeout in seconds for an HTTP call to the agent. The default value is 10 seconds.

      [agents]
      http_timeout = 20

- Set properties in the `[agent_config]` section of `cluster_name.conf` on the `opscenterd` machine. The properties propagate automatically to all DataStax Agents. Conversely, if you set these options in `address.yaml`, you must do so for every node.

  Increasing the `[agent_config]` values reduces the speed at which the UI shows updates of node status, such as the Ring view.

  - `[agent_config] realtime_interval`: The length of time, in seconds, between polling attempts to capture rapidly changing `realtime` information. Default value: 5

        [agent_config]
        realtime_interval = 40

  - `[agent_config] shorttime_interval`: The length of time, in seconds, between polling attempts to capture information that changes frequently. Default value: 10

        [agent_config]
        shorttime_interval = 80

  - `[agent_config] longtime_interval`: The length of time, in seconds, between polling attempts to capture information that changes infrequently. Default value: 300

        [agent_config]
        longtime_interval = 2400

  - `[agent_config] status_reporting_interval`: The length of time, in seconds, between sending agent health information. Default value: 20

        [agent_config]
        status_reporting_interval = 160

  - `[agent_config] disk_usage_update_period`: The length of time, in seconds, to wait between attempts to poll the disk for usage. Agents send storage capacity data to `opscenterd` based on the `disk_usage_update_period` value in `address.yaml` or in `cluster_name.conf`. Default value: 30

        [agent_config]
        disk_usage_update_period = 240
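The `cluster_name.conf` changes from the preceding steps can be collected into one small script. The following is a minimal sketch, not an official DataStax tool: the function names are hypothetical, and the scaling factor of 8 is only an observation that each `[agent_config]` example value above equals 8 times its documented default. The sketch works on the text of the file rather than the file itself. Note that Python's `configparser` drops comments when it rewrites a file, so keep a backup of the original.

```python
import configparser
import io

# Default values, in seconds, as listed in the steps above.
AGENT_CONFIG_DEFAULTS = {
    "realtime_interval": 5,
    "shorttime_interval": 10,
    "longtime_interval": 300,
    "status_reporting_interval": 20,
    "disk_usage_update_period": 30,
}


def scaled_agent_config(factor):
    """Multiply every documented [agent_config] default by one factor."""
    return {name: default * factor
            for name, default in AGENT_CONFIG_DEFAULTS.items()}


def apply_large_cluster_settings(conf_text, factor=8):
    """Return conf_text with the example large-cluster values applied.

    Hypothetical helper: it only transforms text and never touches disk.
    """
    # interpolation=None keeps any '%' characters in existing values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    for section in ("collection", "agents", "agent_config"):
        if not parser.has_section(section):
            parser.add_section(section)

    # Example values taken from the steps above.
    parser.set("collection", "nodelist_poll_period", "1800")
    parser.set("agents", "http_timeout", "20")
    for name, value in scaled_agent_config(factor).items():
        parser.set("agent_config", name, str(value))

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Sections that already exist in the file, such as connection settings, are preserved. Choose a factor that suits how stale you can allow the UI to be, because larger intervals delay status updates.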
- Edit the `opscenterd.conf` file. The location of this file depends on the type of installation:
  - Package installations: `/etc/opscenter/opscenterd.conf`
  - Tarball installations: `INSTALL_DIRECTORY/conf/opscenterd.conf`
- Adjust the following settings:
  - `[agents] not_seen_threshold`: The maximum time, in seconds, that can pass since the last agent status for a specific connection, such as `stomp`, before that agent connection is considered down. This threshold also affects how long OpsCenter waits before marking node health as unknown. Default value: 180 seconds.
  - `[agents] http_poll_period`: The interval, in seconds, between attempts to poll agent HTTP health. Default value: 60 seconds.
  - `[ui] default_api_timeout`: The default timeout value, in seconds, for an API call from the OpsCenter UI to the OpsCenter API. Default value: 30 seconds. Some API calls require a timeout longer than 30 seconds. In those cases, the API call timeouts are scaled relative to the `default_api_timeout` (for example, 6 * `default_api_timeout`). Changing the `default_api_timeout` affects those timeouts accordingly.

  For example:

      [agents]
      not_seen_threshold = 620
      http_poll_period = 500

      [ui]
      default_api_timeout = 60
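Because some API call timeouts are multiples of `default_api_timeout`, it can help to work out the effective values before changing the setting. The sketch below is illustrative only: the function names are hypothetical, and the multiplier of 6 is the single example the text gives, not a complete list of the multipliers OpsCenter uses.

```python
import configparser
import io


def scaled_api_timeout(default_api_timeout, multiplier=6):
    """Timeout, in seconds, of an API call scaled from default_api_timeout."""
    return multiplier * default_api_timeout


def opscenterd_fragment(not_seen_threshold, http_poll_period,
                        default_api_timeout):
    """Render the [agents] and [ui] settings as opscenterd.conf text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser["agents"] = {
        "not_seen_threshold": str(not_seen_threshold),
        "http_poll_period": str(http_poll_period),
    }
    parser["ui"] = {"default_api_timeout": str(default_api_timeout)}
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Example values from the step above.
    print(opscenterd_fragment(620, 500, 60))
    # A call scaled 6x grows from 180 s (default 30) to 360 s (setting 60).
    print(scaled_api_timeout(30), scaled_api_timeout(60))
```

Doubling `default_api_timeout` therefore doubles every scaled timeout as well, which is worth keeping in mind if long-running UI requests are already slow to fail.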
- Edit the `address.yaml` file. The location of this file depends on the type of installation:
  - Package installations: `/var/lib/datastax-agent/conf/address.yaml`
  - Tarball installations: `INSTALL_DIRECTORY/conf/address.yaml`
- Adjust the `stomp-setup-timeout` setting to override the default of 30 seconds. This parameter sets the maximum time in seconds before the agents try to reconnect with stomp. Increasing this timeout to 120 seconds better allows agents to reconnect after restarting OpsCenter in a large cluster.

      stomp-setup-timeout: 120
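Because `address.yaml` lives on each node, a scripted edit can be less error-prone than changing the file by hand. The following is a minimal sketch under the assumption that `address.yaml` contains only flat, top-level `key: value` lines. The helper name is hypothetical, and the sketch deliberately avoids a YAML library so that it runs with the standard library alone.

```python
import re


def set_top_level_key(yaml_text, key, value):
    """Replace or append a top-level `key: value` line in flat YAML text.

    Commented-out lines are left alone because the match is anchored to
    the start of a line.
    """
    pattern = re.compile(r"^" + re.escape(key) + r"\s*:.*$", re.MULTILINE)
    new_line = f"{key}: {value}"
    if pattern.search(yaml_text):
        # A function replacement avoids backslash handling in the value.
        return pattern.sub(lambda _match: new_line, yaml_text, count=1)
    # Key not present: append it, adding a newline first if one is missing.
    separator = "" if yaml_text == "" or yaml_text.endswith("\n") else "\n"
    return yaml_text + separator + new_line + "\n"


if __name__ == "__main__":
    # Sample file content, for illustration only.
    current = "stomp_interface: 10.0.0.5\nstomp-setup-timeout: 30\n"
    print(set_top_level_key(current, "stomp-setup-timeout", 120))
```

The function returns the new text instead of writing it, so the caller decides how to back up and replace the file on each node.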
- Use the environment variable `OPSC_JVM_OPTS` to override the default parameters for the OpsCenter JVM. For example, the following command doubles the heap size to `4096m` (4 GB):

      export OPSC_JVM_OPTS=-Xmx4096m

  See Configuring the OpsCenter JVM for additional information.
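If `OPSC_JVM_OPTS` already carries other JVM flags, replacing only the maximum heap flag avoids losing them. This is a rough sketch with a hypothetical helper name. It assumes the variable holds a space-separated list of standard JVM options, and it says nothing about how OpsCenter itself combines the variable with its defaults.

```python
import shlex


def with_max_heap(jvm_opts, megabytes):
    """Return jvm_opts with any -Xmx flag replaced by -Xmx<megabytes>m."""
    # Drop every existing maximum-heap flag, keep all other options.
    kept = [opt for opt in shlex.split(jvm_opts) if not opt.startswith("-Xmx")]
    kept.append(f"-Xmx{megabytes}m")
    # shlex.join (Python 3.8+) re-quotes options that need quoting.
    return shlex.join(kept)


if __name__ == "__main__":
    # Illustrative starting value, not an OpsCenter default.
    print(with_max_heap("-Xms1024m -Xmx2048m", 4096))
```

The returned string can be used as the value in the `export OPSC_JVM_OPTS=...` command.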
- If you continually receive `OutOfMemory` errors, consider adjusting the DataStax Agent JVM settings. See Configure the DataStax Agent JVM.