Configuring for very large clusters

OpsCenter can manage very large clusters of up to 1000 nodes. For clusters with several hundred nodes or more, adjusting the cluster configuration settings improves performance.

Note: Lifecycle Manager can provision and manage up to 300 nodes per cluster within its UI. See Supported capabilities for more details.

With the default settings, OpsCenter performance degrades on very large clusters. To improve performance, adjust the cluster settings to increase the interval between polls of a cluster's nodes and token lists.

After adding a very large cluster to OpsCenter, change the default settings in the following configuration files:

opscenterd.conf

The location of the opscenterd.conf file depends on the type of installation:

Package installations: /etc/opscenter/opscenterd.conf
Tarball installations: install_location/conf/opscenterd.conf

cluster_name.conf

The location of the cluster_name.conf file depends on the type of installation:

Package installations: /etc/opscenter/clusters/cluster_name.conf
Tarball installations: install_location/conf/clusters/cluster_name.conf
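
The two location lists above can be captured in a small helper when scripting these changes. This is a minimal sketch: the function name opscenter_conf_paths is hypothetical, and install_location stands for your tarball installation directory.

```python
from pathlib import PurePosixPath


def opscenter_conf_paths(install_type, cluster_name, install_location=None):
    """Return (opscenterd.conf path, cluster_name.conf path).

    Hypothetical helper; the paths mirror the package/tarball lists above.
    install_type is "package" or "tarball"; install_location is required
    only for tarball installations.
    """
    if install_type == "package":
        conf_dir = PurePosixPath("/etc/opscenter")
    elif install_type == "tarball":
        if install_location is None:
            raise ValueError("tarball installations need install_location")
        conf_dir = PurePosixPath(install_location) / "conf"
    else:
        raise ValueError(f"unknown install type: {install_type!r}")
    # opscenterd.conf sits in the conf directory; per-cluster files sit in clusters/
    return conf_dir / "opscenterd.conf", conf_dir / "clusters" / f"{cluster_name}.conf"
```

For example, opscenter_conf_paths("package", "prod") returns /etc/opscenter/opscenterd.conf and /etc/opscenter/clusters/prod.conf.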

Procedure

Open cluster_name.conf for editing.
1. Increase the node list poll period to 30 minutes by setting the nodelist_poll_period option to 1800 under [collection]:
  
  [collection] nodelist_poll_period
  
  The interval, in seconds, between OpsCenter polls of the nodes in a cluster. The default value is 30.
```
[collection]
nodelist_poll_period = 1800
```
2. If an agent is overloaded, increase http_timeout from its default value:
  
  [agents] http_timeout
  
  The timeout, in seconds, for an HTTP call to the agent. The default value is 10.
```
[agents]
http_timeout = 20
```
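
If you apply these cluster_name.conf changes to many clusters, a script can merge them for you. The following is a sketch using Python's standard configparser module, assuming cluster_name.conf parses as a standard INI file; apply_settings and CLUSTER_SETTINGS are hypothetical names. configparser does not preserve comments when it writes a file, so back up the original first.

```python
import configparser
import io

# Hypothetical settings table built from the two steps above.
CLUSTER_SETTINGS = {
    "collection": {"nodelist_poll_period": "1800"},  # 30 minutes (default: 30 seconds)
    "agents": {"http_timeout": "20"},  # only if an agent is overloaded (default: 10)
}


def apply_settings(conf_text, settings):
    """Merge settings into INI-style text and return the updated text.

    Sections are created if missing; other sections and options are kept.
    Comments in conf_text are NOT preserved by configparser.
    """
    config = configparser.ConfigParser(interpolation=None)  # treat '%' literally
    config.optionxform = str  # keep option names exactly as written
    config.read_string(conf_text)
    for section, options in settings.items():
        if not config.has_section(section):
            config.add_section(section)
        for key, value in options.items():
            config.set(section, key, value)
    out = io.StringIO()
    config.write(out)
    return out.getvalue()
```

To update a file, read its text, pass it to apply_settings, and write the returned text back to the same path.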
Open opscenterd.conf for editing and adjust the following settings:

[agents] not_seen_threshold

The maximum time, in seconds, since an agent last reported the status of a specific connection, such as stomp, before OpsCenter considers that connection down. This threshold also determines how long OpsCenter waits before marking node health as unknown. Default value: 180 seconds.

[agents] http_poll_period

The interval, in seconds, between attempts to poll the HTTP health of agents. Default value: 60 seconds.

[ui] default_api_timeout

The default timeout value in seconds for an API call from the OpsCenter UI to the OpsCenter API. The default value is 10. Some API calls require a timeout longer than 10 seconds. In those cases, the API call timeouts are scaled relative to the default_api_timeout (for example, 6 * default_api_timeout). Changing the default_api_timeout affects those timeouts accordingly.
```
[agents]
not_seen_threshold = 620
http_poll_period = 500

[ui]
default_api_timeout = 60
```
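
A quick consistency check can catch mistakes in the opscenterd.conf values before you restart. This sketch assumes, based on the example values above rather than on a documented rule, that http_poll_period should stay below not_seen_threshold; it also computes a scaled timeout using the 6 * default_api_timeout multiplier from the description of default_api_timeout. The function name check_large_cluster_settings is hypothetical, and the fallbacks are the defaults listed above.

```python
import configparser

# The opscenterd.conf values from the example above.
OPSCENTERD_SNIPPET = """
[agents]
not_seen_threshold = 620
http_poll_period = 500

[ui]
default_api_timeout = 60
"""


def check_large_cluster_settings(conf_text, scale=6):
    """Return (warnings, scaled_api_timeout) for opscenterd.conf text.

    Assumption: http_poll_period should be smaller than not_seen_threshold.
    scale=6 mirrors the "6 * default_api_timeout" example.
    """
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(conf_text)
    # Fall back to the documented defaults when an option is absent.
    threshold = config.getint("agents", "not_seen_threshold", fallback=180)
    poll = config.getint("agents", "http_poll_period", fallback=60)
    api_timeout = config.getint("ui", "default_api_timeout", fallback=10)
    warnings = []
    if poll >= threshold:
        warnings.append(
            f"http_poll_period ({poll}) >= not_seen_threshold ({threshold})"
        )
    return warnings, scale * api_timeout
```

With the example values, the check reports no warnings, and an API call scaled by 6 would time out after 360 seconds.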
Use the environment variable OPSC_JVM_OPTS to override the default parameters for the OpsCenter JVM.
The following command doubles the heap size to 4096m (4 GB):
```
export OPSC_JVM_OPTS=-Xmx4096m
```
See Configuring the OpsCenter JVM for additional information.
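
To confirm which maximum heap size an OPSC_JVM_OPTS value requests, you can parse its -Xmx flag. This is a hypothetical helper, not part of OpsCenter; it handles only the k, m, and g size suffixes and, if -Xmx appears more than once, takes the last occurrence.

```python
import re

# Multipliers for the JVM size suffixes this sketch supports.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def max_heap_bytes(jvm_opts):
    """Return the -Xmx heap size in bytes from a JVM options string, or None."""
    # Each match is a (number, suffix) tuple, e.g. ("4096", "m").
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)(?:\s|$)", jvm_opts)
    if not matches:
        return None
    number, unit = matches[-1]  # last occurrence
    return int(number) * _UNITS[unit.lower()]
```

For example, max_heap_bytes(os.environ.get("OPSC_JVM_OPTS", "")) reports the current setting; -Xmx4096m corresponds to 4294967296 bytes (4 GB).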
Optional: If you continually receive OutOfMemory errors, consider Configuring the DataStax Agent JVM.
Restart OpsCenter.