Configuring for very large clusters

OpsCenter can manage clusters containing hundreds of nodes. For very large clusters of up to 1000 nodes, adjusting the cluster configuration settings improves performance.

cluster_name.conf 

The location of the cluster_name.conf file depends on the type of installation:

  • Package installations: /etc/opscenter/clusters/cluster_name.conf
  • Tarball installations: install_location/conf/clusters/cluster_name.conf

opscenterd.conf 

The location of the opscenterd.conf file depends on the type of installation:

  • Package installations: /etc/opscenter/opscenterd.conf
  • Tarball installations: install_location/conf/opscenterd.conf
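
The two location lists above can be sketched as a small path helper, which may be handy when scripting configuration changes. This is a minimal illustration only: the function name conf_paths and its parameters are hypothetical and not part of OpsCenter, and the paths are built purely from the locations listed above.

```python
from pathlib import PurePosixPath


def conf_paths(cluster_name, install_location=None):
    """Return (cluster conf path, opscenterd conf path).

    Hypothetical helper: pass install_location for a tarball installation,
    or leave it as None for a package installation.
    """
    if install_location is None:
        # Package installations keep configuration under /etc/opscenter.
        base = PurePosixPath("/etc/opscenter")
    else:
        # Tarball installations keep it under install_location/conf.
        base = PurePosixPath(install_location) / "conf"
    cluster_conf = base / "clusters" / f"{cluster_name}.conf"
    opscenterd_conf = base / "opscenterd.conf"
    return cluster_conf, opscenterd_conf
```

For a tarball installation, conf_paths("cluster_name", "install_location") yields the two tarball paths shown in the lists.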

Note: Lifecycle Manager can provision and manage up to 300 nodes per cluster within its UI. For details, see "How many nodes can Lifecycle Manager support when creating DataStax Enterprise clusters?"

When working with very large clusters, the performance of OpsCenter decreases with the default settings. To improve performance, adjust the cluster settings to increase the time period between polls of a cluster's nodes and token lists.
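
To give a rough sense of what a longer poll period changes, the arithmetic below compares how often the node list is polled at the default nodelist_poll_period of 30 seconds and at the 1800-second value used in the procedure. The function name is illustrative, and the sketch only counts polls; it makes no claim about the cost of each poll.

```python
def polls_per_hour(poll_period_seconds):
    """Number of polls in one hour for a given poll period in seconds."""
    return 3600 / poll_period_seconds


print(polls_per_hour(30))    # default period: 120.0 polls per hour
print(polls_per_hour(1800))  # adjusted period: 2.0 polls per hour
```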

After adding a very large cluster to OpsCenter, change the following default settings:

Procedure

  1. Open cluster_name.conf for editing.
    1. Increase the node list poll period to 30 minutes by setting the nodelist_poll_period option to 1800 under [collection]:
      [collection] nodelist_poll_period
      The interval in seconds OpsCenter waits to poll the nodes in a cluster. The default value is 30.
      [collection]
      nodelist_poll_period = 1800
    2. If an agent is overloaded, increase http_timeout from its default value:
      [agents] http_timeout
      The timeout, in seconds, for an HTTP call to the agent. The default value is 10.
      [agents]
      http_timeout = 20
  2. Open opscenterd.conf for editing and adjust the following settings:
    [agents] not_seen_threshold
    The maximum time, in seconds, since an agent last sent status for a specific connection (such as stomp) before that agent connection is considered down. This threshold also affects how long OpsCenter waits before marking node health as unknown. Default value: 180 seconds.
    [agents] http_poll_period
    The interval, in seconds, between attempts to poll agent HTTP health. Default value: 60 seconds.
    [ui] default_api_timeout
    The default timeout value in seconds for an API call from the OpsCenter UI to the OpsCenter API. The default value is 10.
    [agents]
    not_seen_threshold = 620
    http_poll_period = 500

    [ui]
    default_api_timeout = 60
  3. Restart OpsCenter.
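
The edits in steps 1 and 2 could also be scripted. The sketch below uses Python's standard configparser module to merge the values from the procedure into an existing file; it assumes the configuration files are plain INI files that configparser can parse, and the names apply_settings, CLUSTER_SETTINGS, and OPSCENTERD_SETTINGS are hypothetical. Note that configparser does not preserve comments when it rewrites a file, so back up the originals first.

```python
import configparser

# Values taken from the procedure above.
CLUSTER_SETTINGS = {
    "collection": {"nodelist_poll_period": "1800"},
    "agents": {"http_timeout": "20"},
}
OPSCENTERD_SETTINGS = {
    "agents": {"not_seen_threshold": "620", "http_poll_period": "500"},
    "ui": {"default_api_timeout": "60"},
}


def apply_settings(conf_path, settings):
    """Merge settings into the INI file at conf_path, keeping other options."""
    # Disable interpolation so values containing '%' are read and written as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(conf_path)  # a missing file is treated as empty
    for section, options in settings.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in options.items():
            parser.set(section, key, value)
    # Rewrite the file; comments in the original are not preserved.
    with open(conf_path, "w") as handle:
        parser.write(handle)
    return parser
```

For a package installation, this might be called as apply_settings("/etc/opscenter/clusters/cluster_name.conf", CLUSTER_SETTINGS) and apply_settings("/etc/opscenter/opscenterd.conf", OPSCENTERD_SETTINGS), followed by the OpsCenter restart in step 3.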