Configuring for very large clusters

OpsCenter can manage very large clusters that contain hundreds of nodes. With the default settings, however, OpsCenter performance degrades on very large clusters. To improve performance, adjust the cluster settings to increase the interval between polls of a cluster’s nodes and token lists.

OpsCenter performs best when monitoring 200 nodes or fewer. DataStax recommends a dedicated OpsCenter instance for each cluster with more than 200 nodes.

Mission Control is the successor to OpsCenter and was designed with enterprise scale in mind. Consider evaluating Mission Control if you are hitting the limits of OpsCenter. For information about Mission Control, see Intro to Mission Control.

Lifecycle Manager can provision and manage nodes per cluster within its UI. See Supported capabilities for more details.

Change default settings to improve the performance of very large clusters

  1. Locate the cluster_name.conf file. The location of this file depends on the type of installation:

    • Package installations: /etc/opscenter/clusters/cluster_name.conf

    • Tarball installations: install_location/conf/clusters/cluster_name.conf

  2. Open cluster_name.conf for editing.

    1. Increase the node list poll period to 30 minutes by setting the nodelist_poll_period option to 1800 under [collection]:

      • [collection] nodelist_poll_period

        The interval, in seconds, between OpsCenter polls of the nodes in a cluster. The default value is 30.

        [collection]
        nodelist_poll_period = 1800
    2. If an agent is overloaded, increase http_timeout from its default value:

      • [agents] http_timeout

        The timeout, in seconds, for an HTTP call to the agent. The default value is 10.

        [agents]
        http_timeout = 20
  3. Set the following properties in the [agent_config] section of cluster_name.conf on the opscenterd machine; they propagate automatically to all DataStax Agents.

    By contrast, if you set these options in address.yaml instead, you must set them on every node.

    Increasing the [agent_config] values slows how quickly the UI displays node status updates, such as those in the Ring view.

    • [agent_config] realtime_interval

      The length of time, in seconds, between polling attempts to capture rapidly changing realtime information. Default value: 5

      [agent_config]
      realtime_interval = 40
    • [agent_config] shorttime_interval

      The length of time, in seconds, between polling attempts to capture information that changes frequently. Default value: 10

      [agent_config]
      shorttime_interval = 80
    • [agent_config] longtime_interval

      The length of time, in seconds, between polling attempts to capture information that changes infrequently. Default value: 300

      [agent_config]
      longtime_interval = 2400
    • [agent_config] status_reporting_interval

      The length of time, in seconds, between reports of agent health information. Default value: 20

      [agent_config]
      status_reporting_interval = 160
    • [agent_config] disk_usage_update_period

      The length of time, in seconds, to wait between attempts to poll the disk for usage. Agents send storage capacity data to opscenterd based on the disk_usage_update_period value in address.yaml or in cluster_name.conf. Default value: 30

      [agent_config]
      disk_usage_update_period = 240
  4. Locate the opscenterd.conf file. The location of this file depends on the type of installation:

    • Package installations: /etc/opscenter/opscenterd.conf

    • Tarball installations: install_location/conf/opscenterd.conf

  5. Open opscenterd.conf for editing and adjust the following settings:

    • [agents] not_seen_threshold

      The maximum time, in seconds, allowed since an agent last sent status for a specific connection, such as stomp, before OpsCenter considers that agent connection down. This threshold also affects how long OpsCenter waits before marking node health as unknown. Default value: 180 seconds.

    • [agents] http_poll_period

      The interval, in seconds, between attempts to poll agent HTTP health. Default value: 60 seconds.

    • [ui] default_api_timeout

      The default timeout, in seconds, for an API call from the OpsCenter UI to the OpsCenter API. Default value: 30 seconds. Some API calls require a timeout longer than 30 seconds; their timeouts are scaled relative to default_api_timeout (for example, 6 * default_api_timeout), so changing default_api_timeout changes those timeouts accordingly.

    The following example sets all three options:

      [agents]
      not_seen_threshold = 620
      http_poll_period = 500
      
      [ui]
      default_api_timeout = 60
  6. Locate the address.yaml file. The location of this file depends on the type of installation:

    • Package installations: /var/lib/datastax-agent/conf/address.yaml

    • Tarball installations: install_location/conf/address.yaml

  7. Open address.yaml for editing, and adjust the following setting to override the default of 30 seconds:

    • stomp-setup-timeout

      The maximum time, in seconds, before the agents try to reconnect via stomp. Increasing this timeout to 120 seconds gives agents more time to reconnect after OpsCenter restarts in a large cluster.

      stomp-setup-timeout: 120
  8. Use the environment variable OPSC_JVM_OPTS to override the default parameters for the OpsCenter JVM.

    The following command doubles the heap size to 4096m (4GB).

    export OPSC_JVM_OPTS=-Xmx4096m

    See Configuring the OpsCenter JVM for additional information.

  9. If you continually receive OutOfMemory errors, consider Configuring the DataStax Agent JVM.

  10. Restart OpsCenter.
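
The INI-style edits in steps 2, 3, and 5 can be scripted. The following Python sketch uses the standard-library configparser module to merge the example values from this page into cluster_name.conf and opscenterd.conf. It assumes both files parse as plain INI. Note that configparser does not preserve comments when it rewrites a file, so the sketch keeps a .bak copy. The helper name apply_settings and the two settings dictionaries are hypothetical, not part of OpsCenter.

```python
import configparser
import shutil
from pathlib import Path

# Example values from steps 2 and 3 of this page (cluster_name.conf).
CLUSTER_CONF_SETTINGS = {
    "collection": {"nodelist_poll_period": "1800"},
    "agents": {"http_timeout": "20"},
    "agent_config": {
        "realtime_interval": "40",
        "shorttime_interval": "80",
        "longtime_interval": "2400",
        "status_reporting_interval": "160",
        "disk_usage_update_period": "240",
    },
}

# Example values from step 5 of this page (opscenterd.conf).
OPSCENTERD_CONF_SETTINGS = {
    "agents": {"not_seen_threshold": "620", "http_poll_period": "500"},
    "ui": {"default_api_timeout": "60"},
}


def apply_settings(path, settings):
    """Hypothetical helper: merge settings into an INI-style file.

    Existing sections and options are kept; options named in `settings`
    are added or overwritten. If the file already exists, a copy is saved
    as <name>.bak first, because configparser drops comments on write.
    """
    path = Path(path)
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    if path.exists():
        parser.read(path)
        shutil.copy2(path, path.with_name(path.name + ".bak"))
    for section, options in settings.items():
        if not parser.has_section(section):
            parser.add_section(section)
        for key, value in options.items():
            parser.set(section, key, value)
    with path.open("w") as f:
        parser.write(f)
    return parser
```

For a package installation, you might call apply_settings("/etc/opscenter/clusters/cluster_name.conf", CLUSTER_CONF_SETTINGS) and apply_settings("/etc/opscenter/opscenterd.conf", OPSCENTERD_CONF_SETTINGS), substituting your cluster's name, and then restart OpsCenter.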
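
address.yaml is a per-node file, so the change in step 7 applies to the DataStax Agent on each node. The following Python sketch sets or appends a single top-level key such as stomp-setup-timeout. It is a line-based edit, not a YAML parser: it assumes the key appears at most once as a simple top-level "key: value" line. The helper name set_top_level_key is hypothetical, and copying the script to each node is left to your own tooling.

```python
import re
from pathlib import Path


def set_top_level_key(path, key, value):
    """Hypothetical helper: set a top-level `key: value` line in address.yaml.

    Replaces the first unindented line that starts with `key:`; if no such
    line exists, the setting is appended. Comments, indented lines, and all
    other keys are left untouched.
    """
    path = Path(path)
    lines = path.read_text().splitlines() if path.exists() else []
    pattern = re.compile(rf"^{re.escape(key)}\s*:")
    new_line = f"{key}: {value}"
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = new_line
            break
    else:
        # Key not present yet: add it at the end of the file.
        lines.append(new_line)
    path.write_text("\n".join(lines) + "\n")
```

On a package installation, for example, set_top_level_key("/var/lib/datastax-agent/conf/address.yaml", "stomp-setup-timeout", 120) writes the value from step 7.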
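
Each example value in step 3 is eight times the documented default for that option (for example, realtime_interval goes from 5 to 40). If you want to lengthen all five [agent_config] intervals by one factor, the following Python sketch derives the values and formats them as an [agent_config] section. The factor of 8 only reproduces this page's examples; it is an assumption, not a multiplier DataStax prescribes, and the function names are hypothetical.

```python
# Documented defaults, in seconds, for the [agent_config] options in step 3.
AGENT_CONFIG_DEFAULTS = {
    "realtime_interval": 5,
    "shorttime_interval": 10,
    "longtime_interval": 300,
    "status_reporting_interval": 20,
    "disk_usage_update_period": 30,
}


def scaled_agent_config(factor):
    """Hypothetical helper: multiply every default interval by `factor`.

    Results are truncated to whole seconds. Factors below 1 are rejected
    because the goal here is to poll less often, not more.
    """
    if factor < 1:
        raise ValueError("factor should be >= 1 to lengthen polling intervals")
    return {name: int(default * factor) for name, default in AGENT_CONFIG_DEFAULTS.items()}


def render_section(values):
    """Format the values as an INI section for cluster_name.conf."""
    lines = ["[agent_config]"]
    lines += [f"{name} = {value}" for name, value in values.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # A factor of 8 reproduces the example values shown in step 3.
    print(render_section(scaled_agent_config(8)))
```

Scaling every interval by the same factor keeps their relative ordering intact, at the cost of the UI reflecting node status more slowly, as step 3 notes.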

© 2024 DataStax

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.