DataStax Enterprise OpsCenter 6.8

Alert metrics

From the Alerts area of OpsCenter, configure alert thresholds for Cassandra cluster-wide, table, and operating system metrics. This proactive monitoring feature is available for DataStax Enterprise clusters.

Commonly watched alert metrics

Commonly watched metrics are available from the main Notify me when choice menu in the Add Alert dialog.

Metric Definition

Node down

When a node does not respond to requests, OpsCenter marks the node as down. To determine whether a node is down, each agent retrieves from Cassandra, via JMX, the list of nodes that its own node suspects are down. Using the status reported by the other nodes, opscenterd then decides whether a node is truly down, or whether a node is simply flapping and erroneously reporting all other nodes as down. Nodes with a down status are clearly indicated in the Nodes Ring View. To be notified when a node goes down, see Adding an alert for down nodes.

Write requests

The number of write requests per second. Monitoring the number of writes over a given time period can give you an idea of system write workload and usage patterns.

Write request latency

The response time (in milliseconds) for successful write operations. The time period starts when a node receives a client write request, and ends when the node responds back to the client.

Read requests

The number of read requests per second. Monitoring the number of reads over a given time period can give you an idea of system read workload and usage patterns.

Read request latency

The response time (in milliseconds) for successful read operations. The time period starts when a node receives a client read request, and ends when the node responds back to the client.

CPU usage

The percentage of time that the CPU was busy, which is calculated by subtracting the percentage of time the CPU was idle from 100 percent.

Load

Load is a measure of the amount of work that a computer system performs. An idle computer has a load number of 0 and each process using or waiting for CPU time increments the load number by 1.
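The CPU usage definition above is plain arithmetic, and an alert is a comparison against a threshold. The following is a minimal sketch of that calculation, not OpsCenter code: the function names `cpu_usage_percent` and `exceeds_threshold` and the sample values are hypothetical and chosen only for illustration.

```python
def cpu_usage_percent(idle_percent: float) -> float:
    """Return the CPU busy percentage: 100 minus the idle percentage."""
    if not 0.0 <= idle_percent <= 100.0:
        raise ValueError("idle_percent must be between 0 and 100")
    return 100.0 - idle_percent


def exceeds_threshold(value: float, threshold: float) -> bool:
    """Hypothetical helper: True when a metric value is above an alert threshold."""
    return value > threshold


# Sample values are made up for illustration.
usage = cpu_usage_percent(idle_percent=12.5)
print(usage)                                     # 87.5
print(exceeds_threshold(usage, threshold=80.0))  # True
```

In practice the threshold itself is set in the Add Alert dialog; the sketch only shows what the comparison amounts to.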

Advanced Cassandra alert metrics

To access Advanced Cassandra metrics, choose Advanced > Cassandra in the Add Alert dialog.

Metric Definition

Heap max

The maximum amount of shared memory allocated to the JVM heap for Cassandra processes.

Heap used

The amount of shared memory in use by the JVM heap for Cassandra processes.

JVM CMS collection count

The number of concurrent mark-sweep (CMS) garbage collections performed by the JVM per second.

JVM ParNew collection count

The number of parallel new-generation garbage collections performed by the JVM per second.

JVM CMS collection time

The time spent collecting CMS garbage in milliseconds per second (ms/sec).

JVM ParNew collection time

The time spent performing ParNew garbage collections in ms/sec.

Data size

The size of table data (in gigabytes) that has been loaded/inserted into Cassandra, including any storage overhead and system metadata.

Compactions pending

The number of compaction operations that are queued and waiting for system resources in order to run. The optimal number of pending compactions is 0 (or at most a very small number). A value greater than 0 indicates that read operations are in I/O contention with compaction operations, which usually manifests itself as declining read performance.

Total bytes compacted

The amount of SSTable data compacted, in bytes per second.

Total compactions

The number of compactions (minor or major) performed per second.

Flush sorter tasks pending

The flush sorter process performs the first step in the overall process of flushing memtables to disk as SSTables. The optimal number of pending flushes is 0 (or at most a very small number).

Flushes pending

The flush process flushes memtables to disk as SSTables. This metric shows the number of memtables queued for the flush process. The optimal number of pending flushes is 0 (or at most a very small number).

Gossip tasks pending

Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster. In Cassandra, the gossip process runs once per second on each node and exchanges state messages with up to three other nodes in the cluster. Gossip tasks pending shows the number of gossip messages and acknowledgments queued and waiting to be sent or received. The optimal number of pending gossip tasks is 0 (or at most a very small number).

Hinted hand-off pending

While a node is offline, other nodes in the cluster will save hints about rows that were updated during the time the node was unavailable. When a node comes back online, its corresponding replicas will begin streaming the missed writes to the node to catch it up. The hinted hand-off pending metric tracks the number of hints that are queued and waiting to be delivered once a failed node is back online again. High numbers of pending hints are commonly seen when a node is brought back online after some down time. Viewing this metric can help you determine when the recovering node has been made consistent again.

Internal response pending

The number of pending internal tasks, such as those from nodes joining and leaving the cluster.

Manual repair tasks pending

The number of operations still to be completed when you run anti-entropy repair on a node. It will only show values greater than 0 when a repair is in progress. It is not unusual to see a large number of pending tasks when a repair is running, but you should see the number of tasks progressively decreasing.

Memtable postflushers pending

The memtable post flush process performs the final step in the overall process of flushing memtables to disk as SSTables. The optimal number of pending flushes is 0 (or at most a very small number).

Migrations pending

The number of pending tasks from system methods that have modified the schema. Schema updates have to be propagated to all nodes, so pending tasks for this metric can manifest in schema disagreement errors.

Miscellaneous tasks pending

The number of pending tasks from other miscellaneous operations that are not run frequently.

Read requests pending

The number of read requests that have arrived into the cluster but are waiting to be handled. During low or moderate read load, you should see 0 pending read operations (or at most a very low number).

Read repair tasks pending

The number of read repair operations that are queued and waiting for system resources in order to run. The optimal number of pending read repairs is 0 (or at most a very small number). A value greater than 0 indicates that read repair operations are in I/O contention with other operations.

Replicate on write tasks pending

When an insert or update to a row is written, the affected row is replicated to all other nodes that manage a replica for that row. This is called the ReplicateOnWriteStage. This metric tracks the pending tasks related to this stage of the write process. During low or moderate write load, you should see 0 pending replicate on write tasks (or at most a very low number).

Request response pending

Streaming of data between nodes happens during operations such as bootstrap and decommission when one node sends large numbers of rows to another node. The metric tracks the progress of the streamed rows from the receiving node.

Streams pending

Streaming of data between nodes happens during operations such as bootstrap and decommission when one node sends large numbers of rows to another node. The metric tracks the progress of the streamed rows from the sending node.

Write requests pending

The number of write requests that have arrived into the cluster but are waiting to be handled. During low or moderate write load, you should see 0 pending write operations (or at most a very low number).
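Two of the definitions above lend themselves to a quick check when reading exported metric values. The sketch below is illustrative only and is not part of OpsCenter: `gc_time_percent` converts a collection time in ms/sec into the share of wall-clock time spent in garbage collection, and `is_draining` tests whether a series of pending-task samples (for example, manual repair tasks pending) is progressively decreasing. Both function names and all sample values are assumptions.

```python
from typing import Sequence


def gc_time_percent(ms_per_sec: float) -> float:
    """Convert GC time in ms/sec to a percentage of wall-clock time.

    There are 1000 ms in a second, so ms/sec divided by 10 is a percentage.
    """
    return ms_per_sec / 10.0


def is_draining(samples: Sequence[int]) -> bool:
    """True when samples never increase and the last is lower than the first."""
    if len(samples) < 2:
        return False
    non_increasing = all(later <= earlier
                         for earlier, later in zip(samples, samples[1:]))
    return non_increasing and samples[-1] < samples[0]


# Sample values are made up for illustration.
print(gc_time_percent(50.0))               # 5.0
print(is_draining([120, 90, 90, 40, 0]))   # True
print(is_draining([10, 25, 30]))           # False
```

A strictly non-increasing test is deliberately simple; real metric series can be noisy, so a tolerance or moving average may be a better fit.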

Advanced table alert metrics

To access Advanced Tables metrics, choose Advanced > Tables in the Add Alert dialog.

Metric Definition

Local writes

The write load on a table measured in operations per second. This metric includes all writes to a given table, including write requests forwarded from other nodes.

Local write latency

The response time in milliseconds for successful write operations on a table. The time period starts when a node receives a write request, and ends when the node responds.

Local reads

The read load on a table measured in operations per second. This metric includes all reads to a given table, including read requests forwarded from other nodes.

Local read latency

The response time in microseconds for successful read operations on a table. The time period starts when a node receives a read request, and ends when the node responds.

Table key cache hits

The number of read requests that resulted in the requested row key being found in the key cache.

Table key cache requests

The total number of read requests on the row key cache.

Table key cache hit rate

The key cache hit rate indicates the effectiveness of the key cache for a given table by giving the percentage of cache requests that resulted in a cache hit.

Table row cache hits

The number of read requests that resulted in the read being satisfied from the row cache.

Table row cache requests

The total number of read requests on the row cache.

Table row cache hit rate

The row cache hit rate indicates the effectiveness of the row cache for a given table by giving the percentage of cache requests that resulted in a cache hit.

Table bloom filter space used

The size of the bloom filter files on disk.

Table bloom filter false positives

The absolute number of false positives, which occur when the bloom filter reports that a row exists but the row does not actually exist.

Table bloom filter false positive ratio

The fraction of all bloom filter checks resulting in a false positive.

Live disk used

The current size of live SSTables for a table. It is expected that SSTable size will grow over time with your write load as compaction processes continue doubling the size of SSTables. Monitor the current state of compaction for a given table using this metric together with SSTable count.

Total disk used

The current size of the data directories for the table including space not reclaimed by obsolete objects.

SSTable count

The current number of SSTables for a table. When table memtables are persisted to disk as SSTables, this metric increases to the configured maximum before the compaction cycle is repeated. Monitor the current state of compaction for a given table using this metric together with live disk used.

Pending reads and writes

The number of pending reads and writes on a table. Pending operations indicate Cassandra is not keeping up with the workload. A value of zero indicates healthy throughput.
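The hit rate and false positive ratio definitions above are derived from simple counts. As a rough sketch of the arithmetic, assuming both counts come from the same sampling window, the hypothetical helpers below compute a cache hit rate as a percentage and a bloom filter false positive ratio as a fraction. The function names and sample numbers are illustrative and do not come from OpsCenter.

```python
def hit_rate_percent(hits: int, requests: int) -> float:
    """Percentage of cache requests that resulted in a cache hit."""
    if requests == 0:
        return 0.0  # no requests yet, so report 0 rather than divide by zero
    return 100.0 * hits / requests


def false_positive_ratio(false_positives: int, checks: int) -> float:
    """Fraction of bloom filter checks that resulted in a false positive."""
    if checks == 0:
        return 0.0
    return false_positives / checks


# Sample counts are made up for illustration.
print(hit_rate_percent(hits=850, requests=1000))            # 85.0
print(false_positive_ratio(false_positives=5, checks=1000)) # 0.005
```

The same `hit_rate_percent` calculation applies to both the key cache and the row cache, since each pairs a hits metric with a requests metric.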
