DataStax Enterprise OpsCenter 6.8

Viewing repair status

Access the Status Tab for the Repair Service

To access the Repair Status details:

  • Turn the Repair Service on. Doing so immediately activates the Repair Service and opens the Status tab.

  • If you are elsewhere in the monitoring application and the Repair Service is already activated, click Details for the Repair Service. The Repair Status tab displays full details for repair processes.

The Repair Status tab provides the information needed to understand all aspects of repair status. Hover over areas of the Status page to view inline information. Click tooltip icons to access short descriptions of an item. Click the Read more links to access the relevant Repair Service documentation.

Monitor repair status

Monitor the progress of incremental and subrange repairs in the Status tab. After turning on the Repair Service, the Repair Service Status is either Active or Paused:

  • When the Repair Service is actively processing repairs, the Repair Service Status indicates Active. The progress graphics and statistics reflect real-time measurements of repairs.

  • The Repair Service Status appears Paused in response to cluster or schema change events.

The repair process performs validation compaction and streams data to and from other nodes in the cluster when synchronizing replicas. While those activities are in progress, they are visible in their respective panes.

Repair Service Status tab

Status pane

Indicates whether the Repair Service Status is Active or Paused.

View repair progress and statistics

The progress and statistics pane displays progress bars for subrange and incremental repairs. A pie chart represents Completed, In Progress, and Failed repair tasks thus far. Remaining tasks are not represented in the pie chart; they are represented in the progress bars. The remaining time until the incremental and subrange repairs are completed is indicated underneath each respective progress bar.

Repair Service Status dashboard Progress and Statistics pane

The Total Repairs value shows the number and percentage of repair tasks out of the grand total for the current repair cycle. The count for each category can be an aggregate of the tasks shown in the Table Repair Tasks pane. The number of repair tasks in a particular category might not equal the total number of tasks displayed in the Table Repair Tasks pane because multiple tables might be aggregated into a single repair task. In the Table Repair Tasks pane, a task is displayed and counted in the row of every table within the range of that repair task.
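
To illustrate why a category count and the per-table rows can differ, here is a minimal Python sketch. The task records, field names, and table names are hypothetical and do not reflect OpsCenter's internal data model; the sketch only shows that a task covering several tables is counted once in the totals but once per table in a per-table view.

```python
from collections import Counter

# Hypothetical repair tasks: each covers a range and one or more tables.
tasks = [
    {"id": 1, "status": "Completed", "tables": ["ks1.users", "ks1.orders"]},
    {"id": 2, "status": "In Progress", "tables": ["ks1.users"]},
    {"id": 3, "status": "Failed", "tables": ["ks2.events"]},
]


def category_counts(tasks):
    """Count each task once, by status (the aggregated totals)."""
    return Counter(t["status"] for t in tasks)


def rows_per_table(tasks):
    """Count a task once for every table it covers (the per-table rows)."""
    return Counter(table for t in tasks for table in t["tables"])


totals = category_counts(tasks)
rows = rows_per_table(tasks)
print(sum(totals.values()), "tasks in total")
print(sum(rows.values()), "per-table entries")
```

In this made-up data set, task 1 spans two tables, so the per-table view holds more entries than there are tasks.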

View validation compactions

The Validation Compactions pane displays the progress of any validation compactions per node for both incremental and subrange repairs. In the absence of compaction activity, the No active validation compactions status is displayed.

If repairs are configured to run validation compaction sequentially (see Running validation compaction sequentially), compaction progress is considerably slower, which affects both subrange and incremental repairs.

A validation compaction reads and generates a hash for every row in the stored tables, adds the result to a Merkle tree, and returns the tree to the initiating node as part of the underlying Merkle tree comparison process.
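
The hashing and comparison idea can be sketched in a few lines of Python. This is a simplified illustration, not the database's actual implementation: the row encoding, the choice of SHA-256, and the handling of odd-sized levels are all assumptions made for the example.

```python
import hashlib


def row_hash(row: bytes) -> bytes:
    """Hash a single row (the row encoding is an assumption)."""
    return hashlib.sha256(row).digest()


def merkle_root(rows):
    """Build a Merkle tree bottom-up and return its root hash."""
    level = [row_hash(r) for r in rows]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels by repeating the last node
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Two hypothetical replicas of the same range; one row is stale on replica B.
replica_a = [b"k1:v1", b"k2:v2", b"k3:v3"]
replica_b = [b"k1:v1", b"k2:STALE", b"k3:v3"]

in_sync = merkle_root(replica_a) == merkle_root(list(replica_a))
needs_repair = merkle_root(replica_a) != merkle_root(replica_b)
print("identical replicas match:", in_sync)
print("divergent replicas differ:", needs_repair)
```

When the roots differ, the trees can be compared branch by branch to narrow down which ranges need to be streamed, which is why the whole tree is returned rather than a single hash.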

Repair Service Compactions Status

View streaming activity

The Streams pane displays an aggregate of streaming activity progress per node. A stream can comprise hundreds of files. While data is actively streaming, the pane shows the nodes from which the streams originate and their target nodes, along with a progress bar for each node receiving streamed replica data. Otherwise, the No active streams status is displayed.

Repair Service Streams Status

View repair tasks per table

The Table Repair Tasks pane provides insight into the keyspace tables that are being repaired (or not, if excluded), a status summary, the attempts at repair for skipped tasks, the type of repair, and the average repair time. To discover more:

  • View keyspaces and tables excluded from repairs, grouped by the exclusion criteria.

  • View details of repair tasks at the individual table level. Click a row to view repair task details isolated per keyspace table in the Repair Tasks for keyspace.table dialog.

Each column is sortable. Click a column heading to sort by that column. The Status column provides visual status indicators along with a summary of completed, running, or pending repair tasks. Any task with errors displays a red exclamation point.

Table Repair Tasks pane showing details for subrange and incremental repairs

The Total Attempts column indicates how many attempts (retries) the Repair Service has made before temporarily skipping the task. The skipped task is added to the end of the queue to retry later. The default maximum is 10 attempts. When that maximum is reached, an alert is fired and the Repair Service abandons any further repair attempts for that task. In the above graphic, 0/10 indicates all repair tasks completed without the need for any retry attempts. Configure the maximum attempts with the single_task_err_threshold option.
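
The skip-and-retry behavior described above can be sketched as a simple queue in Python. This is an illustration under stated assumptions, not the Repair Service's code: `try_repair` is a hypothetical callable, and the exact point at which the real service counts an attempt or fires its alert may differ.

```python
from collections import deque

MAX_ATTEMPTS = 10  # default maximum described above


def run_repairs(task_names, try_repair, max_attempts=MAX_ATTEMPTS):
    """Process tasks in order; a failed task is skipped to the back of the queue.

    try_repair(name) is a hypothetical callable that returns True on success.
    Returns (completed, abandoned, attempts), where attempts maps each task
    name to the number of retries it used.
    """
    queue = deque(task_names)
    attempts = {name: 0 for name in task_names}
    completed, abandoned = [], []
    while queue:
        name = queue.popleft()
        if try_repair(name):
            completed.append(name)
            continue
        attempts[name] += 1
        if attempts[name] >= max_attempts:
            abandoned.append(name)  # this is where an alert would be raised
        else:
            queue.append(name)  # skip for now and retry later
    return completed, abandoned, attempts


# Simulated outcomes: t1 succeeds at once, t2 fails twice, t3 always fails.
failures_left = {"t1": 0, "t2": 2}


def fake_repair(name):
    if name == "t3":
        return False
    if failures_left[name] > 0:
        failures_left[name] -= 1
        return False
    return True


completed, abandoned, attempts = run_repairs(["t1", "t2", "t3"], fake_repair)
print(completed, abandoned, attempts)
```

In this simulation, a task that never needed a retry ends with 0 attempts, which corresponds to the 0/10 reading in the Total Attempts column.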

Tables are opted in to incremental repair, as described in the incremental repairs overview. A few OpsCenter keyspace tables are hard-coded for incremental repairs: OpsCenter.backup_reports and OpsCenter.settings. The incremental tooltip flags these as special tables and provides a link to documentation for configuring additional tables or datacenters to include in incremental repairs. Any tables configured by OpsCenter admins appear in the tasks pane without the tooltip.

Incremental repairs have their own threshold setting for alerting about failed repair tasks. The default is 20. Configure with the incremental_err_alert_threshold option.
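
As a rough aid for checking which thresholds are in effect, the following Python sketch parses an INI-style configuration and reads the two options named above, falling back to the documented defaults. The `[repair_service]` section name and the sample contents are assumptions for illustration; consult the Repair Service configuration reference for the actual file and section.

```python
import configparser

# Hypothetical configuration contents; the section name is an assumption.
SAMPLE_CONF = """
[repair_service]
single_task_err_threshold = 10
incremental_err_alert_threshold = 20
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE_CONF)
section = parser["repair_service"]

# Fall back to the defaults stated in this topic when an option is absent.
single_task = section.getint("single_task_err_threshold", fallback=10)
incremental = section.getint("incremental_err_alert_threshold", fallback=20)

print("max attempts per task:", single_task)
print("incremental failure alert threshold:", incremental)
```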

Observe the progress of incremental repairs using the SSTable repaired metrics available in the dashboard graphs. See Tracking repaired SSTables for incremental repairs.

View keyspaces and tables excluded from repairs

Excluding keyspaces and tables from unnecessary repairs makes repair processes more focused, efficient, and faster with less workload impact on DSE clusters.

A link is available above the Table Repair Tasks pane for viewing keyspaces and tables excluded from subrange repairs. Click the View excluded tables link. The Excluded Keyspaces and Tables dialog lists the keyspaces and tables excluded because they have RF=1, are system keyspaces or reserved tables, or were specifically configured to be ignored by subrange repairs.

If using authentication, be sure to change the replication strategy and replication factor for the dse_security and system_auth keyspaces so that those keyspaces are included in repairs. See Managing keyspaces and tables.
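
A small Python helper can generate the CQL for such a change. This is a sketch only: the datacenter name `DC1` and the replication factor of 3 are placeholders, so substitute the values that match your cluster before running the statements in cqlsh.

```python
def alter_replication_cql(keyspace, dc_replication):
    """Return an ALTER KEYSPACE statement that uses NetworkTopologyStrategy.

    dc_replication maps datacenter names to replication factors.
    """
    if not dc_replication:
        raise ValueError("at least one datacenter is required")
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in dc_replication.items())
    return (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


# Placeholder topology: one datacenter named DC1 with a replication factor of 3.
statements = [
    alter_replication_cql(ks, {"DC1": 3})
    for ks in ("system_auth", "dse_security")
]
for stmt in statements:
    print(stmt)
```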

View-only dialog displaying all keyspaces and tables excluded from subrange repairs

View details for repair tasks

Click any row of the Table Repair Tasks pane to view more details about a particular task.

The Repair Tasks for keyspace.table dialog provides details for the number of Succeeded, Failed, Running, Pending, and Aborted repair tasks for each repair-eligible table in a keyspace. The Average Repair Time and number of Attempts (configurable) for the repair task are also shown.

Details for repair tasks by table
