Troubleshoot Repair Service errors 

Errors that you might encounter when running the Repair Service, and the repair service configuration options to adjust to resolve them.

To resolve errors, try adjusting the configuration options in the [repair_service] section of opscenterd.conf or cluster_name.conf as appropriate for your environment. Errors encountered when running the Repair Service can include:

Error of a single repair task 
When a single repair task fails, the Repair Service temporarily skips the repair, moves it to the end of the repair queue, and retries it later. If a single repair fails ten times (the default), the Repair Service fires an alert. Adjust this threshold with the single_task_err_threshold option. The Repair Attempts display in the Table Repair Tasks pane of the Repair Service Status page.
Timeouts 
The Repair Service times out a single repair task after one hour by default. A timeout counts as an error for that repair task; the task is placed at the end of the repair queue and retried later. Adjust this setting with the single_repair_timeout option.
Too many repairs in parallel 
The Repair Service errors if it has to run too many repairs in parallel. By default, this error happens if it estimates that it needs to run more than one repair in a single replica set to complete on time. Try increasing the Time to completion parameter. If that does not resolve the issue, try adjusting the max_parallel_repairs option. See Setting the maximum for parallel subrange repairs.
CAUTION:
DataStax recommends manually adjusting max_parallel_repairs, min_repair_time, and other advanced or expert options only if the time_to_completion_percentage throttle is not in use. See Adjusting or disabling the throttle for subrange repairs.
Skipping range because pending repairs exceeds the max repairs 
The Repair Service skips repairing a range if pending repairs exceed the maximum pending repairs, which is 5 by default. The Repair Service immediately moves the skipped repair task to the end of the repair queue and fires an alert. At your discretion, you might want to restart any stalled nodes. Adjust this setting with the max_pending_repairs option.
Incremental error alert threshold exceeded 
By default, the Repair Service sends an alert that there could be a problem with incremental repair after 20 failed incremental repair attempts. Adjust this setting with the incremental_err_alert_threshold option.
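
As an illustration, the options described above could appear together in the [repair_service] section as in the following sketch. The values shown are the defaults stated in this topic. The timeout is written as 3600 on the assumption that single_repair_timeout is expressed in seconds; confirm the unit in the configuration reference for your OpsCenter version before changing it.

```ini
[repair_service]
# Failures of a single repair task before an alert fires (default stated above: ten)
single_task_err_threshold = 10

# Time allowed for a single repair task before it times out.
# Assumption: the value is in seconds, so 3600 corresponds to the one-hour default.
single_repair_timeout = 3600

# Pending repairs allowed before a range is skipped (default stated above: 5)
max_pending_repairs = 5

# Failed incremental repair attempts before an alert is sent (default stated above: 20)
incremental_err_alert_threshold = 20
```

Place the section in opscenterd.conf to apply the settings to all clusters, or in cluster_name.conf to apply them to a single cluster, as appropriate for your environment. The max_parallel_repairs and min_repair_time options are intentionally left out of this sketch because of the caution above.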

opscenterd.conf 

The location of the opscenterd.conf file depends on the type of installation:

  • Package installations: /etc/opscenter/opscenterd.conf
  • Tarball installations: install_location/conf/opscenterd.conf

cluster_name.conf 

The location of the cluster_name.conf file depends on the type of installation:

  • Package installations: /etc/opscenter/clusters/cluster_name.conf
  • Tarball installations: install_location/conf/clusters/cluster_name.conf