Set the maximum for parallel subrange repairs

About this task

Set the maximum number of subrange repairs to run in parallel to tune slow-running repairs or to troubleshoot repairs.

When the max_parallel_repairs option is unspecified or set to 0 (default), the Repair Service calculates the maximum number of repairs to run in parallel. The basic calculation is ceiling(total number of nodes in the cluster / maximum total RF), where RF is the replication factor. The calculation prevents replica sets from overlapping during repairs.
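The basic calculation can be illustrated with a short Python sketch. This is not the Repair Service's actual code: the function name and parameters are hypothetical, and "maximum total RF" is simply passed in as a number, because this page does not spell out how the Repair Service derives it.

```python
def estimate_max_parallel_repairs(total_nodes: int, max_total_rf: int) -> int:
    """Return ceiling(total_nodes / max_total_rf) using integer arithmetic.

    Hypothetical helper for illustration only; both arguments are assumed
    to be known to the operator.
    """
    if total_nodes < 1 or max_total_rf < 1:
        raise ValueError("node count and replication factor must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (total_nodes + max_total_rf - 1) // max_total_rf


# A 12-node cluster with a maximum total RF of 3 allows 4 parallel repairs.
print(estimate_max_parallel_repairs(12, 3))
# A 10-node cluster with the same RF also rounds up to 4.
print(estimate_max_parallel_repairs(10, 3))
```

Comparing this estimate with a manually chosen max_parallel_repairs value gives a rough sense of how far a manual setting departs from the default.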

DataStax recommends manually adjusting max_parallel_repairs, min_repair_time, and other advanced or expert options only if the time_to_completion_target_percentage throttle is not in use. See Adjust or disable the throttle for subrange repairs.

Procedure

Locate the opscenterd.conf file. The location of this file depends on the type of installation:
- Package installations: /etc/opscenter/opscenterd.conf
- Tarball installations: install_location/conf/opscenterd.conf
Locate the cluster_name.conf file. The location of this file depends on the type of installation:
- Package installations: /etc/opscenter/clusters/cluster_name.conf
- Tarball installations: install_location/conf/clusters/cluster_name.conf
Open opscenterd.conf for editing to apply the setting to all clusters, or open cluster_name.conf to apply it to a specific cluster.
Adjust the [repair_service] configuration as appropriate for your environment:

Setting the max_parallel_repairs to 0 (or leaving it blank) makes the Repair Service dynamically calculate the number of subrange repairs to run in parallel based on the formula previously described:
```
[repair_service]
max_parallel_repairs=0
parallel_tasks_update_interval=120
```
This is the default behavior for determining maximum parallel subrange repairs. DataStax recommends using the default dynamic setting for maximum parallel repairs in conjunction with the throttle provided by the time_to_completion_target_percentage option.

The parallel_tasks_update_interval determines how often the Repair Service recalculates the required number of parallel tasks to run during a subrange repair cycle. The interval is 120 seconds (2 minutes) by default. Extend the interval if the calculated number of parallel tasks appears to be flapping (excessively oscillating) because the Repair Service detects inadequate throughput to complete a cycle on time. Temporarily setting the repair service log to DEBUG can provide more insight into whether an adjustment is necessary.

Setting the max_parallel_repairs to 1 forces the Repair Service to run only one repair task at a time:
```
[repair_service]
max_parallel_repairs=1
```
Subrange repairs and incremental repairs alternate running tasks one at a time. The Repair Service waits for a subrange repair to complete before starting an incremental repair and vice versa. Forcing the repairs to process tasks one at a time and alternate between incremental and subrange repairs can be helpful when trying to isolate issues during troubleshooting.

If subrange repairs are running slowly with the dynamically calculated value (default behavior with 0 or unset as shown in the first example), manually set the number of maximum parallel repairs:
```
[repair_service]
max_parallel_repairs=4
```
Experiment with adjusting the values until repairs are processing as expected for your environment.
Restart opscenterd.
Monitor the repair progress on the Status tab.
Review the repair service log messages for awareness about the impact the configuration change has on your environment.
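The three settings described in this procedure can be summarized with a small Python sketch. It parses configuration text with the standard configparser module and reports which mode a given max_parallel_repairs value selects. The function name and the returned labels are hypothetical and are not part of OpsCenter; the sketch only restates the behavior described above.

```python
import configparser


def describe_repair_parallelism(conf_text: str) -> str:
    """Classify the max_parallel_repairs setting found in conf_text.

    Hypothetical helper: returns "dynamic" for 0, blank, or unset,
    "serial" for 1, and "fixed:N" for any larger value.
    """
    # Disable interpolation so literal '%' characters elsewhere in the
    # file do not raise errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    raw = parser.get("repair_service", "max_parallel_repairs", fallback="").strip()
    value = int(raw) if raw else 0
    if value < 0:
        raise ValueError("max_parallel_repairs cannot be negative")
    if value == 0:
        return "dynamic"
    if value == 1:
        return "serial"
    return f"fixed:{value}"


example = """
[repair_service]
max_parallel_repairs=4
"""
print(describe_repair_parallelism(example))
```

Running a check like this against opscenterd.conf or cluster_name.conf before restarting opscenterd can catch a mistyped value early.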
