Incremental repairs overview
Incremental repairs
Incremental repairs repair only data that has not previously been repaired, and only on tables designated and configured for incremental repair.
After incremental repairs have completed for an entire cluster, the Repair Service sleeps for a configured interval. When the amount of unrepaired data reaches the incremental threshold, the service triggers an incremental repair only on designated tables that meet the criteria. Repairing an entire cluster one time is referred to as a repair cycle.
Incremental repairs run sequentially, one at a time; they do not run in parallel.
The Repair Service coordinates incremental repairs with subrange repairs.
If the max_parallel_repairs option is set to 1, subrange repairs and incremental repairs alternate running tasks one at a time, waiting for a subrange repair to complete before starting an incremental repair and vice versa.
Doing so can be helpful for isolating repair issues.
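One way to picture the alternation described above is as an interleaved execution order. The following is a minimal sketch, not the Repair Service's actual scheduler: the function name alternate_tasks and the task labels are hypothetical, and strict turn-taking starting with a subrange task is an assumption.

```python
from itertools import zip_longest


def alternate_tasks(subrange_tasks, incremental_tasks):
    """Return a one-at-a-time execution order that alternates task types.

    Hypothetical model of max_parallel_repairs = 1: each task must finish
    before the next starts, so the result is a single ordered list.
    """
    order = []
    for sub, inc in zip_longest(subrange_tasks, incremental_tasks):
        if sub is not None:
            order.append(("subrange", sub))
        if inc is not None:
            order.append(("incremental", inc))
    return order


if __name__ == "__main__":
    for kind, task in alternate_tasks(["range-1", "range-2"], ["ks.table_a"]):
        print(kind, task)
```

Because only one task appears at each position in the list, a failure can be attributed to exactly one repair type, which is the isolation benefit mentioned above.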
Restricting by datacenter and racks
Specify the datacenters and racks by which to restrict incremental repairs using the incremental_repair_datacenters option.
Restricting repairs by datacenter or rack improves repair performance in a multi-datacenter cluster whose keyspaces are replicated across datacenters.
With fewer repair tasks to process, repairs complete faster.
Including tables
Specify the tables to include in incremental repairs with the incremental_repair_tables option.
The OpsCenter.settings and OpsCenter.backup_reports tables are included by default.
Threshold of unrepaired data
The Repair Service repairs a table designated as a candidate for incremental repair only if the amount of unrepaired data is above a threshold, which is 1 KB by default.
Configure the threshold with the incremental_threshold option.
The Repair Service takes the extra step of excluding any tables listed in the incremental_repair_tables option that do not meet the threshold criteria.
When an incremental repair ends, the Repair Service checks every table in the incremental repair tables list against the threshold before starting the next repair on tables that qualify for repair.
The threshold option allows for more selective incremental repairs.
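The threshold check described above can be sketched as a simple filter. This is an illustration only: the function name tables_to_repair and the shape of the input (a mapping from table name to unrepaired bytes) are hypothetical, and reading the 1 KB default as 1024 bytes is an assumption.

```python
# Assumption: the documented 1 KB default is taken to mean 1024 bytes.
DEFAULT_INCREMENTAL_THRESHOLD = 1024


def tables_to_repair(unrepaired_bytes, incremental_repair_tables,
                     threshold=DEFAULT_INCREMENTAL_THRESHOLD):
    """Keep only the designated tables whose unrepaired data is above the threshold.

    unrepaired_bytes: hypothetical mapping of table name -> unrepaired bytes.
    Tables missing from the mapping are treated as having no unrepaired data.
    """
    return [
        table
        for table in incremental_repair_tables
        if unrepaired_bytes.get(table, 0) > threshold
    ]


if __name__ == "__main__":
    measured = {"OpsCenter.settings": 4096, "OpsCenter.backup_reports": 512}
    designated = ["OpsCenter.settings", "OpsCenter.backup_reports"]
    print(tables_to_repair(measured, designated))
```

In this sketch a table with exactly the threshold amount of unrepaired data is not repaired, following the "above" wording; the service's actual boundary behavior is not stated here.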
Ignore incremental errors threshold
The threshold for ignoring errors before alerting is set to a default of 20.
Configure the threshold with the incremental_err_alert_threshold option to adjust how many incremental repair errors are tolerated before an alert in your environment.
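The idea of tolerating errors up to a threshold can be sketched as a counter. This is a hypothetical model, not the Repair Service's implementation: the class name is invented, and both the choice to alert once the count exceeds the threshold (rather than equals it) and the absence of any counter reset are assumptions.

```python
class IncrementalErrorTracker:
    """Count incremental repair errors and report when an alert is due."""

    def __init__(self, alert_threshold=20):
        # 20 mirrors the documented default of incremental_err_alert_threshold.
        self.alert_threshold = alert_threshold
        self.errors = 0

    def record_error(self):
        """Record one error; return True if an alert should be raised.

        Assumption: errors are ignored until the count exceeds the threshold.
        """
        self.errors += 1
        return self.errors > self.alert_threshold


if __name__ == "__main__":
    tracker = IncrementalErrorTracker()
    alerts = [tracker.record_error() for _ in range(21)]
    print(alerts.index(True) + 1)  # position of the first error that alerts
```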
Sleep between incremental repair cycles
After completing all incremental repairs, the Repair Service suspends incremental repairs for a fixed interval (one hour by default) before starting again.
Configure the sleep time with the incremental_sleep option.
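The repair-then-sleep pattern can be sketched as a loop. This is a simplified illustration under stated assumptions: run_cycles and its parameters are hypothetical names, the repair callable stands in for whatever actually repairs a table, and expressing the one-hour default as 3600 seconds is an assumption about units.

```python
import time


def run_cycles(tables, repair, cycles, incremental_sleep=3600, sleep=time.sleep):
    """Run a fixed number of repair cycles with a pause after each one.

    tables: tables to repair in each cycle (hypothetical input).
    repair: callable invoked once per table, one table at a time.
    incremental_sleep: pause in seconds; 3600 assumes the one-hour default.
    sleep: injectable so the loop can be exercised without real waiting.
    """
    for _ in range(cycles):
        for table in tables:
            repair(table)  # sequential: the next table waits for this one
        sleep(incremental_sleep)


if __name__ == "__main__":
    # Use a no-op sleep so the demonstration finishes immediately.
    run_cycles(["ks.table_a", "ks.table_b"], print, cycles=2, sleep=lambda s: None)
```

Passing the sleep function as a parameter is a design choice for this sketch only; it keeps the example testable without waiting an hour between cycles.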
Incremental repair progress
Observe the progress of incremental repairs using the SSTable repaired metrics available in the dashboard graphs.
See Tracking repaired SSTables for incremental repairs.
The Repair Status tab displays a progress bar when an incremental repair is running.
For more information and configuration examples, see Configuring incremental repairs.