Incremental repairs overview

Summary of incremental repair behavior and available configuration options.

Incremental repairs

Incremental repairs repair only data that has not been previously repaired, and only on tables designated and configured for incremental repair.

Repairing an entire cluster one time is referred to as a repair cycle. After incremental repairs have completed for an entire cluster, the Repair Service sleeps for a configured interval. When the incremental threshold of unrepaired data is reached (DSE 5.1 and later), the Repair Service triggers an incremental repair only on designated tables that meet the criteria.

Incremental repairs run sequentially, one at a time, and never in parallel. The Repair Service coordinates incremental repairs with subrange repairs. If the max_parallel_repairs option is set to 1, subrange repairs and incremental repairs alternate, running one task at a time: the Repair Service waits for a subrange repair to complete before starting an incremental repair, and vice versa. This setting can be helpful for isolating repair issues.
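The alternation described above can be pictured with a small Python sketch. This is an illustration only, not the Repair Service implementation: the function name and the task labels are hypothetical, and the sketch assumes that when one queue runs out, the remaining tasks of the other queue simply continue one at a time.

```python
from collections import deque

def alternate_repairs(subrange_tasks, incremental_tasks):
    """Yield repair tasks one at a time, alternating between the two kinds.

    Hypothetical model of max_parallel_repairs = 1: a task of one kind
    finishes before a task of the other kind starts.
    """
    queues = [deque(subrange_tasks), deque(incremental_tasks)]
    turn = 0  # 0 = subrange, 1 = incremental
    while any(queues):  # an empty deque is falsy
        if queues[turn]:
            yield queues[turn].popleft()
        turn = 1 - turn  # hand the turn to the other repair type

order = list(alternate_repairs(["sub-1", "sub-2", "sub-3"], ["inc-1"]))
print(order)
```

Because only one task is ever in flight in this model, a failure can be attributed to a single repair task, which is the isolation benefit mentioned above.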

Restricting by datacenter and racks

Specify the datacenters and racks to which incremental repairs are restricted using the incremental_repair_datacenters option. Restricting repairs by datacenter or rack improves repair performance in a multi-DC cluster with keyspaces replicated in both datacenters: repairs complete faster because there are fewer repair tasks to process.

Including tables

Specify the tables to include in incremental repairs with the incremental_repair_tables option. The OpsCenter.settings and OpsCenter.backup_reports tables are included by default.

Threshold of unrepaired data

The Repair Service repairs a table designated as a candidate for incremental repair only if the amount of unrepaired data is above a certain threshold, which is 1 KB by default. Configure the threshold with the incremental_threshold option.

Note: The incremental repair threshold option is only applicable for DSE versions 5.1 and later.

When the DSE version is 5.1 or later, the Repair Service takes an extra step of excluding any tables in the incremental_repair_tables option that do not meet the threshold criteria. When an incremental repair ends, the Repair Service checks every table in the incremental repair tables list against the threshold, and then starts the next repair only on the tables that qualify. The threshold option allows for more selective incremental repairs.

If the DSE version is earlier than 5.1, the Repair Service always repairs every table configured for incremental repairs.
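The version-dependent selection described above can be sketched as follows. This is a minimal illustration, not the Repair Service code: the function name, the dictionary of per-table unrepaired byte counts, and the table names are hypothetical, and the sketch assumes that 1 KB means 1024 bytes and that "above the threshold" is a strict comparison.

```python
DEFAULT_INCREMENTAL_THRESHOLD = 1024  # assumption: the 1 KB default expressed in bytes

def tables_to_repair(configured_tables, unrepaired_bytes, dse_version,
                     threshold=DEFAULT_INCREMENTAL_THRESHOLD):
    """Return the configured tables that qualify for the next incremental repair.

    configured_tables: tables listed for incremental repair
    unrepaired_bytes:  hypothetical mapping of table name -> unrepaired data size
    dse_version:       (major, minor) tuple, for example (5, 1)
    """
    if dse_version < (5, 1):
        # Earlier than DSE 5.1: every configured table is always repaired.
        return list(configured_tables)
    # DSE 5.1 and later: keep only tables whose unrepaired data exceeds the threshold.
    return [t for t in configured_tables
            if unrepaired_bytes.get(t, 0) > threshold]

tables = ["ks.orders", "ks.users"]          # hypothetical table names
sizes = {"ks.orders": 4096, "ks.users": 10}  # hypothetical unrepaired sizes in bytes

print(tables_to_repair(tables, sizes, (5, 1)))
print(tables_to_repair(tables, sizes, (5, 0)))
```

In this example only ks.orders qualifies on DSE 5.1, while both tables are selected on an earlier version because the threshold is not consulted.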

Ignore incremental errors threshold

The threshold for ignoring errors before alerting is 20 by default. Configure the threshold with the incremental_err_alert_threshold option to adjust the tolerated number of incremental repair errors for your environment.

Sleep between incremental repair cycles

After completing all incremental repairs, the Repair Service suspends incremental repairs for a fixed interval (one hour by default) before starting again. Configure the sleep time with the incremental_sleep option.
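To see how the options on this page fit together, the following Python sketch parses a sample configuration fragment. Only the option names and the default values (1 KB, 20 errors, one hour, and the two OpsCenter tables) come from this page. The [repair_service] section name, the comma-separated table list, and the units (bytes for the threshold, seconds for the sleep time) are assumptions made for the example.

```python
import configparser

# Hypothetical fragment: section name, list format, and units are assumptions.
SAMPLE_CONF = """
[repair_service]
incremental_repair_tables = OpsCenter.settings, OpsCenter.backup_reports
incremental_threshold = 1024
incremental_err_alert_threshold = 20
incremental_sleep = 3600
"""

def load_incremental_settings(text):
    """Parse the incremental repair options, falling back to the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["repair_service"]
    raw_tables = section.get("incremental_repair_tables", fallback="")
    return {
        "tables": [t.strip() for t in raw_tables.split(",") if t.strip()],
        "threshold": section.getint("incremental_threshold", fallback=1024),
        "err_alert_threshold": section.getint(
            "incremental_err_alert_threshold", fallback=20),
        "sleep": section.getint("incremental_sleep", fallback=3600),
    }

settings = load_incremental_settings(SAMPLE_CONF)
print(settings)
```

Check the actual file location, section name, and value formats against the configuration reference before applying any values.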

Incremental repair progress

Observe the progress of incremental repairs using the SSTable repaired metrics available in the dashboard graphs. See Tracking repaired SSTables for incremental repairs.

The Repair Status tab displays a progress bar when an incremental repair is running.

For more information and configuration examples, see Configuring incremental repairs.