Configuring incremental repairs
The Repair Service runs an incremental repair on a user-configured set of tables. OpsCenter starts an incremental repair when the incremental threshold of 1 byte (default) of unrepaired data is detected on designated tables. The Repair Service sleeps for an hour between completed incremental repair cycles. If the number of errors during an incremental repair exceeds its threshold, an alert is sent to the Event Log.
Manually migrate tables to use incremental repair. Any incorrectly formatted table logs an error. For information on migrating to incremental repairs in DSE, see migrating to incremental repairs.
Update the list of tables to include in incremental repairs using the incremental_repair_tables option. OpsCenter.backup_reports tables are included in incremental repairs by default.
Adjust the default thresholds to trigger incremental repairs and error alerts only if necessary for your environment.
Set the default sleep time between ending and starting a subsequent incremental repair only if necessary for your environment.
The following options are configurable by adding a [repair_service] section to the opscenterd.conf file to apply to all clusters, or per cluster by adding the section to the cluster_name.conf file. Settings in cluster_name.conf override any settings in opscenterd.conf.
The location of the cluster_name.conf file depends on the type of installation:
Package installations: /etc/opscenter/clusters/cluster_name.conf
Tarball installations: install_location/conf/clusters/cluster_name.conf
After changing configuration, restart opscenterd.
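The precedence rule between the two files can be illustrated with a minimal Python sketch. It is not how opscenterd itself loads configuration; it only assumes that both files use INI syntax with a [repair_service] section, and it uses inline strings with illustrative values in place of the real files.

```python
import configparser

# Stand-ins for opscenterd.conf and cluster_name.conf (values are illustrative).
GLOBAL_CONF = """
[repair_service]
incremental_sleep = 3600
incremental_err_alert_threshold = 20
"""

CLUSTER_CONF = """
[repair_service]
incremental_sleep = 7200
"""


def effective_repair_settings(global_text, cluster_text):
    """Merge two INI sources; keys read later replace keys read earlier."""
    parser = configparser.ConfigParser()
    parser.read_string(global_text)   # settings for all clusters
    parser.read_string(cluster_text)  # per-cluster settings win on conflict
    return dict(parser["repair_service"])


settings = effective_repair_settings(GLOBAL_CONF, CLUSTER_CONF)
print(settings)
```

In this sketch the cluster file changes only incremental_sleep, so the error threshold from the global file remains in effect.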
incremental_repair_datacenters
Restricts incremental repairs by datacenters or racks. Setting this option improves performance by limiting the repair requests to only those replicas within the datacenters and any specified racks. Example: dc1,dc2:rack1,dc2:rack2. The default behavior sends repair requests to all datacenters and racks for all replicas.

incremental_repair_tables
The list of keyspaces and tables to include in incremental repairs. Example: keyspace1.standard1,keyspace2.standard2

incremental_sleep
The number of seconds to pause after completing all incremental repairs for a cluster. Default: 3600 (1 hour).

incremental_threshold
The minimum number of bytes required to consider a table for incremental repairs. The default value of 1 byte means that if there is any unrepaired data in a table, the Repair Service runs an incremental repair. Be cautious of setting this value too high. If not enough data is written to exceed the threshold within the gc_grace_seconds period, deletes might be lost. Default: 1.

incremental_err_alert_threshold
The number of errors during incremental repair to ignore before alerting that incremental repair is failing more often than is acceptable. Default: 20.
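To make the datacenter and rack filter format concrete, the following Python sketch parses a value such as dc1,dc2:rack1,dc2:rack2 into a mapping. The function name parse_datacenter_filter is hypothetical, and treating a datacenter listed without a rack as "all racks in that datacenter" is an assumption about how the value is interpreted, not documented OpsCenter behavior.

```python
def parse_datacenter_filter(value):
    """Return {datacenter: set_of_racks}.

    An empty set is taken to mean the whole datacenter (assumption).
    """
    result = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        dc, sep, rack = item.partition(":")
        racks = result.setdefault(dc, set())
        if sep:
            racks.add(rack)
    return result


parsed = parse_datacenter_filter("dc1,dc2:rack1,dc2:rack2")
print(parsed)
```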
Locate the opscenterd.conf file. The location of this file depends on the type of installation:
Package installations: /etc/opscenter/opscenterd.conf
Tarball installations: install_location/conf/opscenterd.conf
Open opscenterd.conf for editing to apply the settings to all clusters, or cluster_name.conf to apply them to a specific cluster.
Set the following incremental repair options for your environment in the [repair_service] section.
The following example restricts incremental repairs by datacenter (dc1) and rack (rack1), lists the tables to perform incremental repairs on, doubles the sleep between incremental repairs to 2 hours, raises the threshold to 2 bytes of unrepaired data for triggering an incremental repair for the DSE cluster, and doubles the default error threshold to 40 errors before sending an alert:
[repair_service]
incremental_repair_datacenters=dc1:rack1
incremental_repair_tables=OpsCenter.settings,OpsCenter.backup_reports,keyspace1.standard1,keyspace2.standard2
incremental_sleep=7200
incremental_threshold=2
incremental_err_alert_threshold=40
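Before restarting opscenterd, it can help to confirm that the values mean what you intend. The sketch below is an illustration only, assuming the section is standard INI; it reads the example settings from an inline string with Python's configparser, converts the sleep time to hours, and splits the table list into keyspace and table pairs.

```python
import configparser

# The example configuration from this page, inlined for illustration.
EXAMPLE = """
[repair_service]
incremental_repair_datacenters=dc1:rack1
incremental_repair_tables=OpsCenter.settings,OpsCenter.backup_reports,keyspace1.standard1,keyspace2.standard2
incremental_sleep=7200
incremental_threshold=2
incremental_err_alert_threshold=40
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE)
section = parser["repair_service"]

# Each entry has the form keyspace.table.
tables = [
    tuple(entry.strip().split(".", 1))
    for entry in section["incremental_repair_tables"].split(",")
]
sleep_hours = section.getint("incremental_sleep") / 3600
err_threshold = section.getint("incremental_err_alert_threshold")

print(tables)
print(f"sleep between cycles: {sleep_hours} hours")
print(f"errors tolerated before alert: {err_threshold}")
```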
Exercise caution when setting the incremental_threshold option. Setting the threshold too high might result in lost deletes during repairs. If deletes are not properly replicated, deleted data could be resurrected (also referred to as zombie data).
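As a rough way to reason about this risk, the Python sketch below estimates whether a table accumulates enough unrepaired data within gc_grace_seconds to exceed a given threshold. It is a back-of-the-envelope check, not part of OpsCenter: the function name and the constant write rate are hypothetical, and 864000 seconds (10 days) is the Cassandra default for gc_grace_seconds.

```python
def threshold_exceeded_within_grace(write_rate_bytes_per_sec,
                                    incremental_threshold,
                                    gc_grace_seconds=864000):
    """Return True if the data written during gc_grace_seconds exceeds the threshold.

    Assumes a constant write rate, which real workloads rarely have.
    """
    expected_bytes = write_rate_bytes_per_sec * gc_grace_seconds
    return expected_bytes > incremental_threshold


# A quiet table receiving about 0.001 bytes per second on average.
safe_with_default = threshold_exceeded_within_grace(0.001, incremental_threshold=1)
safe_with_large = threshold_exceeded_within_grace(0.001, incremental_threshold=1_000_000)
print(safe_with_default, safe_with_large)
```

If the function returns False for a table, a repair might not be triggered before tombstones become eligible for removal, which is the situation the caution above describes.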