Configuring incremental repairs
Configure incremental repairs in OpsCenter. Specify the tables to include in incremental repairs. Any tables being repaired incrementally are not subjected to subrange repairs.
The Repair Service runs an incremental repair on a user-configured set of tables. For DSE 5.1 and later, OpsCenter starts an incremental repair when it detects unrepaired data on the designated tables that meets the incremental threshold (default: 1 byte). The Repair Service sleeps for an hour (default) between completed incremental repair cycles. If the number of errors during an incremental repair exceeds the error alert threshold, an alert is sent to the Event Log.
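The trigger and alert rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not OpsCenter's implementation: the class and function names are hypothetical, and the defaults are the ones listed in the option reference for the [repair_service] section.

```python
from dataclasses import dataclass


@dataclass
class IncrementalSettings:
    # Hypothetical container; defaults mirror the documented option defaults.
    incremental_threshold: int = 1            # bytes of unrepaired data
    incremental_sleep: int = 3600             # seconds between repair cycles
    incremental_err_alert_threshold: int = 20 # errors tolerated before alerting


def needs_repair(unrepaired_bytes: int, settings: IncrementalSettings) -> bool:
    """A table qualifies once its unrepaired data reaches the threshold."""
    return unrepaired_bytes >= settings.incremental_threshold


def should_alert(error_count: int, settings: IncrementalSettings) -> bool:
    """An alert is raised only when errors exceed the alert threshold."""
    return error_count > settings.incremental_err_alert_threshold
```

With the defaults, any table holding at least 1 byte of unrepaired data qualifies for repair, and the 21st error is the first to trigger an alert.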
opscenterd.conf
The location of the opscenterd.conf file depends on the type of installation:
- Package installations: /etc/opscenter/opscenterd.conf
- Tarball installations: install_location/conf/opscenterd.conf
cluster_name.conf
The location of the cluster_name.conf file depends on the type of installation:
- Package installations: /etc/opscenter/clusters/cluster_name.conf
- Tarball installations: install_location/conf/clusters/cluster_name.conf
Prerequisites
Manually migrate tables to use incremental repair. Any incorrectly formatted table logs an error. For information on migrating to incremental repairs in DSE, see migrating to incremental repairs.
- Update the list of tables to include in incremental repairs using the incremental_repair_tables configuration option.
  Note: The OpsCenter.settings and OpsCenter.backup_reports tables are included in incremental repairs by default.
- Adjust the default thresholds to trigger incremental repairs and error alerts only if necessary for your environment.
- Set the default sleep time between ending and starting a subsequent incremental repair only if necessary for your environment.
Configuration options for incremental repairs
The following options are currently configurable by adding a [repair_service] section to the opscenterd.conf file to apply to all clusters, or per cluster by adding the section to the cluster_name.conf file. Settings in cluster_name.conf override any settings in opscenterd.conf. After changing configuration, restart opscenterd.
- [repair_service] incremental_repair_datacenters
- Restricts incremental repairs by datacenters or racks. Setting this option improves performance by limiting the repair requests to only those replicas within the datacenters and any specified racks. Example: dc1,dc2:rack1,dc2:rack2. The default behavior sends repair requests to all datacenters and racks for all replicas.
- [repair_service] incremental_repair_tables
- The list of keyspaces and tables to include in incremental repairs. The OpsCenter.settings and OpsCenter.backup_reports tables are included by default. Example: keyspace1.standard1, keyspace1.standard2.
- [repair_service] incremental_sleep
- The number of seconds to pause after completing all incremental repairs for a cluster. Default: 3600 (1 hour).
- [repair_service] incremental_threshold
- The minimum number of bytes required to consider a table for incremental repairs. The default value of 1 byte means that if there is any unrepaired data in a table, the Repair Service will run an incremental repair. Be cautious of setting this value too high. If not enough data is written to exceed the threshold in the gc_grace_seconds period, deletes might be lost. Default: 1.
- [repair_service] incremental_err_alert_threshold
- The number of errors to tolerate during incremental repairs before sending an alert that incremental repairs are failing more often than is acceptable. Default: 20.
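The override rule (cluster_name.conf settings take precedence over opscenterd.conf) can be illustrated with Python's standard configparser module. This is a minimal sketch that assumes both files are plain INI text; the function name effective_repair_settings and the DEFAULTS dictionary are hypothetical and only cover the options with documented numeric defaults.

```python
import configparser

# Documented defaults for the numeric [repair_service] options.
DEFAULTS = {
    "incremental_sleep": "3600",
    "incremental_threshold": "1",
    "incremental_err_alert_threshold": "20",
}


def effective_repair_settings(global_text: str, cluster_text: str) -> dict:
    """Merge defaults, opscenterd.conf text, then cluster_name.conf text.

    Sources read later win, so cluster-level values override global ones.
    """
    parser = configparser.ConfigParser()
    parser.read_dict({"repair_service": DEFAULTS})
    parser.read_string(global_text)   # contents of opscenterd.conf
    parser.read_string(cluster_text)  # contents of cluster_name.conf
    return dict(parser["repair_service"])
```

For example, if opscenterd.conf sets incremental_sleep=7200 and cluster_name.conf sets incremental_sleep=1800, the merged result for that cluster is 1800, while any option set in neither file keeps its default.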
Procedure
- Open opscenterd.conf for editing to apply the settings to all clusters, or cluster_name.conf to apply them to a specific cluster.
- Set the following incremental options for your environment requirements in the [repair_service] section.
  The following example restricts incremental repairs by datacenter (dc1) and rack (rack1), lists the tables to perform incremental repairs on, doubles the sleep between incremental repairs to 2 hours, raises the threshold of unrepaired data that triggers an incremental repair to 2 bytes for a DSE version 5.1 cluster, and doubles the default error threshold to 40 errors before sending an alert:

    [repair_service]
    incremental_repair_datacenters=dc1:rack1
    incremental_repair_tables=OpsCenter.settings,OpsCenter.backup_reports,keyspace1.standard1,keyspace2.standard2
    incremental_sleep=7200
    incremental_threshold=2
    incremental_err_alert_threshold=40

  CAUTION: Exercise caution when setting the incremental_threshold option. Setting the threshold too high might result in lost deletes during repairs. If deletes are not properly replicated, deleted data could be resurrected (also referred to as zombie data).
- Restart opscenterd.
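Before restarting, it can help to sanity-check the list-valued options, since an incorrectly formatted table entry is logged as an error. The following Python sketch parses the two value formats shown in this topic (dc or dc:rack entries, and keyspace.table entries). The helper names are hypothetical, and the parsing rules are inferred from the examples here rather than taken from OpsCenter itself.

```python
def parse_datacenters(value: str) -> dict:
    """Map each datacenter to the set of racks listed for it.

    An empty set means no rack was listed for that datacenter.
    """
    result = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        dc, _, rack = item.partition(":")
        racks = result.setdefault(dc, set())
        if rack:
            racks.add(rack)
    return result


def parse_tables(value: str) -> list:
    """Split a comma-separated list into (keyspace, table) pairs.

    Raises ValueError for any entry not of the form keyspace.table.
    """
    tables = []
    for item in value.split(","):
        item = item.strip()
        keyspace, sep, table = item.partition(".")
        if not (keyspace and sep and table) or "." in table:
            raise ValueError(f"expected keyspace.table, got {item!r}")
        tables.append((keyspace, table))
    return tables
```

For instance, parse_datacenters("dc1,dc2:rack1,dc2:rack2") groups both racks under dc2, and parse_tables rejects an entry such as a bare table name with no keyspace.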