Configure incremental repairs

The Repair Service runs incremental repairs on a user-configured set of tables. OpsCenter starts an incremental repair when it detects at least the incremental threshold of unrepaired data (default: 1 byte) on designated tables. The Repair Service sleeps for an hour (default) between completed incremental repair cycles. If the number of errors during an incremental repair exceeds the error threshold, an alert is sent to the Event Log.

You can do the following:

Update the list of tables to include in incremental repairs using the incremental_repair_tables configuration option.

The OpsCenter.settings and OpsCenter.backup_reports tables are included in incremental repairs by default.
Adjust the default thresholds to trigger incremental repairs and error alerts only if necessary for your environment.
Set the default sleep time between ending and starting a subsequent incremental repair only if necessary for your environment.

Before setting these options, manually migrate tables to use incremental repair. Any incorrectly formatted table logs an error. For information on migrating to incremental repairs in DSE, see Migrating to incremental repairs.

Configuration options for incremental repairs

The following options are currently configurable by adding a [repair_service] section to the opscenterd.conf file to apply to all clusters, or per cluster by adding the section to the cluster_name.conf file. Settings in cluster_name.conf override any settings in opscenterd.conf. The location of the cluster_name.conf file depends on the type of installation:

Package installations: /etc/opscenter/clusters/cluster_name.conf
Tarball installations: INSTALL_DIRECTORY/conf/clusters/cluster_name.conf

After changing configuration, restart opscenterd.
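The override rule described above can be illustrated with a minimal Python sketch. It only models the documented precedence (cluster_name.conf over opscenterd.conf) using the standard configparser module; it is not how opscenterd itself loads configuration, and the file contents and the effective_repair_settings helper are hypothetical.

```python
import configparser

# Hypothetical file contents; in a real installation these would live in
# opscenterd.conf and clusters/cluster_name.conf.
GLOBAL_CONF = """
[repair_service]
incremental_sleep = 3600
incremental_threshold = 1
"""

CLUSTER_CONF = """
[repair_service]
incremental_sleep = 7200
"""

def effective_repair_settings(global_text, cluster_text):
    """Merge two config texts; values read later replace earlier ones."""
    parser = configparser.ConfigParser()
    parser.read_string(global_text)   # settings that apply to all clusters
    parser.read_string(cluster_text)  # per-cluster settings take precedence
    return dict(parser["repair_service"])

settings = effective_repair_settings(GLOBAL_CONF, CLUSTER_CONF)
print(settings)
```

In this sketch the cluster-level incremental_sleep replaces the global value, while incremental_threshold, which the cluster file does not set, keeps the global value.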
[repair_service] incremental_repair_datacenters

Restricts incremental repairs by datacenters or racks. Setting this option improves performance by limiting the repair requests to only those replicas within the datacenters and any specified racks. Example: dc1,dc2:rack1,dc2:rack2. The default behavior sends repair requests to all datacenters and racks for all replicas.
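To make the value format concrete, the following Python sketch splits a string such as the example above into datacenters and racks. The parse_datacenter_filter helper is hypothetical, and treating a bare datacenter name (no rack) as "all racks in that datacenter" is an assumption, not documented OpsCenter behavior.

```python
def parse_datacenter_filter(value):
    """Map each datacenter to a set of racks.

    An empty set is taken to mean "no rack restriction" (an assumption).
    """
    result = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        dc, _, rack = entry.partition(":")
        racks = result.setdefault(dc, set())
        if rack:
            racks.add(rack)
    return result

parsed = parse_datacenter_filter("dc1,dc2:rack1,dc2:rack2")
print(parsed)
```

For the example value, dc1 maps to an empty set and dc2 maps to rack1 and rack2.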
[repair_service] incremental_repair_tables

The list of keyspaces and tables to include in incremental repairs. Example: keyspace1.standard1, keyspace1.standard2.
[repair_service] incremental_sleep

The number of seconds to pause after completing all incremental repairs for a cluster. Default: 3600 (1 hour).
[repair_service] incremental_threshold

The minimum number of bytes required to consider a table for incremental repairs. The default value of 1 byte means that if there is any unrepaired data in a table, the Repair Service runs an incremental repair. Be cautious of setting this value too high. If not enough data is written to exceed the threshold in the gc_grace_seconds period, deletes might be lost. Default: 1.
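The caution about gc_grace_seconds can be checked with simple arithmetic. The Python sketch below is a rough estimate only: the threshold_may_delay_repair helper and the steady write rate are hypothetical, and the comparison assumes a repair is triggered once unrepaired data reaches the threshold. The value 864000 is the Cassandra default for gc_grace_seconds (10 days).

```python
def threshold_may_delay_repair(write_bytes_per_second, gc_grace_seconds,
                               incremental_threshold):
    """Return True if the data written during gc_grace_seconds
    would not reach the incremental threshold."""
    accumulated = write_bytes_per_second * gc_grace_seconds
    return accumulated < incremental_threshold

GC_GRACE_SECONDS = 864000  # Cassandra default: 10 days

# A very quiet table: roughly 864 bytes written over 10 days.
risky_with_high_threshold = threshold_may_delay_repair(0.001, GC_GRACE_SECONDS, 1024)
risky_with_default = threshold_may_delay_repair(0.001, GC_GRACE_SECONDS, 1)
print(risky_with_high_threshold, risky_with_default)
```

With these assumed numbers, a threshold of 1024 bytes would not be reached within the gc_grace_seconds period, whereas the default of 1 would.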
[repair_service] incremental_err_alert_threshold

The number of errors to tolerate during incremental repair before sending an alert that incremental repair is failing more often than is acceptable. Default: 20.

Procedure

Locate the opscenterd.conf file. The location of this file depends on the type of installation:
- Package installations: /etc/opscenter/opscenterd.conf
- Tarball installations: INSTALL_DIRECTORY/conf/opscenterd.conf
Open opscenterd.conf for editing to apply the settings to all clusters, or open cluster_name.conf to apply them to a specific cluster.

In the [repair_service] section, set the incremental repair options that your environment requires.

The following example restricts incremental repairs to datacenter dc1 and rack rack1, lists the tables to perform incremental repairs on, doubles the sleep between incremental repairs to 2 hours, raises the threshold of unrepaired data that triggers an incremental repair for the DSE cluster to 2 bytes, and doubles the default error threshold to 40 errors before sending an alert:

[repair_service]
incremental_repair_datacenters=dc1:rack1
incremental_repair_tables=OpsCenter.settings,OpsCenter.backup_reports,keyspace1.standard1,keyspace2.standard2
incremental_sleep=7200
incremental_threshold=2
incremental_err_alert_threshold=40
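As a quick sanity check before restarting opscenterd, a fragment like the one above can be read back with a short Python sketch. This uses the standard configparser module and is not an OpsCenter tool; the unit interpretations (seconds for incremental_sleep, bytes for incremental_threshold) follow the option descriptions on this page.

```python
import configparser

# The example fragment from this page, embedded so the sketch is self-contained.
EXAMPLE = """
[repair_service]
incremental_repair_datacenters=dc1:rack1
incremental_repair_tables=OpsCenter.settings,OpsCenter.backup_reports,keyspace1.standard1,keyspace2.standard2
incremental_sleep=7200
incremental_threshold=2
incremental_err_alert_threshold=40
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE)
section = parser["repair_service"]

# Split the comma-separated table list into individual keyspace.table names.
tables = [t.strip() for t in section["incremental_repair_tables"].split(",")]
sleep_hours = section.getint("incremental_sleep") / 3600     # seconds to hours
threshold_bytes = section.getint("incremental_threshold")    # bytes
err_alerts = section.getint("incremental_err_alert_threshold")

print(tables)
print(sleep_hours, threshold_bytes, err_alerts)
```

Reading the values back this way makes unit mistakes easier to spot, for example a sleep value entered in minutes instead of seconds.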

Exercise caution when setting the incremental_threshold option. Setting the threshold too high might result in lost deletes during repairs. If deletes are not properly replicated, deleted data could be resurrected (also referred to as zombie data).

Additionally, be sure to monitor repair progress of SSTables during an incremental repair. See Track repaired SSTables for incremental repairs.

Restart opscenterd.
