Configuring incremental repairs

Configure incremental repairs in OpsCenter. Specify the tables to include in incremental repairs. Any tables being repaired incrementally are not subjected to subrange repairs.

The Repair Service runs incremental repairs on a user-configured set of tables. For DSE 5.1 and later, OpsCenter starts an incremental repair when it detects at least the incremental threshold (default: 1 byte) of unrepaired data on the designated tables. The Repair Service sleeps for an hour (default) between completed incremental repair cycles. If the number of errors during an incremental repair exceeds the error alert threshold, an alert is sent to the Event Log.
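
The trigger and alert rules described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the Repair Service implementation: the names IncrementalSettings, tables_needing_repair, and should_alert are hypothetical, and the per-table unrepaired byte counts are assumed to be supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class IncrementalSettings:
    # Defaults mirror the option descriptions in this topic.
    incremental_threshold: int = 1            # bytes of unrepaired data
    incremental_sleep: int = 3600             # seconds between repair cycles
    incremental_err_alert_threshold: int = 20  # errors tolerated before alerting


def tables_needing_repair(unrepaired_bytes, settings):
    """Return designated tables whose unrepaired data meets the threshold.

    unrepaired_bytes is a hypothetical mapping of "keyspace.table" to bytes.
    """
    return [
        table
        for table, size in unrepaired_bytes.items()
        if size >= settings.incremental_threshold
    ]


def should_alert(error_count, settings):
    """Alert only when the error count exceeds the configured threshold."""
    return error_count > settings.incremental_err_alert_threshold


if __name__ == "__main__":
    settings = IncrementalSettings()
    sizes = {"keyspace1.standard1": 0, "keyspace1.standard2": 4096}
    print(tables_needing_repair(sizes, settings))
    print(should_alert(21, settings))
```

With the default threshold of 1, any table that reports unrepaired data qualifies, which matches the description of the incremental_threshold option.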

opscenterd.conf

The location of the opscenterd.conf file depends on the type of installation:
  • Package installations: /etc/opscenter/opscenterd.conf
  • Tarball installations: install_location/conf/opscenterd.conf

cluster_name.conf

The location of the cluster_name.conf file depends on the type of installation:
  • Package installations: /etc/opscenter/clusters/cluster_name.conf
  • Tarball installations: install_location/conf/clusters/cluster_name.conf

Prerequisites

Manually migrate tables to use incremental repair. Any incorrectly formatted table logs an error. For information on migrating to incremental repairs in DSE, see migrating to incremental repairs.

  • Update the list of tables to include in incremental repairs using the incremental_repair_tables configuration option.
    Note: The OpsCenter.settings and OpsCenter.backup_reports tables are included in incremental repairs by default.
  • Adjust the default thresholds that trigger incremental repairs and error alerts only if necessary for your environment.
  • Change the default sleep time between the end of one incremental repair cycle and the start of the next only if necessary for your environment.

Configuration options for incremental repairs

Set the following options in a [repair_service] section. Add the section to the opscenterd.conf file to apply the settings to all clusters, or to the cluster_name.conf file to apply them to a single cluster. Settings in cluster_name.conf override any settings in opscenterd.conf. After changing the configuration, restart opscenterd.
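
The override rule can be illustrated with Python's standard configparser module. This is a minimal sketch that assumes both files parse as ordinary ini text. It does not show how opscenterd itself loads its configuration, and the function name effective_repair_settings and the sample file contents are hypothetical.

```python
import configparser

# Hypothetical contents of opscenterd.conf (applies to all clusters).
GLOBAL_CONF = """
[repair_service]
incremental_sleep = 3600
incremental_err_alert_threshold = 20
"""

# Hypothetical contents of cluster_name.conf (applies to one cluster).
CLUSTER_CONF = """
[repair_service]
incremental_sleep = 7200
"""


def effective_repair_settings(global_text, cluster_text):
    """Merge the two [repair_service] sections; cluster values win."""
    parser = configparser.ConfigParser()
    parser.read_string(global_text)
    # Values read later replace values read earlier for the same option.
    parser.read_string(cluster_text)
    return dict(parser["repair_service"])


if __name__ == "__main__":
    print(effective_repair_settings(GLOBAL_CONF, CLUSTER_CONF))
```

In this sketch the cluster file changes only incremental_sleep, so incremental_err_alert_threshold keeps the value from the global file.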

[repair_service] incremental_repair_datacenters
Restricts incremental repairs by datacenters or racks. Setting this option improves performance by limiting the repair requests to only those replicas within the datacenters and any specified racks. Example: dc1,dc2:rack1,dc2:rack2. The default behavior sends repair requests to all datacenters and racks for all replicas.
[repair_service] incremental_repair_tables
The list of keyspaces and tables to include in incremental repairs. The OpsCenter.settings and OpsCenter.backup_reports tables are included by default. Example: keyspace1.standard1, keyspace1.standard2.
[repair_service] incremental_sleep
The number of seconds to pause after completing all incremental repairs for a cluster. Default: 3600 (1 hour).
[repair_service] incremental_threshold
The minimum number of bytes required to consider a table for incremental repairs (DSE 5.1+ only). The default value of 1 byte means that if there is any unrepaired data in a table, the Repair Service will run an incremental repair. Be cautious of setting this value too high. If not enough data is written to exceed the threshold in the gc_grace_seconds period, deletes might be lost. Default: 1.
[repair_service] incremental_err_alert_threshold
The number of errors to tolerate during incremental repairs before sending an alert that incremental repairs are failing more often than is acceptable. Default: 20.
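
To make the value formats of incremental_repair_datacenters and incremental_repair_tables concrete, the following Python sketch splits the example values shown above into structured data. It is an illustration only: parse_datacenters and parse_tables are hypothetical helpers, and representing a datacenter that has no listed racks as an empty list is an assumption of this sketch rather than documented OpsCenter behavior.

```python
def parse_datacenters(value):
    """Map each datacenter to its listed racks.

    An empty list means no rack restriction was given (assumption).
    """
    result = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        datacenter, _, rack = entry.partition(":")
        racks = result.setdefault(datacenter, [])
        if rack:
            racks.append(rack)
    return result


def parse_tables(value):
    """Split a comma-separated list of keyspace.table names into pairs."""
    tables = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        keyspace, sep, table = entry.partition(".")
        if not sep or not keyspace or not table:
            raise ValueError(f"expected keyspace.table, got {entry!r}")
        tables.append((keyspace, table))
    return tables


if __name__ == "__main__":
    print(parse_datacenters("dc1,dc2:rack1,dc2:rack2"))
    print(parse_tables("keyspace1.standard1, keyspace1.standard2"))
```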

Procedure

  1. Open opscenterd.conf for editing to configure all clusters, or open cluster_name.conf to configure a specific cluster.
  2. In the [repair_service] section, set the incremental repair options required for your environment:
    The following example restricts incremental repairs to one datacenter (dc1) and rack (rack1), lists the tables on which to perform incremental repairs, doubles the sleep time between incremental repairs to 2 hours, raises the threshold for triggering an incremental repair to 2 bytes of unrepaired data for a DSE 5.1 cluster, and doubles the default error threshold to 40 errors before an alert is sent:
    [repair_service]
    incremental_repair_datacenters=dc1:rack1
    incremental_repair_tables=OpsCenter.settings,OpsCenter.backup_reports,keyspace1.standard1,keyspace2.standard2
    incremental_sleep=7200
    incremental_threshold=2
    incremental_err_alert_threshold=40
    CAUTION: Exercise caution when setting the incremental_threshold option. Setting the threshold too high might result in lost deletes during repairs. If deletes are not properly replicated, deleted data could be resurrected (also referred to as zombie data).
  3. Restart opscenterd.
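
Before restarting, it can help to confirm that the numeric options in the edited section are valid integers. The following Python sketch is one way to do that, assuming the file parses as ordinary ini text. It is not an OpsCenter tool, and the names check_repair_section and INT_OPTIONS are hypothetical.

```python
import configparser

# Options described in this topic as numeric values.
INT_OPTIONS = (
    "incremental_sleep",
    "incremental_threshold",
    "incremental_err_alert_threshold",
)

# The example section from step 2.
EXAMPLE = """
[repair_service]
incremental_repair_datacenters=dc1:rack1
incremental_repair_tables=OpsCenter.settings,OpsCenter.backup_reports,keyspace1.standard1,keyspace2.standard2
incremental_sleep=7200
incremental_threshold=2
incremental_err_alert_threshold=40
"""


def check_repair_section(conf_text):
    """Return a list of problems found in the [repair_service] section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("repair_service"):
        return ["missing [repair_service] section"]
    problems = []
    for name in INT_OPTIONS:
        if not parser.has_option("repair_service", name):
            continue  # unset options keep their defaults
        raw = parser.get("repair_service", name)
        try:
            value = int(raw)
        except ValueError:
            problems.append(f"{name} is not an integer: {raw!r}")
            continue
        if value < 0:
            problems.append(f"{name} must not be negative: {value}")
    return problems


if __name__ == "__main__":
    print(check_repair_section(EXAMPLE) or "no problems found")
```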