Configuring incremental repairs

About this task

The Repair Service runs incremental repairs on a user-configured set of tables. OpsCenter starts an incremental repair when it detects unrepaired data on the designated tables that meets the incremental threshold (default: 1 byte). The Repair Service sleeps for an hour (default) between completed incremental repair cycles. If the number of errors during an incremental repair exceeds the error alert threshold, an alert is sent to the Event Log.

Prerequisites

Manually migrate tables to use incremental repair. An error is logged for any incorrectly formatted table. For information on migrating to incremental repairs in DSE, see migrating to incremental repairs.

About this task

  • Update the list of tables to include in incremental repairs using the incremental_repair_tables configuration option.

    The OpsCenter.settings and OpsCenter.backup_reports tables are included in incremental repairs by default.

  • Adjust the default thresholds to trigger incremental repairs and error alerts only if necessary for your environment.

  • Change the default sleep time between the end of one incremental repair cycle and the start of the next only if necessary for your environment.

Configuration options for incremental repairs

The following options are currently configurable by adding a [repair_service] section to the opscenterd.conf file to apply to all clusters, or per cluster by adding the section to the cluster_name.conf file. Settings in cluster_name.conf override any settings in opscenterd.conf. The location of the cluster_name.conf file depends on the type of installation:

  • Package installations: /etc/opscenter/clusters/cluster_name.conf

  • Tarball installations: install_location/conf/clusters/cluster_name.conf

After changing configuration, restart opscenterd.
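The precedence between the two files can be illustrated with Python's standard configparser module. This is a minimal sketch of the override rule only, not a description of how opscenterd loads its configuration. The function name effective_repair_settings and the sample file contents are assumptions made for illustration.

```python
import configparser


def effective_repair_settings(global_text, cluster_text):
    """Merge [repair_service] options so that cluster-level values win."""
    parser = configparser.ConfigParser()
    # Values read later replace earlier values for the same section and option.
    parser.read_string(global_text)
    parser.read_string(cluster_text)
    if not parser.has_section("repair_service"):
        return {}
    return dict(parser.items("repair_service"))


# Stand-in for opscenterd.conf (applies to all clusters).
GLOBAL_CONF = """
[repair_service]
incremental_sleep=3600
incremental_err_alert_threshold=20
"""

# Stand-in for cluster_name.conf (applies to one cluster).
CLUSTER_CONF = """
[repair_service]
incremental_sleep=7200
"""

settings = effective_repair_settings(GLOBAL_CONF, CLUSTER_CONF)
print(settings["incremental_sleep"])                # 7200 (cluster file wins)
print(settings["incremental_err_alert_threshold"])  # 20 (inherited from global file)
```

Options that appear only in the global file remain in effect for the cluster. Options set in both files take the cluster-level value.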

  • [repair_service] incremental_repair_datacenters

    Restricts incremental repairs by datacenters or racks. Setting this option improves performance by limiting the repair requests to only those replicas within the datacenters and any specified racks. Example: dc1,dc2:rack1,dc2:rack2. The default behavior sends repair requests to all datacenters and racks for all replicas.

  • [repair_service] incremental_repair_tables

    The list of keyspaces and tables to include in incremental repairs. Example: keyspace1.standard1, keyspace1.standard2.

  • [repair_service] incremental_sleep

    The number of seconds to pause after completing all incremental repairs for a cluster. Default: 3600 (1 hour).

  • [repair_service] incremental_threshold

    The minimum number of bytes of unrepaired data required to consider a table for incremental repairs. The default value of 1 byte means that if there is any unrepaired data in a table, the Repair Service runs an incremental repair. Be cautious about setting this value too high. If not enough data is written to exceed the threshold within the gc_grace_seconds period, deletes might be lost. Default: 1.

  • [repair_service] incremental_err_alert_threshold

    The number of errors to ignore during incremental repairs before sending an alert that incremental repairs are failing more often than is acceptable. Default: 20.
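The value formats described above can be made concrete with a short Python sketch. It is an illustration based only on the examples given for each option, not the parser that OpsCenter uses. The helper names parse_datacenters, parse_tables, and needs_repair are hypothetical, and treating a datacenter listed without a rack as "all racks in that datacenter" is an assumed interpretation.

```python
def parse_datacenters(value):
    """Map each datacenter to a set of racks (assumed: empty set = all racks)."""
    result = {}
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        dc, _, rack = entry.partition(":")
        racks = result.setdefault(dc, set())
        if rack:
            racks.add(rack)
    return result


def parse_tables(value):
    """Split a comma-separated list into (keyspace, table) pairs."""
    tables = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        keyspace, sep, table = entry.partition(".")
        if not sep or not keyspace or not table:
            raise ValueError(f"expected keyspace.table, got {entry!r}")
        tables.append((keyspace, table))
    return tables


def needs_repair(unrepaired_bytes, threshold=1):
    """Assumed reading of incremental_threshold: repair at or above the minimum."""
    return unrepaired_bytes >= threshold


print(parse_datacenters("dc1,dc2:rack1,dc2:rack2"))
print(parse_tables("keyspace1.standard1, keyspace1.standard2"))
print(needs_repair(0), needs_repair(1))
```

With the default threshold of 1, a table with no unrepaired data is skipped and a table with any unrepaired data qualifies, which matches the description of incremental_threshold.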

Procedure

  1. Locate the opscenterd.conf file. The location of this file depends on the type of installation:

    • Package installations: /etc/opscenter/opscenterd.conf

    • Tarball installations: install_location/conf/opscenterd.conf

  2. Open opscenterd.conf for editing to apply the settings to all clusters, or open cluster_name.conf to apply them to a specific cluster.

  3. In the [repair_service] section, set the incremental repair options required for your environment:

    The following example restricts incremental repairs to datacenter dc1 and rack rack1, lists the tables to perform incremental repairs on, doubles the sleep between incremental repairs to 2 hours, doubles the threshold of unrepaired data that triggers an incremental repair to 2 bytes, and doubles the default error threshold to 40 errors before sending an alert:

    [repair_service]
    incremental_repair_datacenters=dc1:rack1
    incremental_repair_tables=OpsCenter.settings,OpsCenter.backup_reports,keyspace1.standard1,keyspace2.standard2
    incremental_sleep=7200
    incremental_threshold=2
    incremental_err_alert_threshold=40

    Exercise caution when setting the incremental_threshold option. Setting the threshold too high might result in lost deletes during repairs. If deletes are not properly replicated, deleted data could be resurrected (also referred to as zombie data).

  4. Restart opscenterd.
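The example configuration in step 3 can be sanity-checked before the restart with a short script. The following Python sketch uses the standard configparser module. The function check_repair_service and its rules (integer-valued numeric options, keyspace.table entries) are assumptions derived from the option descriptions above, not a validation tool shipped with OpsCenter.

```python
import configparser

# Options that the descriptions above document as numeric values.
INTEGER_OPTIONS = (
    "incremental_sleep",
    "incremental_threshold",
    "incremental_err_alert_threshold",
)


def check_repair_service(conf_text):
    """Return a list of problems found in the [repair_service] section."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    if not parser.has_section("repair_service"):
        return ["missing [repair_service] section"]
    problems = []
    section = parser["repair_service"]
    for name in INTEGER_OPTIONS:
        if name not in section:
            continue  # option omitted, so the default applies
        raw = section[name].strip()
        if not raw.isdigit():
            problems.append(f"{name} must be a non-negative integer, got {raw!r}")
    for entry in section.get("incremental_repair_tables", "").split(","):
        entry = entry.strip()
        if entry and entry.count(".") != 1:
            problems.append(f"table {entry!r} is not in keyspace.table form")
    return problems


EXAMPLE = """
[repair_service]
incremental_repair_datacenters=dc1:rack1
incremental_repair_tables=OpsCenter.settings,OpsCenter.backup_reports,keyspace1.standard1,keyspace2.standard2
incremental_sleep=7200
incremental_threshold=2
incremental_err_alert_threshold=40
"""

print(check_repair_service(EXAMPLE))  # []
```

An empty list means that the sketch found nothing to flag. It does not confirm that the listed keyspaces, tables, datacenters, or racks exist in the cluster.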
