When to run anti-entropy repair

When should anti-entropy repair be run on nodes?

When to run anti-entropy repair depends on the characteristics of the cluster. The general guidelines presented here should be tailored to each particular case.
Note: This page assumes an understanding of how repair works; see Anti-entropy repair.

When is repair needed?

Run repair in these situations:

  • Routinely to maintain node health.
    Note: Even if deletions never occur, schedule regular repairs. Setting a column to null is a delete.
  • When bringing a node back into the cluster after a failure.
  • To update data on a node that contains infrequently read data and therefore does not benefit from read repair.
  • To update data on a downed node.
  • When recovering missing data or corrupted SSTables. In this case, you must run a non-incremental repair.

Guidelines for running routine node repair

  • Run full repairs weekly to monthly. Monthly is generally sufficient, but run more frequently if warranted.
    Important: Full repair is useful for maintaining data integrity, even if deletions never occur.
  • Use the parallel and partitioner range options, unless precluded by the scope of the repair.
  • Migrate off incremental repairs and then run a full repair to eliminate anti-compaction. Anti-compaction is the process of splitting an SSTable into two SSTables, one with repaired data and one with non-repaired data. This has compaction strategy implications.
    Note: If you are on DataStax Enterprise version 5.1.0-5.1.2, DataStax recommends upgrading to 5.1.3 or later.
  • Run repair frequently enough that every node is repaired before reaching the time specified in the gc_grace_seconds setting. If this requirement is met, deleted data is properly handled in the cluster.
  • To minimize cluster disruption, schedule routine node repair operations during low-usage hours and on one node at a time:
      • Increase the gc_grace_seconds setting if data is seldom deleted or overwritten. For these tables, changing the setting has minimal impact on disk space and provides a longer interval between repair operations.
      • Mitigate heavy disk usage by configuring nodetool compaction throttling options (setcompactionthroughput and setcompactionthreshold) before running a repair.
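
The guideline that every node be repaired before gc_grace_seconds elapses comes down to simple arithmetic. The Python sketch below is illustrative only: the function names, the one-day safety margin, and the per-node duration estimate are assumptions made for this example, not part of nodetool or any DataStax tool. It checks whether a pass that repairs one node at a time can finish inside the gc_grace_seconds window, using 864000 seconds (10 days) as the table default.

```python
from datetime import timedelta

# Standard default for a table's gc_grace_seconds (10 days).
DEFAULT_GC_GRACE_SECONDS = 864_000


def max_repair_interval(gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS,
                        margin: timedelta = timedelta(days=1)) -> timedelta:
    """Longest acceptable gap between repairs of the same node.

    `margin` is a hypothetical safety buffer chosen for this sketch,
    not a vendor recommendation.
    """
    window = timedelta(seconds=gc_grace_seconds) - margin
    if window <= timedelta(0):
        raise ValueError("gc_grace_seconds must exceed the safety margin")
    return window


def cycle_fits(node_count: int,
               per_node: timedelta,
               gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS,
               margin: timedelta = timedelta(days=1)) -> bool:
    """Return True if repairing nodes one at a time completes a full
    pass over the cluster within the allowed interval."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return node_count * per_node <= max_repair_interval(gc_grace_seconds, margin)


if __name__ == "__main__":
    # Assumed estimate: each node takes about one day to repair.
    print(cycle_fits(6, timedelta(days=1)))
    print(cycle_fits(12, timedelta(days=1)))
```

Under these assumptions, a 6-node cluster fits inside the default window while a 12-node cluster does not. In the second case, either the repair pass must be made faster or gc_grace_seconds must be raised for the affected tables.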

Guidelines for running repair on a downed node

  • Do not use partitioner range, -pr.
  • Do not use incremental repair, -inc.
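
The two restrictions above can be enforced by a small pre-flight check in whatever script launches the repair. The following Python sketch is a hypothetical helper, not a nodetool feature: the function name and the keyspace name in the usage example are assumptions, and it recognizes only the short option forms named on this page (-pr and -inc).

```python
# Options this page says not to use when repairing a downed node.
FORBIDDEN_ON_DOWNED_NODE = {
    "-pr": "partitioner range",
    "-inc": "incremental repair",
}


def check_downed_node_repair_args(args):
    """Return a list of problems found in the planned repair arguments.

    `args` is the argument list that would follow `nodetool`, for example
    ["repair", "my_keyspace"]. An empty result means no forbidden option
    was found. Only the short flags listed above are checked.
    """
    return [
        f"{flag} ({description}) must not be used when repairing a downed node"
        for flag, description in FORBIDDEN_ON_DOWNED_NODE.items()
        if flag in args
    ]


if __name__ == "__main__":
    # "my_keyspace" is a placeholder keyspace name.
    for problem in check_downed_node_repair_args(["repair", "-pr", "my_keyspace"]):
        print(problem)
```

A wrapper script could refuse to start the repair whenever the returned list is non-empty.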