When to run anti-entropy repair
When to run anti-entropy repair depends on the characteristics of the cluster. The general guidelines presented here should be tailored to each particular case.
An understanding of how repair works is required to fully understand the information presented on this page; see Anti-entropy repair.
When is repair needed?
Run repair in these situations:
-
Routinely to maintain node health.
Schedule regular repairs even if deletions never occur. Note that setting a column to null is a delete.
-
When bringing a node back into the cluster after a failure.
-
To update data on a node that contains infrequently read data and therefore does not benefit from read repair.
-
To update data on a downed node.
-
When recovering missing data or corrupted SSTables. In this case, you must run a non-incremental (full) repair.
Guidelines for running routine node repair
-
Run full repairs weekly to monthly. Monthly is generally sufficient, but run more frequently if warranted.
Full repair is useful for maintaining data integrity, even if deletions never occur.
-
Use the parallel and partitioner range options, unless precluded by the scope of the repair.
-
Migrate off incremental repairs and then run a full repair to eliminate anti-compaction. Anti-compaction is the process of splitting an SSTable into two SSTables, one with repaired data and one with non-repaired data. This has compaction strategy implications.
-
Run repair frequently enough that every node is repaired before the time specified in the gc_grace_seconds setting elapses. If this requirement is met, deleted data is properly handled in the cluster; otherwise, deleted data can reappear after tombstones are removed from some replicas.
-
Schedule routine node repair operations during low-usage hours and on one node at a time to minimize cluster disruption:
-
Increase the value of gc_grace_seconds if data is seldom deleted or overwritten. For such tables, a larger value has minimal impact on disk space and allows a longer interval between repair operations.
-
Mitigate heavy disk usage by configuring nodetool compaction throttling options (setcompactionthroughput and setcompactionthreshold) before running a repair.
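The relationship between repair frequency, one-node-at-a-time scheduling, and gc_grace_seconds can be checked with simple arithmetic. The following is a minimal sketch, not an official tool: the class name `RepairPlan`, its fields, and the per-node repair duration are hypothetical inputs chosen for illustration, and the "worst-case gap" is a deliberately conservative estimate (the interval between cycle starts plus the duration of one full cycle). The only value taken from Cassandra itself is the default gc_grace_seconds of 864000 seconds (10 days).

```python
from dataclasses import dataclass

# Cassandra's default gc_grace_seconds (10 days).
DEFAULT_GC_GRACE_SECONDS = 864_000


@dataclass
class RepairPlan:
    """Hypothetical description of a routine repair schedule."""
    node_count: int          # nodes repaired one at a time
    hours_per_node: float    # assumed duration of one node's repair
    interval_days: float     # days between the starts of two repair cycles
    gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS

    def cycle_seconds(self) -> float:
        """Time needed to repair every node sequentially."""
        return self.node_count * self.hours_per_node * 3600

    def worst_case_gap_seconds(self) -> float:
        """Conservative bound on the time a node waits between repairs:
        the interval between cycles plus one full cycle duration."""
        return self.interval_days * 86_400 + self.cycle_seconds()

    def is_safe(self) -> bool:
        """True if every node is repaired within gc_grace_seconds."""
        return self.worst_case_gap_seconds() < self.gc_grace_seconds


if __name__ == "__main__":
    weekly = RepairPlan(node_count=6, hours_per_node=4, interval_days=7)
    monthly = RepairPlan(node_count=6, hours_per_node=4, interval_days=30)
    print("weekly safe: ", weekly.is_safe())
    print("monthly safe:", monthly.is_safe())
    print("monthly needs gc_grace_seconds above:",
          monthly.worst_case_gap_seconds())
```

Under these assumed numbers, a weekly cycle fits within the default gc_grace_seconds, while a monthly cycle does not unless gc_grace_seconds is raised for the affected tables.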
Guidelines for running repair on a downed node
-
Do not use partitioner range (-pr).
-
Do not use incremental repair (-inc).
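
The option guidelines on this page can be summarized as a small helper that assembles the nodetool command line for each case. This is a sketch under stated assumptions rather than a definitive recipe: the function name `build_repair_command` and the keyspace name `my_keyspace` are hypothetical, `-par` is assumed to be the parallel option in the same releases that use `-inc` for incremental repair, and the command is only built as a list of arguments, never executed. In releases where incremental repair is the default, an explicit full-repair option would also be needed; that is not shown here.

```python
from typing import List


def build_repair_command(keyspace: str, node_was_down: bool) -> List[str]:
    """Build (but do not run) a nodetool repair command line.

    Assumes option names from releases that use -inc for incremental
    repair: -par (parallel) and -pr (partitioner range).
    """
    cmd = ["nodetool", "repair"]
    if not node_was_down:
        # Routine repair: parallel and partitioner range options.
        cmd += ["-par", "-pr"]
    # Repair of a downed node: neither -pr nor -inc is added. The page
    # gives no guidance on -par for this case, so it is left at the
    # tool's default.
    cmd.append(keyspace)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_repair_command("my_keyspace", node_was_down=False)))
    print(" ".join(build_repair_command("my_keyspace", node_was_down=True)))
```

Neither branch ever adds `-inc`, which matches both the routine guideline to migrate off incremental repair and the downed-node guideline above.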