When to run anti-entropy repair

Guidance on when to run anti-entropy repair on nodes.

When to run anti-entropy repair depends on the characteristics of the Cassandra cluster. The general guidelines presented here should be tailored to each particular case.

When is repair needed?

Run repair in these situations:

To routinely maintain node health.
Note: Even if deletions never occur, schedule regular repairs. Setting a column to null is a delete.
To recover a node after a failure while bringing it back into the cluster.
To update data on a node containing data that is not read frequently, and therefore does not get read repair.
To update data on a node that has been down.
To recover missing data or corrupted SSTables. A non-incremental repair is required.

Guidelines for running routine node repair include:

Run incremental repair daily and full repair weekly to monthly. Monthly is generally sufficient, but run full repairs more frequently if warranted.
Important: Full repair is useful for maintaining data integrity, even if deletions never occur.
Use the parallel and partitioner range options, unless precluded by the scope of the repair.
Run a full repair to eliminate anti-compaction. Anti-compaction is the process of splitting an SSTable into two SSTables, one with repaired data and one with non-repaired data. This has compaction strategy implications.
Note: Migrating to incremental repairs is recommended if you use leveled compaction.
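Run repair frequently enough that every node is repaired before the time specified in the gc_grace_seconds setting elapses. If this requirement is met, deleted data is handled properly in the cluster. Otherwise, tombstones can be removed before every replica has received them, and deleted data can reappear.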
Schedule routine node repair to minimize cluster disruption.
- If possible, schedule repair operations for low-usage hours.
- If possible, run repair operations on one node at a time.
Increase the value of gc_grace_seconds if data is seldom deleted or overwritten. For these tables, changing the setting will:
- Minimize the impact on disk space.
- Allow a longer interval between repair operations.
Mitigate heavy disk usage by configuring nodetool compaction throttling options (setcompactionthroughput and setcompactionthreshold) before running a repair.
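
The relationship between the repair schedule and gc_grace_seconds can be checked with simple arithmetic. The following Python sketch is illustrative only: the function name repair_schedule_is_safe and its parameters are hypothetical, and it assumes the worst-case gap between two completed repairs of a node is the repair interval plus the time one repair run takes. The default of 864000 seconds (10 days) is Cassandra's default for gc_grace_seconds.

```python
from datetime import timedelta

# Cassandra's default gc_grace_seconds is 864000 (10 days).
DEFAULT_GC_GRACE_SECONDS = 864_000


def repair_schedule_is_safe(repair_interval: timedelta,
                            repair_duration: timedelta,
                            gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS) -> bool:
    """Return True if every node is repaired again before gc_grace_seconds elapses.

    Assumption (not from the Cassandra documentation): the worst-case gap
    between two completed repairs is the scheduling interval plus the time
    a single repair run takes.
    """
    worst_case_gap = repair_interval + repair_duration
    return worst_case_gap.total_seconds() < gc_grace_seconds


# A weekly repair that takes up to one day fits within the 10-day default.
print(repair_schedule_is_safe(timedelta(days=7), timedelta(days=1)))
# A monthly repair does not, unless gc_grace_seconds is raised for the table.
print(repair_schedule_is_safe(timedelta(days=30), timedelta(days=1)))
```

If the check fails for a table, either repair more often or, where data is seldom deleted or overwritten, consider raising gc_grace_seconds for that table.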

Guidelines for running repair on a downed node:

Do not use the partitioner range option, -pr.
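
As a rough illustration of how the -pr guidance differs between routine repair and repair of a node that has been down, the following Python sketch assembles the nodetool repair command line for each case. The helper build_repair_command and its parameters are hypothetical and exist only for this example. The sketch covers only the -pr option and an optional keyspace argument; other repair options vary by Cassandra version and are left out.

```python
from typing import List, Optional


def build_repair_command(node_was_down: bool,
                         keyspace: Optional[str] = None) -> List[str]:
    """Build an argument list for nodetool repair (illustrative helper).

    Routine repair uses the partitioner range option (-pr).
    Repair of a node that has been down omits it, per the guideline above.
    """
    command = ["nodetool", "repair"]
    if not node_was_down:
        command.append("-pr")
    if keyspace is not None:
        # Optional keyspace name; "my_keyspace" below is a placeholder.
        command.append(keyspace)
    return command


# Routine repair of one keyspace.
print(" ".join(build_repair_command(node_was_down=False, keyspace="my_keyspace")))
# Repair of a node that has been down: no -pr.
print(" ".join(build_repair_command(node_was_down=True, keyspace="my_keyspace")))
```

The resulting list can be passed to a process launcher such as subprocess.run on a host where nodetool is installed.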