When to run anti-entropy repair
When should anti-entropy repair be run on nodes?
When to run anti-entropy repair depends on the characteristics of the Cassandra cluster. The general guidelines presented here should be tailored to each particular case.
When is repair needed?
Run repair in these situations:
- To routinely maintain node health.
  Note: Even if deletions never occur, schedule regular repairs. Setting a column to null is a delete.
- To recover a node after a failure while bringing it back into the cluster.
- To update data on a node containing data that is not read frequently, and therefore does not get read repair.
- To update data on a node that has been down.
- To recover missing data or corrupted SSTables. A non-incremental repair is required.
Guidelines for running routine node repair include:
- Run incremental repair daily and full repair weekly to monthly. Monthly is generally sufficient, but run full repair more frequently if warranted.
  Important: Full repair is useful for maintaining data integrity, even if deletions never occur.
- Use the parallel and partitioner range options, unless precluded by the scope of the repair.
- Run a full repair to eliminate anti-compaction. Anti-compaction is the process of splitting an SSTable into two SSTables, one with repaired data and one with non-repaired data. This has compaction strategy implications.
  Note: Migrating to incremental repairs is recommended if you use leveled compaction.
- Run repair frequently enough that every node is repaired before reaching the time specified in the gc_grace_seconds setting. Deleted data is properly handled in the cluster if this requirement is met.
- Schedule routine node repair to minimize cluster disruption.
  - If possible, schedule repair operations for low-usage hours.
  - If possible, run repair operations on one node at a time.
- Increase the value of gc_grace_seconds if data is seldom deleted or overwritten. For these tables, changing the setting will:
  - Minimize the impact on disk space.
  - Allow a longer interval between repair operations.
- Mitigate heavy disk usage by configuring nodetool compaction throttling options (setcompactionthroughput and setcompactionthreshold) before running a repair.
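The gc_grace_seconds guideline above can be checked with simple arithmetic before committing to a schedule. The following is a minimal sketch, not a Cassandra tool: it assumes nodes are repaired one at a time, that repair cycles run back to back, and that you have measured how long a repair takes on one node. The names RepairPlan, cycle_seconds, fits_within_gc_grace, and the safety_margin parameter are hypothetical and exist only for illustration. The default of 864000 seconds (10 days) is the standard gc_grace_seconds default; substitute the value set on your tables.

```python
from dataclasses import dataclass

# Cassandra's default gc_grace_seconds (10 days); tables may override it.
DEFAULT_GC_GRACE_SECONDS = 864_000


@dataclass
class RepairPlan:
    """Hypothetical description of a routine repair schedule."""
    node_count: int                  # nodes repaired one at a time
    hours_per_node: float            # assumed: measured duration of one node's repair
    idle_hours_between_nodes: float  # assumed: wait for the next low-usage window


def cycle_seconds(plan: RepairPlan) -> float:
    """Seconds needed to repair every node once, sequentially."""
    per_node_hours = plan.hours_per_node + plan.idle_hours_between_nodes
    return plan.node_count * per_node_hours * 3600


def fits_within_gc_grace(
    plan: RepairPlan,
    gc_grace_seconds: int = DEFAULT_GC_GRACE_SECONDS,
    safety_margin: float = 0.8,
) -> bool:
    """True if a full cycle finishes within a fraction of gc_grace_seconds.

    safety_margin is an illustrative buffer for repairs that run long or fail
    and must be retried; it is not a Cassandra setting.
    """
    return cycle_seconds(plan) <= gc_grace_seconds * safety_margin


if __name__ == "__main__":
    small = RepairPlan(node_count=6, hours_per_node=4, idle_hours_between_nodes=20)
    large = RepairPlan(node_count=12, hours_per_node=4, idle_hours_between_nodes=20)
    print(fits_within_gc_grace(small))  # 6 days per cycle against an 8-day budget
    print(fits_within_gc_grace(large))  # 12 days per cycle against an 8-day budget
```

If the check fails, the guidelines above suggest the available levers: repair more often, shorten each repair (for example with the partitioner range option), or increase gc_grace_seconds on tables where data is seldom deleted or overwritten.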
Guidelines for running repair on a downed node:
- Do not use partitioner range, -pr.