Read Repair: repair during read path
Describes read repair, which reconciles inconsistent replicas during the read path.
Read repair is an important component of keeping data consistent in a Cassandra cluster, because every read request is an opportunity to detect and correct inconsistencies between replicas. As a background process, read repair generally puts little strain on the cluster.
When data is read to satisfy a query and return a result, all replicas are queried for the data needed. The first replica node receives a direct read request and supplies the full data. The other nodes contacted receive a digest request and return a digest, or hash, of the data. A digest is requested because the hash is generally smaller than the data itself. Comparing the digests allows the coordinator to return the most up-to-date data to the query. If the digests are the same for enough replicas to meet the consistency level, the data is returned. If the consistency level of the read query is ALL, the comparison must be completed before the results are returned; for all lower consistency levels, it is done in the background.
The coordinator compares the digests, and if a mismatch is discovered, a request for the full data is sent to the mismatched nodes. The most current data found in a full data comparison is used to reconcile any inconsistent data on other replicas.
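The digest comparison and reconciliation described above can be sketched in a few lines of Python. This is a simplified illustration, not Cassandra's implementation: the `Replica` class and the `read_with_repair` function are hypothetical names, each replica holds a single value with a write timestamp, MD5 is used only as an example hash, and on a mismatch the sketch compares full data from every contacted replica and ignores timestamp ties.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    """Hypothetical in-memory stand-in for a replica holding one value."""
    name: str
    value: str
    timestamp: int  # write timestamp; a larger value means more recent data

    def digest(self) -> str:
        # A digest is a hash of the data, usually smaller than the data itself.
        payload = f"{self.value}:{self.timestamp}".encode("utf-8")
        return hashlib.md5(payload).hexdigest()


def read_with_repair(replicas):
    """Return (most recent value, names of the replicas that were repaired)."""
    # The first replica answers the direct read; the others return digests.
    data_replica, *digest_replicas = replicas
    reference = data_replica.digest()

    if all(r.digest() == reference for r in digest_replicas):
        return data_replica.value, []  # digests agree, nothing to repair

    # Mismatch: compare the full data and keep the most current version.
    newest = max(replicas, key=lambda r: r.timestamp)
    repaired = []
    for r in replicas:
        if r.timestamp < newest.timestamp:
            # Write the most current data back to the stale replica.
            r.value, r.timestamp = newest.value, newest.timestamp
            repaired.append(r.name)
    return newest.value, repaired


if __name__ == "__main__":
    nodes = [Replica("n1", "old", 1), Replica("n2", "new", 2), Replica("n3", "old", 1)]
    print(read_with_repair(nodes))
```

After the call, every replica in the list holds the same value and timestamp, so a second read finds matching digests and repairs nothing.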
Read repair can be configured per table, using read_repair_chance, and is enabled by default.
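One way to think about read_repair_chance is as a per-read probability between 0 and 1. The sketch below illustrates that interpretation only; `should_attempt_read_repair` is a hypothetical helper and not part of any Cassandra API, and the real coordinator logic involves more than a single random draw.

```python
import random


def should_attempt_read_repair(read_repair_chance: float, rng: random.Random) -> bool:
    """Decide, for one read, whether a read repair would be attempted."""
    if not 0.0 <= read_repair_chance <= 1.0:
        raise ValueError("read_repair_chance must be between 0 and 1")
    # rng.random() is in [0.0, 1.0), so 0.0 never triggers and 1.0 always does.
    return rng.random() < read_repair_chance


if __name__ == "__main__":
    rng = random.Random()
    hits = sum(should_attempt_read_repair(0.2, rng) for _ in range(10_000))
    print(f"reads that attempted repair: {hits} of 10000")
```

Under this model, a value of 0.2 means roughly one read in five attempts a repair, and a value of zero disables it for the table.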
The compaction strategy DateTieredCompactionStrategy (DTCS) precludes using read repair, because of the way timestamps are checked for DTCS compaction. In this case, you must set read_repair_chance to zero. For other compaction strategies, read repair should be enabled; a read_repair_chance value of 0.2 is typical.
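The rule above can be captured in a small helper that builds the matching CQL statement. This is a sketch under stated assumptions: `read_repair_cql` is a hypothetical function, the keyspace and table names are placeholders, and the `ALTER TABLE ... WITH read_repair_chance` form applies only to Cassandra versions that support this table option.

```python
def read_repair_cql(keyspace: str, table: str, compaction_class: str) -> str:
    """Build an ALTER TABLE statement that sets read_repair_chance."""
    # Accept either the short class name or a fully qualified one.
    short_name = compaction_class.split(".")[-1]
    # DTCS precludes read repair, so the chance must be zero; 0.2 is typical otherwise.
    chance = 0.0 if short_name == "DateTieredCompactionStrategy" else 0.2
    return f"ALTER TABLE {keyspace}.{table} WITH read_repair_chance = {chance};"


if __name__ == "__main__":
    # "ks" and "events" are placeholder names used only for illustration.
    print(read_repair_cql("ks", "events", "DateTieredCompactionStrategy"))
    print(read_repair_cql("ks", "events", "SizeTieredCompactionStrategy"))
```

The generated statement can be run in cqlsh or through a driver; the helper only formats text and does not contact a cluster.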