Manual repair: Anti-entropy repair
Guidance for using some nodetool repair parameters.
A manual repair is run using nodetool repair. This tool provides many options for configuring repair. This page provides guidance for choosing certain parameters.
The repair command will be rejected when NodeSync is enabled.
Partitioner range (-pr)
Within a cluster, the database stores a particular range of data on multiple nodes. If you run nodetool repair on one node at a time, the database may repair the same range of data several times (depending on the replication factor used in the keyspace). If you use the partitioner range option, nodetool repair -pr only repairs a specified range of data once, rather than repeating the repair operation. This option decreases the strain on network resources, although nodetool repair -pr still builds Merkle trees for each replica.
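For example, a partitioner-range repair might look like the following, assuming a keyspace named cycling (a placeholder):
nodetool repair -pr cycling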
If you run nodetool repair -pr on a downed node that has been recovered, be sure to run the command on all other nodes in the cluster as well.
Local (-local) vs datacenter (-dc) vs cluster-wide repair
Consider carefully before using nodetool repair across datacenters, instead of within a local datacenter. When you run repair locally on a node using -local, the command runs only on nodes within the same datacenter as the node that runs it. Otherwise, the command runs cluster-wide repair processes on all nodes that contain replicas, even those in different datacenters. For example, if you start nodetool repair over two datacenters, DC1 and DC2, each with a replication factor of 3, repair builds Merkle trees for 6 nodes. The number of Merkle trees increases linearly for each additional datacenter. Cluster-wide repair also increases network traffic between datacenters tremendously, and can cause cluster issues.
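For example, a repair restricted to the datacenter of the node running the command might look like the following (cycling is a placeholder keyspace name):
nodetool repair -local cycling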
If the local option is too limited, use the -dc option to limit repairs to a specific datacenter. This does not repair replicas on nodes in other datacenters, but it can decrease network traffic while repairing more nodes than the local option.
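For example, to limit the repair to the DC2 datacenter (the datacenter and keyspace names are placeholders):
nodetool repair -dc DC2 cycling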
The nodetool repair -pr option is good for repairs across multiple datacenters.
nodetool repair options:
- Does not support the use of -local with the -pr option unless the datacenter nodes have all the data for all ranges.
- Does not support the use of -local with -inc (incremental repair).
- Use the -dcpar option to repair datacenters in parallel.
One-way targeted repair from a remote node (--pull, --hosts, -st, -et)
The --pull option performs a one-way repair in which data is streamed only from the remote node to the local node. Specify both the local and remote IP addresses with -hosts, and name the keyspace to repair:
nodetool repair --pull -hosts local_ip_address,remote_ip_address keyspace_name
Endpoint range vs Subrange repair (-st, -et)
A repair operation runs on all partition ranges on a node, or endpoint range, unless you use the -st and -et (or --start-token and --end-token) options to run subrange repairs. When you specify a start token and end token, nodetool repair works between these tokens, repairing only those partition ranges.
Subrange repair is usually not a good strategy because it requires you to generate the token ranges yourself. However, if you know which partition has an error, you can target that partition range precisely for repair. This approach can relieve the problem known as overstreaming, which ties up resources by sending repairs to a range over and over.
Subrange repair involves more than just the nodetool repair command. A Java describe_splits call to ask for a split containing 32k partitions can be iterated throughout the entire range incrementally or in parallel to eliminate the overstreaming behavior. Once the tokens are generated for the split, they are passed to nodetool repair -st start_token -et end_token. The -local option can be used to repair only within a local datacenter to reduce cross-datacenter transfer.
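For example, a subrange repair limited to the local datacenter might look like the following, where the token values and the keyspace name cycling are placeholders:
nodetool repair -st -9223372036854775808 -et -3074457345618258602 -local cycling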
Full repair vs incremental repair (-full vs -inc)
Full repair builds a full Merkle tree from all of a node's data and compares it against the data on other nodes.
Incremental repair compares only the unrepaired SSTables on the node and makes necessary repairs. An incremental repair persists data that has already been repaired, and builds Merkle trees only for unrepaired SSTables. Incremental repair marks the rows in an SSTable as repaired or unrepaired.
Incremental repairs work like full repairs, with an initiating node requesting Merkle trees
from peer nodes with the same unrepaired data, and then comparing the Merkle trees to discover
mismatches. Once the data has been reconciled and new SSTables built, the initiating node issues
an anti-compaction command. Anti-compaction is the process of segregating repaired and
unrepaired ranges into separate SSTables, unless the SSTable fits entirely within the repaired
range. In the latter case, the SSTable metadata repairedAt is updated to reflect its repaired status.
- Size-tiered compaction (STCS) splits repaired and unrepaired data into separate pools for separate compactions. A major compaction generates two SSTables, one for each pool of data.
- Leveled compaction (LCS) performs size-tiered compaction on unrepaired data. After repair completes, Cassandra moves data from the set of unrepaired SSTables to L0.
- Date-tiered (DTCS) splits repaired and unrepaired data into separate pools for separate compactions. A major compaction generates two SSTables, one for each pool of data. DTCS compaction should not use incremental repair.
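For example, full and incremental repairs might be invoked as follows (cycling is a placeholder keyspace name; flag availability and the default repair mode vary by version):
nodetool repair -full cycling
nodetool repair -inc cycling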
Parallel vs Sequential repair (default, -seq, -dcpar)
The default mode runs repair on all nodes with the same replica data at the same time. Sequential (-seq) runs repair on one node after another. Datacenter parallel (-dcpar) combines sequential and parallel by simultaneously running a sequential repair in all datacenters; a single node in each datacenter runs repair, one after another until the repair is complete.
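For example (cycling is a placeholder keyspace name):
nodetool repair -seq cycling
nodetool repair -dcpar cycling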
Sequential repair takes a snapshot of each replica. Snapshots are hardlinks to existing SSTables. They are immutable and require almost no disk space. The snapshots are active while the repair proceeds, then the database deletes them. When the coordinator node finds discrepancies in the Merkle trees, the coordinator node makes required repairs from the snapshots. For example, for a table in a keyspace with a replication factor of 3 and replicas A, B, and C, the repair command takes a snapshot of each replica immediately and then repairs each replica from the snapshots sequentially (using snapshot A to repair replica B, then snapshot A to repair replica C, then snapshot B to repair replica C).
Parallel repair works on nodes A, B, and C all at once. During parallel repair, the dynamic snitch processes queries for this table using a replica in the snapshot that is not undergoing repair.
Sequential repair is the default in DataStax Enterprise 4.8 and earlier. Parallel repair is the default for DataStax Enterprise 5.0 and later.