Manual repair: Anti-entropy repair
A manual repair is run using nodetool repair.
This tool provides many options for configuring repair; this topic provides guidance for choosing certain parameters.
Partitioner range (-pr)
Within a cluster, the database stores a particular range of data on multiple nodes.
If you run nodetool repair on one node at a time, the database may repair the same range of data several times (depending on the replication factor used in the keyspace).
With the partitioner range option, nodetool repair -pr repairs a given range of data only once, rather than repeating the repair operation.
This option decreases the strain on network resources, although nodetool repair -pr still builds Merkle trees for each replica.
DataStax Enterprise allows you to use the partitioner range option with incremental repair; however, it is not recommended because incremental repair already avoids re-repairing data by marking data as repaired.
The most efficient way to run incremental repair is without the -pr parameter, because incremental repair can skip anti-compaction by marking whole SSTables as repaired.
If you use this option, run the repair on every node in the cluster to repair all data. Otherwise, some ranges of data will not be repaired.
DataStax recommends using the partitioner range parameter when running full repairs during routine maintenance.
In the DSE 5.1.3 release, the default repair type changed to full (from incremental). To run a full repair by partition range:
- On DSE 5.1.3 and later, use nodetool repair -pr.
- On DSE 5.1.0-5.1.2, use nodetool repair -full -pr.
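Because -pr repairs only a node's own ranges, routine maintenance must run it on every node. A minimal sketch, assuming a hypothetical file named hosts.txt with one node address per line, a placeholder keyspace name, and default nodetool connection settings:

# Sketch only: run a full partitioner-range repair node by node.
# hosts.txt and my_keyspace are placeholder names, not part of DSE.
while read -r node; do
  echo "Repairing primary ranges on ${node}"
  nodetool -h "${node}" repair -full -pr my_keyspace   # omit -full on DSE 5.1.3 and later, where full is the default
done < hosts.txt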
Local (-local) vs datacenter (-dc) vs cluster-wide repair
Consider carefully before using nodetool repair across datacenters, instead of within a local datacenter.
When you run repair locally on a node using -local or --in-local-dc, the command runs only on nodes within the same datacenter as the node that runs it.
Otherwise, the command runs cluster-wide repair processes on all nodes that contain replicas, even those in different datacenters.
For example, if you start nodetool repair over two datacenters, DC1 and DC2, each with a replication factor of 3, repair builds Merkle trees for 6 nodes.
The number of Merkle trees increases linearly with each additional datacenter.
Cluster-wide repair also increases network traffic between datacenters tremendously, and can cause cluster issues.
If the local option is too limited, use the -dc or --in-dc option, which limits repairs to a specific datacenter.
This does not repair replicas on nodes in other datacenters, but it can decrease network traffic while repairing more nodes than the local option.
The nodetool repair -pr option is good for repairs across multiple datacenters.
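As an illustration (the datacenter name DC2 and the keyspace name are placeholders for this sketch), a repair can be confined to a single datacenter:

# Sketch only: repair replicas in datacenter DC2 and nowhere else.
nodetool repair -dc DC2 my_keyspace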
Additional guidance for nodetool repair options:
- The -local option is not supported with the -pr option unless the datacenter's nodes have all the data for all ranges.
- The -local option is not supported with -inc (incremental repair).
For DataStax Enterprise 5.0 and later, a recommended option for repairs across datacenters is -dcpar (--dc-parallel), which runs a sequential repair within each datacenter while the datacenters themselves proceed in parallel (see Parallel vs Sequential repair below).
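A minimal cross-datacenter sketch consistent with this guidance (the keyspace name is a placeholder):

# Sketch only: full repair with datacenters repaired in parallel, nodes within each datacenter repaired sequentially.
nodetool repair -full -dcpar my_keyspace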
One-way targeted repair from a remote node (--pull, --hosts, -st, -et)
Runs a repair directly from another node that has a replica in the same token range. This option minimizes performance impact when cross-datacenter repairs are required.
nodetool repair --pull -hosts target,remote keyspace_name
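Building on the command above, a sketch of limiting the pull to one token range; the host addresses, token values, and keyspace name here are placeholders:

# Sketch only: pull only the data in one token range from a remote replica.
nodetool repair --pull -hosts 10.0.1.12,10.0.2.12 -st -9223372036854775808 -et -3074457345618258603 my_keyspace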
Endpoint range vs Subrange repair (-st, -et)
A repair operation runs on all partition ranges on a node, or endpoint range, unless you use the -st and -et (or --start-token and --end-token) options to run subrange repairs.
When you specify a start token and an end token, nodetool repair works between these tokens, repairing only those partition ranges.
Subrange repair is not a good general-purpose strategy because it requires generating token ranges. However, if you know which partition has an error, you can target that partition range precisely for repair. This approach can relieve the problem known as overstreaming, which ties up resources by sending repairs to a range over and over.
Subrange repair involves more than just the nodetool repair command.
A Java describe_splits call that asks for splits containing about 32K partitions can be iterated throughout the entire range, incrementally or in parallel, to eliminate the overstreaming behavior.
Once the tokens are generated for a split, they are passed to nodetool repair -st start_token -et end_token.
The -local option can be used to repair only within the local datacenter, reducing cross-datacenter transfer.
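As a sketch (the token values, keyspace, and table names below are placeholders rather than real generated splits), a single subrange can be repaired within the local datacenter:

# Sketch only: repair one generated token range, restricted to the local datacenter.
nodetool repair -st -9223372036854775808 -et -9220000000000000000 -local my_keyspace my_table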
Full repair vs incremental repair (-full vs -inc)
Full repair builds a full Merkle tree and compares the node's data against the data on other nodes. For a complete explanation of full repair, see How does anti-entropy repair work?.
Incremental repair compares all SSTables on the node and makes necessary repairs. An incremental repair persists data that has already been repaired, and only builds Merkle trees for unrepaired SSTables. Incremental repair marks the rows in an SSTable as repaired or unrepaired.
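As a sketch, using the flags named in this section's title (the keyspace name is a placeholder), the repair type can be requested explicitly instead of relying on the version-dependent default:

# Sketch only: request a full repair explicitly.
nodetool repair -full my_keyspace
# Request an incremental repair explicitly.
nodetool repair -inc my_keyspace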
Incremental repairs work like full repairs, with an initiating node requesting Merkle trees from peer nodes with the same unrepaired data, and then comparing the Merkle trees to discover mismatches.
Once the data has been reconciled and new SSTables built, the initiating node issues an anti-compaction command.
Anti-compaction is the process of segregating repaired and unrepaired ranges into separate SSTables, unless the SSTable fits entirely within the repaired range.
In the latter case, the SSTable's repairedAt metadata is updated to reflect its repaired status.
Anti-compaction is handled differently, depending on the compaction strategy assigned to the data.
- Size-tiered compaction (STCS) splits repaired and unrepaired data into separate pools for separate compactions. A major compaction generates two SSTables, one for each pool of data.
- Leveled compaction (LCS) performs size-tiered compaction on unrepaired data. After repair completes, Cassandra moves data from the set of unrepaired SSTables to L0.
- Date-tiered compaction (DTCS) splits repaired and unrepaired data into separate pools for separate compactions. A major compaction generates two SSTables, one for each pool of data. DTCS should not be used with incremental repair.
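To check whether a given SSTable has been marked repaired, the sstablemetadata tool prints its repair timestamp (0 means unrepaired); the data file path below is a placeholder:

# Sketch only: inspect the repairedAt status of one SSTable.
sstablemetadata /var/lib/cassandra/data/my_keyspace/my_table-<table_id>/mc-1-big-Data.db | grep -i repaired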
Parallel vs Sequential repair (default, -seq, -dcpar)
Parallel repair runs on all nodes with the same replica data at the same time. (This is the default behavior in DataStax Enterprise (DSE) 5.0 and later.)
Sequential repair (-seq, --sequential) runs on one node after another. (This is the default behavior in DSE 4.8 and earlier.)
Datacenter parallel repair (-dcpar, --dc-parallel) combines sequential and parallel by simultaneously running a sequential repair in all datacenters; a single node in each datacenter runs repair at a time, one after another, until the repair is complete.
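A sketch of choosing the repair mode explicitly (the keyspace name is a placeholder):

# Sketch only: force sequential repair, one replica at a time (the DSE 4.8 default).
nodetool repair -seq my_keyspace
# Parallel repair is the default on DSE 5.0 and later, so no extra flag is needed.
nodetool repair my_keyspace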
Sequential repair takes a snapshot of each replica.
Snapshots are hardlinks to existing SSTables.
They are immutable and require almost no disk space.
The snapshots are active while the repair proceeds, then the database deletes them.
When the coordinator node finds discrepancies in the Merkle trees, it makes the required repairs from the snapshots.
For example, for a table in a keyspace with a replication factor of 3 and replicas A, B, and C, the repair command takes a snapshot of each replica immediately and then repairs each replica from the snapshots sequentially (using snapshot A to repair replica B, then snapshot A to repair replica C, then snapshot B to repair replica C).
Parallel repair works on nodes A, B, and C all at once. During parallel repair, the dynamic snitch processes queries for this table using a replica in the snapshot that is not undergoing repair.
Sequential repair is the default for DSE 4.8 and earlier. Parallel repair is the default for DSE 5.0 and later.