Changing repair strategies

Change the method used for routine repairs from incremental to full repair, or from full to incremental. Repairing SSTables using anti-entropy repair is required for database maintenance. A full repair of all SSTables on a node is time-consuming and resource-intensive. Incremental repair consumes less time and fewer resources because it skips SSTables that are already marked as repaired.

Migrating to full repairs

Incremental repairs split the data into repaired and unrepaired SSTables and mark the data state with metadata. Full repairs keep the data together and use no repair status flag. Before switching from incremental repairs to full repairs, remove the repair status from the SSTables:

nodetool mark_unrepaired keyspace_name [table_name]

Migrating to incremental repairs

To start using incremental repairs, migrate the SSTables on each node. Incremental repair skips SSTables that are already marked as repaired. These steps ensure the data integrity when changing the repair strategy from full to incremental.

DataStax recommends using full repairs. Incremental repairs may cause performance issues; see CASSANDRA-9143.

Prerequisites

Before starting this procedure, be aware that the first system-wide full repair (step 3 of the procedure) can take a long time, as the database recompacts all SSTables. To make this process less disruptive, migrate the cluster to incremental repair one node at a time.

Procedure

In a terminal:

Disable autocompaction on the node:
```
nodetool disableautocompaction
```
Tarball path: `<install_directory>/bin`

Running nodetool disableautocompaction without parameters disables autocompaction for all keyspaces.

Before running a full repair (step 3), list the node's SSTables located in /var/lib/cassandra/data. You will need this list to run the command that sets the repairedAt flag in step 5.

The data directory contains a subdirectory for each keyspace. Each subdirectory contains a set of files for each SSTable. The name of the file that contains the SSTable data has the following format:
```
<version_code>-<generation>-<format>-Data.db
```
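One way to capture this list is with find. The following is a minimal sketch, not part of the documented procedure: the function name list_sstables and the output file name are hypothetical, and it assumes that every SSTable data file ends in -Data.db (as in the format above) and that copies under snapshots or backups subdirectories should be left out of the list.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the path of every SSTable data file under a
# data directory to a text file, one path per line.
list_sstables() {
    local data_dir="$1"   # for example /var/lib/cassandra/data
    local out_file="$2"   # illustrative name, for example SSTable-names.txt
    # Match only the -Data.db component of each SSTable.
    # Assumption: files under snapshots/ and backups/ are copies, so skip them.
    find "$data_dir" -type f -name '*-Data.db' \
        ! -path '*/snapshots/*' ! -path '*/backups/*' | sort > "$out_file"
}
```

For example, `list_sstables /var/lib/cassandra/data SSTable-names.txt` writes a sorted list to SSTable-names.txt. Review the list before using it, because it includes every keyspace found under the data directory.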
Run the default full, sequential repair on one node at a time:
```
nodetool repair
```
Tarball path: `<install_directory>/bin`

Running nodetool repair without parameters runs a full sequential repair of all SSTables on the node and can take a substantial amount of time.

Stop the node.

Using the list you created in step 2, run sstablerepairedset with the --is-repaired option to set the repairedAt flag on each SSTable.

Unless you set the repairedAt flag to repaired for each SSTable, the existing SSTables might not be changed by the repair process, and any incremental repair process that runs later will not process these SSTables.

To mark a single SSTable:

sudo sstablerepairedset --really-set --is-repaired <SSTable-example-Data.db>

For batch processing, use a text file of SSTable names:

sudo sstablerepairedset --really-set --is-repaired -f <SSTable-names.txt>

Tarball path: `<installation_location>/resources/cassandra/tools/bin`
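Before running the batch command, it can help to confirm that every path in the text file still points to an existing file. The following is a minimal sketch of such a check, assuming the file holds one SSTable path per line; the function name check_sstable_list is hypothetical and is not part of any Cassandra tool.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: report each path in the list file that is
# not an existing regular file. Returns 0 only if every entry exists.
check_sstable_list() {
    local list_file="$1"
    local missing=0
    local path
    # The "|| [ -n ... ]" clause also handles a last line with no newline.
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue      # skip blank lines
        if [ ! -f "$path" ]; then
            echo "missing: $path" >&2
            missing=1
        fi
    done < "$list_file"
    return "$missing"
}
```

For example, `check_sstable_list SSTable-names.txt && sudo sstablerepairedset --really-set --is-repaired -f SSTable-names.txt` runs the batch command only when every listed file is present.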

The value of the repairedAt flag is the timestamp of the last repair. The sstablerepairedset command applies the current date/time. To check the value of the repairedAt flag, use:

sstablemetadata <example-keyspace>-<SSTable-example-Data.db> | grep "Repaired at"
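The check above can be wrapped in a small filter that interprets the value. This is a sketch under stated assumptions: it assumes sstablemetadata prints a line of the form "Repaired at: <value>" and that a value of 0 means the SSTable is unrepaired; the exact output format can differ between versions. The function name repair_status is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read sstablemetadata output on stdin and classify it.
# Assumption: the output has a line "Repaired at: <value>", and 0 = unrepaired.
repair_status() {
    awk -F': *' '
        /^Repaired at/ {
            split($2, parts, " ")   # keep only the first token of the value
            if (parts[1] == "0")
                print "unrepaired"
            else
                print "repaired at " parts[1]
            found = 1
        }
        END { exit (found ? 0 : 1) }   # non-zero exit if no such line was seen
    '
}
```

For example, `sstablemetadata <SSTable-example-Data.db> | repair_status` prints either the word unrepaired or the stored repairedAt value.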

Restart the node.

What’s next

After you have migrated all nodes, you can run incremental repairs using nodetool repair with the -inc option.

Related information

https://www.datastax.com/dev/blog/repair-in-cassandra

https://www.datastax.com/dev/blog/more-efficient-repairs

https://www.datastax.com/dev/blog/anticompaction-in-cassandra-2-1