Migrating to incremental repairs

To start using incremental repairs, migrate the SSTables on each node.

Repairing SSTables using anti-entropy repair is a necessary part of Cassandra maintenance. A full repair of all SSTables on a node takes a long time and is resource-intensive. You can manage repairs with less service disruption by using incremental repairs. Incremental repair consumes less time and fewer resources because it skips SSTables that are already marked as repaired.

Incremental repair works equally well with any compaction scheme: Size-Tiered Compaction (STCS), Date-Tiered Compaction (DTCS), or Leveled Compaction (LCS).

However, Cassandra's default is full repair: a new SSTable is created without metadata that identifies its repaired state. Before you can start using incremental repairs, you must add this marker to each SSTable on each node in the cluster. Follow these instructions to migrate the cluster to incremental repair gradually, one node at a time.

Overview of the procedure

To migrate one Cassandra node to incremental repair:
  1. Disable autocompaction on the node.
  2. Run the default full, sequential repair.
  3. Stop the node.
  4. Mark as repaired all the SSTables that existed before you disabled autocompaction.
  5. Restart Cassandra on the node.
  6. Re-enable autocompaction on the node.
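
The six steps above can be sketched as a small dry-run script. This is only an illustration under stated assumptions: the `run` and `migrate_node` helpers are hypothetical names, the stop and start commands are placeholders for whatever your installation uses, and `SSTable-names.txt` is assumed to list the SSTables that existed before step 1. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of the six migration steps for one node.
# Assumptions (not part of the procedure itself): the script runs from
# the install directory, and the node is stopped and started with the
# placeholder commands below. Replace them with your own.
STOP_CMD=${STOP_CMD:-"sudo service cassandra stop"}
START_CMD=${START_CMD:-"sudo service cassandra start"}

# run CMD...: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@" || return 1
    fi
}

# migrate_node LIST_FILE: walk through steps 1 to 6 in order and stop
# at the first command that fails.
migrate_node() {
    run bin/nodetool disableautocompaction &&
    run bin/nodetool repair &&
    run $STOP_CMD &&
    # --really-set is required by sstablerepairedset before it will
    # modify any SSTable metadata.
    run sudo bin/sstablerepairedset --really-set --is-repaired -f "$1" &&
    run $START_CMD &&
    # In a real run, wait until the node is up again before this step.
    run bin/nodetool enableautocompaction
}
```

Calling `migrate_node SSTable-names.txt` prints one line per step. Setting `DRY_RUN=0` executes the commands, which is only sensible once the placeholders match your environment.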

Prerequisites

Listing SSTables

Before you run a full repair on the node, list its SSTables. The repair process might leave the existing SSTables unchanged, and the incremental repair process you run later will not process these SSTables unless you mark each one as repaired (see Step 4 below).

You can find the node's SSTables in one of the following locations:

  • Package installations: /var/lib/cassandra/data
  • Tarball installations: install_location/data/data

This directory contains a subdirectory for each keyspace. Each keyspace subdirectory contains a subdirectory for each table, which holds a set of files for each SSTable. The name of the file that contains the SSTable data has the following format:

<version_code>-<generation>-<format>-Data.db

Note: You can mark multiple SSTables as a batch by running sstablerepairedset with a text file of filenames (see Step 4).
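
One way to produce such a text file is to collect the data files with find before you run the full repair. The following is a minimal sketch, not an official tool: `list_sstables` is a hypothetical helper name, and skipping the snapshots and backups subdirectories is a choice made here so that only live SSTables are listed.

```shell
#!/bin/sh
# list_sstables DATA_DIR: print the path of every SSTable data file
# (*-Data.db) under DATA_DIR, one per line, in sorted order.
# Files under snapshots/ and backups/ directories are skipped
# (an assumption of this sketch: only live SSTables should be marked).
list_sstables() {
    find "$1" -type f -name '*-Data.db' \
        ! -path '*/snapshots/*' ! -path '*/backups/*' | sort
}

# Example, assuming a package installation:
#   list_sstables /var/lib/cassandra/data > SSTable-names.txt
```

The resulting file has one SSTable path per line, which is the form the batch option in Step 4 expects.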

Migrating the node to incremental repair

Note: In RHEL and Debian installations, you must install the tools packages before you can follow these steps.
  1. Disable autocompaction on the node.
    From the install_directory:
    $ bin/nodetool disableautocompaction

    Running this command without parameters disables autocompaction for all keyspaces. For details, see nodetool disableautocompaction.

  2. Run the default full, sequential repair.
    From the install_directory:
    $ bin/nodetool repair

    Running this command without parameters starts a full sequential repair of all SSTables on the node. This may take a substantial amount of time. For details, see nodetool repair.

  3. Stop the node.
  4. Mark as repaired all the SSTables that existed before you disabled autocompaction.
    Use sstablerepairedset. The --really-set option is required; without it the command makes no changes. To mark a single SSTable, SSTable-example-Data.db:
    $ sudo bin/sstablerepairedset --really-set --is-repaired SSTable-example-Data.db
    To do this as a batch process using a text file of SSTable names:
    $ sudo bin/sstablerepairedset --really-set --is-repaired -f SSTable-names.txt
    Note: The value of the repaired metadata is the timestamp of the last repair. The sstablerepairedset command applies the current date/time. To check the value of the repaired metadata for an SSTable, use:
    $ bin/sstablemetadata example-keyspace-SSTable-example-Data.db | grep "Repaired at"
  5. Restart the node.
  6. Re-enable autocompaction on the node.
    From the install_directory:
    $ bin/nodetool enableautocompaction

    Running this command without parameters enables autocompaction for all keyspaces and tables. For details, see nodetool enableautocompaction.
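
After the migration you may want to confirm that the SSTables from Step 4 carry the repaired marker. The sketch below filters sstablemetadata output for that purpose. It assumes the output contains a line of the form `Repaired at: <value>`, with 0 meaning the SSTable is not marked as repaired. The helper names `repaired_at` and `is_repaired` are hypothetical.

```shell
#!/bin/sh
# repaired_at: read sstablemetadata output on stdin and print the value
# from the first "Repaired at" line (assumed form: "Repaired at: <ms>").
repaired_at() {
    awk -F': *' '/^Repaired at/ { print $2; exit }'
}

# is_repaired: succeed only if stdin contains a non-zero "Repaired at"
# value; fail if the value is 0 or the line is missing.
is_repaired() {
    value=$(repaired_at)
    [ -n "$value" ] && [ "$value" != "0" ]
}

# Example: report every listed SSTable that is still unmarked.
#   while IFS= read -r f; do
#       bin/sstablemetadata "$f" | is_repaired || echo "not marked: $f"
#   done < SSTable-names.txt
```

Any file the loop reports can be marked again with sstablerepairedset while the node is stopped.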

What's next

After you have migrated all nodes, you will be able to run incremental repairs using nodetool repair with the -inc parameter. For details, see https://www.datastax.com/blog/2014/02/more-efficient-repairs-21.