Changing repair strategies

Change the method used for routine repairs between incremental and full repair. Anti-entropy repair of SSTables is required for database maintenance. A full repair of all SSTables on a node is time-consuming and resource-intensive. Incremental repair consumes less time and fewer resources because it skips SSTables that are already marked as repaired.

Migrating to full repairs

Incremental repairs split the data into repaired and unrepaired SSTables and record the repair state in SSTable metadata. Full repairs keep the data together and use no repair status flag.

Before switching from incremental repairs to full repairs, remove the repaired status from each table:

nodetool mark_unrepaired KEYSPACE_NAME TABLE_NAME
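
When several tables need the repaired status removed, the command above can be generated once per table. The following is a minimal shell sketch, not an official procedure: the function name mark_unrepaired_commands and the example names my_keyspace, users, and orders are hypothetical. It only prints the commands so that they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Print one "nodetool mark_unrepaired" command per table (dry run).
# Usage: mark_unrepaired_commands KEYSPACE_NAME TABLE_NAME [TABLE_NAME ...]
mark_unrepaired_commands() {
    keyspace="$1"
    shift
    for table in "$@"; do
        printf 'nodetool mark_unrepaired %s %s\n' "$keyspace" "$table"
    done
}

# Example with hypothetical names; review the output first, then pipe it
# to sh to run the commands:
#   mark_unrepaired_commands my_keyspace users orders
#   mark_unrepaired_commands my_keyspace users orders | sh
```

Printing the commands instead of running them directly keeps the sketch safe to try on any machine, including one without nodetool installed.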

Migrating to incremental repairs

To start using incremental repairs, migrate the SSTables on each node. Incremental repair skips SSTables that are already marked as repaired. The following steps preserve data integrity when changing the repair strategy from full to incremental.

DataStax recommends using full repairs. Incremental repairs can cause performance issues. See CASSANDRA-9143.

The first system-wide full repair can take a long time while the database recompacts all SSTables. To make this process less disruptive, migrate the cluster to incremental repair one node at a time.

  1. Disable autocompaction on the node.

    Running nodetool disableautocompaction without parameters disables autocompaction for all keyspaces:

    nodetool disableautocompaction

    For tarball installations, run this command from the /bin directory of your DSE installation.

  2. Before running a full repair in the next step, list the node's SSTables located in /var/lib/cassandra/data. You need this list later in this process to set the repairedAt flag when you mark the SSTables as repaired.

    The data directory contains a subdirectory for each keyspace. Each subdirectory contains a set of files for each SSTable. The name of the file that contains the SSTable data has the format VERSION_CODE-GENERATION-FORMAT-Data.db.

  3. Run the default full, sequential repair on one node at a time.

    Running nodetool repair without parameters runs a full sequential repair of all SSTables on the node and can take a substantial amount of time:

    nodetool repair

    For tarball installations, run this command from the /bin directory of your DSE installation.

  4. Stop the node.

  5. Using the list you created earlier in this process, set the repairedAt flag on each SSTable using sstablerepairedset with the --is-repaired option.

    If you don't mark each SSTable as repaired with --is-repaired, the existing SSTables might not be changed by the repair process, and any incremental repair processes that run later won't process these SSTables.

    To mark a single SSTable as repaired, run the following command, replacing SSTABLE_DATA_FILE_NAME with the actual SSTable data file name (VERSION_CODE-GENERATION-FORMAT-Data.db):

    sudo sstablerepairedset --really-set --is-repaired SSTABLE_DATA_FILE_NAME

    For batch processing, use a text file of SSTable names:

    sudo sstablerepairedset --really-set --is-repaired -f SSTable-names.txt

    For tarball installations, run these commands from the /resources/cassandra/tools/bin directory of your DSE installation.

    The value of the repairedAt flag is the timestamp of the last repair. The sstablerepairedset command sets the timestamp to the current date and time when you run the command. To check the value of the repairedAt flag, use sstablemetadata:

    sstablemetadata KEYSPACE_NAME-SSTABLE_DATA_FILE_NAME | grep "Repaired at"

  6. Restart the node.

  7. After you have migrated all nodes, you can run incremental repairs using nodetool repair with the -inc option.
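
Steps 2 and 5 depend on a list of the node's SSTable data files, and step 5 ends with a check of the repairedAt flag. The following is a minimal shell sketch of both, under stated assumptions: the function names list_sstable_data_files and print_repaired_at are hypothetical, the data directory defaults to the path named in step 2, and copies under snapshots and backups subdirectories are assumed not to belong in the list. Here sstablemetadata is given the full path of each data file from the list.

```shell
#!/bin/sh
# Data directory from step 2; override DATA_DIR if your node stores data elsewhere.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"

# Print every *-Data.db file under the given directory, one per line, sorted.
# Assumption: snapshot and backup copies are excluded from the list.
list_sstable_data_files() {
    find "$1" -type f -name '*-Data.db' \
        ! -path '*/snapshots/*' ! -path '*/backups/*' | sort
}

# For each file named in the list, print its "Repaired at" metadata line.
# Requires sstablemetadata on the PATH.
print_repaired_at() {
    while IFS= read -r sstable; do
        printf '%s: %s\n' "$sstable" \
            "$(sstablemetadata "$sstable" | grep "Repaired at")"
    done < "$1"
}

# Step 2 (before the full repair), build the list:
#   list_sstable_data_files "$DATA_DIR" > SSTable-names.txt
# Step 5 (node stopped), after running sstablerepairedset with -f SSTable-names.txt:
#   print_repaired_at SSTable-names.txt
```

Writing the list to SSTable-names.txt matches the file name used with the -f option in step 5, so the same file serves both the batch update and the check afterward.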

