sstablescrub

The sstablescrub utility is an offline version of nodetool scrub. It attempts to remove the corrupted parts while preserving non-corrupted data. Because sstablescrub runs offline, it can correct errors that nodetool scrub cannot. If an SSTable cannot be read due to corruption, it will be left on disk.

If scrubbing drops rows, the new SSTables are marked as unrepaired. However, if no bad rows are detected, the SSTable keeps its original repairedAt field, which records the time of the repair.

Procedure

  1. Before using sstablescrub, try rebuilding the tables using nodetool scrub.

    If nodetool scrub does not fix the problem, use sstablescrub.

  2. Shut down the node.
  3. Run the utility:
    sstablescrub [--debug] [-e arg] [-h] [-j arg] [-m] [-n] [-r] [-s] [-t arg] [-v]
                 keyspace_name table_name [--sstable-files arg]
    --debug
    Display stack traces.
    -e, --header-fix argument
    Check SSTable serialization-headers and repair issues. Takes the following arguments:
    validate-only
    Validate serialization-headers only. Do not attempt any repairs and do not continue with the scrub once the validation is complete.
    validate
    Validate serialization-headers and continue with the scrub once the validation is complete. (Default)
    fix-only
    Validate and repair only the serialization-headers. Do not continue with the scrub once serialization-header validation and repairs are complete.
    fix
    Validate and repair serialization-headers and perform a normal scrub. Do not repair and do not continue with the scrub if serialization-header validation encounters errors.
    off
    Do not perform serialization-header validation checks.
    -h, --help
    Display help.
    -j, --jobs

    Number of SSTables to scrub simultaneously. Defaults to the smaller of the number of available processors and 8.

    -m, --manifest-check
    Only check and repair the leveled manifest, without actually scrubbing the SSTables.
    -r, --reinsert-overflowed-ttl
    Rewrite SSTables that contain rows with an overflowed expiration time, setting the maximum expiration date of 2038-01-19T03:14:06+00:00 and using the original timestamp + 1 (ms).
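
    The maximum expiration date quoted above corresponds to 2147483646 seconds after the Unix epoch, which is one second below the largest signed 32-bit value. The following is a minimal sketch that only reproduces that arithmetic; it does not call sstablescrub. The variable names are illustrative, and the `date -d @...` form assumes GNU date.

```shell
# Epoch seconds for the quoted maximum expiration date: 2^31 - 2.
max_expiration_epoch=$(( (1 << 31) - 2 ))

# Convert to an ISO 8601 UTC timestamp (GNU date syntax; BSD/macOS date differs).
max_expiration_date=$(date -u -d "@${max_expiration_epoch}" +%Y-%m-%dT%H:%M:%S+00:00)

echo "epoch seconds: ${max_expiration_epoch}"
echo "UTC date:      ${max_expiration_date}"
```

    A TTL that would expire after this instant cannot be represented in a signed 32-bit seconds field, which is why such rows are described as overflowed.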
    -s, --skip-corrupted
    Skip corrupt rows in counter tables.
    --sstable-files
    Instead of processing all SSTables in the default data directories, process only the SSTables specified via this option. If a single SSTable file is specified, only that SSTable is processed. If a directory is specified, all SSTables within that directory are processed. Snapshots and backups are not supported with this option.
    -t

    Takes a number of days from 1 to 1000. Examines all deletion times and, where a deletion time is at least that many days in the future, changes the timestamp and local-deletion-time to now.

    Warning: This is a destructive operation and should only be used under the guidance of DataStax support.
    -v, --verbose
    Verbose output.
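
The order of the steps in the procedure can be sketched as a dry run. This is an illustration only: the keyspace `cycling` and table `birthday_list` are hypothetical placeholders, the `run` helper just prints each command instead of executing it, and the way a node is shut down depends on the installation, so that step is left as a printed reminder.

```shell
#!/bin/sh
# Dry-run sketch of the procedure: every step is printed, nothing is executed.
# Hypothetical placeholders; substitute your own keyspace and table names.
KEYSPACE="cycling"
TABLE="birthday_list"

# Print the command it is given, prefixed with "+".
# To execute for real, replace the echo with: "$@"
run() {
    echo "+ $*"
}

scrub_procedure() {
    # Step 1: try the online scrub first.
    run nodetool scrub "$KEYSPACE" "$TABLE"
    # Step 2: shut down the node (installation-specific, so only a reminder here).
    echo "# shut down the node using your installation's usual method"
    # Step 3: run the offline scrub with verbose output.
    run sstablescrub --verbose "$KEYSPACE" "$TABLE"
}

scrub_procedure
```

Printing the commands first makes it easy to review the exact invocation before the node is taken offline.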