sstablescrub

Scrubs the SSTables for the specified table.

The sstablescrub utility is an offline version of nodetool scrub. It attempts to remove the corrupted parts while preserving non-corrupted data. Because sstablescrub runs offline, it can correct errors that nodetool scrub cannot. If an SSTable cannot be read due to corruption, it will be left on disk.

If scrubbing drops rows, the new SSTables are marked as unrepaired. If no bad rows are detected, the SSTable keeps its original repairedAt field, which records the time of the repair.

Stop DSE before you run this command.

The default location of this SSTable tool depends on the type of installation:

  • Package installations: /usr/bin/

  • Tarball installations: INSTALL_DIRECTORY/resources/cassandra/tools/bin

Synopsis

sstablescrub
[--debug] [-e <arg>] [-h] [-j <arg>] [-m] [-n] [-r] [-s] [-t <number of days>] [-v]
<keyspace_name> <table_name> [--sstable-files <arg>]
Syntax legend

Each syntax convention below is followed by its description.

Italic, bold, or < >

Syntax diagrams and code samples use one or more of these styles to mark placeholders for variable values. Replace placeholders with a valid option or your own user-defined value.

In CQL statements, angle brackets are required to enclose data types in a set, list, map, or tuple. Separate the data types with a comma. For example: <datatype2

In Search CQL statements, angle brackets are used to identify the entity and literal value to overwrite the XML element in the schema and solrconfig files, such as @<xml_entity>='<xml_entity_type>'.

[ ]

Square brackets surround optional command arguments. Do not type the square brackets.

( )

Parentheses identify a group to choose from. Do not type the parentheses.

|

A pipe separates alternative elements. Type any one of the elements. Do not type the pipe.

...

Indicates that you can repeat the syntax element as often as required.

'

Single quotation marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case. For Search CQL only: Single quotation marks surround an entire XML schema declaration, such as '<schema> ... </schema>'.

{ }

Map collection. Curly braces enclose maps ({ <key_datatype>:<value_datatype> }) or key value pairs ({ <key>:<value> }). A colon separates the key and the value.

;

Ends a CQL statement.

--

Separate command line options from command arguments with two hyphens. This syntax is useful when arguments might be mistaken for command line options.

Options

If an option has a short and long form, both forms are given, separated by a comma.

--debug

Display stack traces.

-e, --header-fix argument

Check SSTable serialization-headers and repair issues:

  • validate-only: Validate serialization-headers only. Do not attempt any repairs and do not continue with the scrub once the validation is complete.

  • validate (default): Validate serialization-headers and continue with the scrub once the validation is complete.

  • fix-only: Validate and repair only the serialization-headers. Do not continue with the scrub once serialization-header validation and repairs are complete.

  • fix: Validate and repair serialization-headers and perform a normal scrub. Do not repair and do not continue with the scrub if serialization-header validation encounters errors.

  • off: Do not perform serialization-header validation checks.
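The five modes differ only in whether headers are validated, whether they are repaired, and whether the scrub runs afterward. The sketch below restates the list as a lookup. The helper name header_fix_plan is hypothetical and is not part of sstablescrub, and the assumption that the scrub still runs in off mode is inferred from the description rather than stated by it.

```shell
#!/bin/sh
# header_fix_plan: hypothetical helper that summarizes each --header-fix mode
# as described in the list above. It does not invoke sstablescrub.
header_fix_plan() {
    case "$1" in
        validate-only) echo "validate=yes repair=no scrub=no" ;;
        validate)      echo "validate=yes repair=no scrub=yes" ;;   # default
        fix-only)      echo "validate=yes repair=yes scrub=no" ;;
        # fix: the scrub runs only if header validation finds no errors.
        fix)           echo "validate=yes repair=yes scrub=yes" ;;
        # off: assumed here that the scrub itself still runs.
        off)           echo "validate=no repair=no scrub=yes" ;;
        *)             echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

header_fix_plan fix-only
```

For instance, to repair headers without scrubbing, the mode would be passed as sstablescrub -e fix-only <keyspace_name> <table_name>.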

-h, --help

Display the usage and listing of the commands.

-j, --jobs

Number of SSTables to scrub simultaneously. Defaults to the smaller of the number of available processors and 8.
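As a rough illustration of that default, the sketch below computes the smaller of a processor count and 8. The helper default_jobs is hypothetical, and using nproc to obtain the processor count is an assumption; the count the tool itself sees may differ.

```shell
#!/bin/sh
# default_jobs: hypothetical helper mirroring the documented default,
# the smaller of the processor count and 8.
default_jobs() {
    processors=$1
    if [ "$processors" -lt 8 ]; then
        echo "$processors"
    else
        echo 8
    fi
}

# nproc (GNU coreutils) reports the processors available to this process;
# fall back to 1 if it is not installed.
default_jobs "$(nproc 2>/dev/null || echo 1)"
```

On a 4-processor host this prints 4; on a 16-processor host it prints 8.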

keyspace_name

Keyspace name. Required.

-m, --manifest-check

Check and repair only the leveled manifest. Do not scrub the SSTables.

-n, --no-validate

Do not validate columns using column validator.

-r, --reinsert-overflowed-ttl

Rewrite rows with overflowed expiration date affected by CASSANDRA-14092 with the maximum supported expiration date of 2038-01-19T03:14:06+00:00. Rows are rewritten with the original timestamp incremented by one millisecond to override/supersede any potential tombstone that might have been generated during compaction of the affected rows. See Recovering expired data caused by TTL year 2038 problem.
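The maximum expiration date quoted above sits one second below the largest signed 32-bit value when expressed as seconds since the Unix epoch. The sketch below shows that arithmetic; the variable names are illustrative, and the date invocation assumes GNU date.

```shell
#!/bin/sh
# Largest signed 32-bit integer, interpreted as seconds since the Unix epoch.
int32_max=$(( (1 << 31) - 1 ))
# The documented maximum expiration date is one second earlier.
max_expiration_epoch=$(( int32_max - 1 ))
echo "$max_expiration_epoch"

# GNU date syntax (-d @SECONDS); BSD/macOS date uses -r SECONDS instead.
date -u -d "@$max_expiration_epoch" '+%Y-%m-%dT%H:%M:%S+00:00'
```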

-s, --skip-corrupted

Skips corrupt rows in counter tables.

--sstable-files

Instead of processing all SSTables in the default data directories, process only the SSTables specified with this option. If a single SSTable file is specified, only that SSTable is processed. If a directory is specified, all SSTables within that directory are processed. Snapshots and backups are not supported with this option.

table_name

Table name. Required.

-t

This is a destructive operation and should only be used under the guidance of DataStax Support.

Use -t only when the system clock on a node is in the future, because that makes tombstones unpurgeable.

Provide a number of days from 1 to 1000. sstablescrub examines all deletion times and resets timestamp and local-deletion-time to the current time for any deletion time that is the specified number of days or more in the future.

Default: 0 (disabled)

For example:

sstablescrub -v -t 1 <keyspace_name> <table_name>

Use -t together with -v for verbose logging so that you can see which partition and cluster are updated.
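The reset rule for -t can be sketched as a simple comparison. This is only an illustration of the documented behavior, not how sstablescrub is implemented: the helper reset_deletion_time is hypothetical, and all times are assumed to be in epoch seconds for simplicity.

```shell
#!/bin/sh
# reset_deletion_time: hypothetical helper illustrating the documented rule.
# Arguments: deletion time and current time (both epoch seconds, an
# assumption made for this sketch), and the -t value in days.
reset_deletion_time() {
    deletion_time=$1
    now=$2
    days=$3
    threshold=$(( now + days * 86400 ))
    # A value of 0 days means the option is disabled.
    if [ "$days" -gt 0 ] && [ "$deletion_time" -ge "$threshold" ]; then
        echo "$now"             # at or beyond the threshold: reset to now
    else
        echo "$deletion_time"   # left unchanged
    fi
}

now=1700000000
reset_deletion_time 1700090000 "$now" 1   # more than 1 day ahead: reset
reset_deletion_time 1700050000 "$now" 1   # less than 1 day ahead: kept
```

With -t 1, the first deletion time is 90000 seconds ahead (more than 86400) and is reset to now, while the second is 50000 seconds ahead and is kept.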

-v, --verbose

Enable verbose console output.

Examples

Verify DataStax Enterprise is not running

nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  10.200.177.92  265.04 KiB  1            ?       980cab6a-2e5d-44c6-b897-0733dde580ac  rack1
DN  10.200.177.94  426.21 KiB  1            ?       7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3  rack1
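To check one node's state from this output in a script, the status column can be extracted by address. The sketch below is a minimal example: the helper node_state is hypothetical, the sample rows are copied from the output above, and it assumes nodetool status is run from a node that is still up.

```shell
#!/bin/sh
# node_state: hypothetical helper. Reads `nodetool status` output on stdin and
# prints the two-letter status/state code for the given address.
# Exits non-zero if the address is not listed.
node_state() {
    awk -v addr="$1" '$2 == addr { print $1; found = 1 } END { exit !found }'
}

# Sample rows copied from the output above.
sample='UN  10.200.177.92  265.04 KiB  1  ?  980cab6a-2e5d-44c6-b897-0733dde580ac  rack1
DN  10.200.177.94  426.21 KiB  1  ?  7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3  rack1'

printf '%s\n' "$sample" | node_state 10.200.177.94   # prints DN
```

Against a live cluster, the same helper would be fed by nodetool status | node_state <address>; a result of DN indicates that the node is down and can be scrubbed offline.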

Scrub all SSTables for the calendar table

sstablescrub cycling calendar

Scrub only particular SSTables for the calendar table

sstablescrub cycling calendar --sstable-files /var/lib/cassandra/data/cycling/calendar-eebb/ac-1-bti-Data.db \
                                             /var/lib/cassandra/data/cycling/calendar-aacc/ac-2-bti-Data.db

