sstablescrub

Scrubs the SSTables for the specified table.

The sstablescrub utility is an offline version of nodetool scrub. It attempts to remove the corrupted parts while preserving non-corrupted data. Because sstablescrub runs offline, it can correct errors that nodetool scrub cannot. If an SSTable cannot be read due to corruption, it will be left on disk.

If scrubbing drops any rows, the new SSTables are marked as unrepaired. If no bad rows are detected, the SSTable keeps its original repairedAt field, which records the time of the repair.

Restriction: Stop DataStax Enterprise before you run this command.

The default location of this SSTable tool depends on the type of installation:

  • Package installations: /usr/bin/

  • Tarball installations: <installation_location>/resources/cassandra/tools/bin

Synopsis

sstablescrub
[--debug] [-e <arg>] [-h] [-j <arg>] [-m] [-n] [-r] [-s] [-t <number of days>] [-v]
<keyspace_name> <table_name> [--sstable-files <arg>]

Syntax conventions

UPPERCASE

Literal keyword.

Lowercase

Not literal.

<Italics>

Variable value. Replace with a valid option or user-defined value.

[ ]

Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.

( )

Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

|

Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.

...

Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.

'<Literal string>'

Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.

{ <key>:<value> }

Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the key and the value.

<<datatype1>,<datatype2>>

Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.

cql_statement;

End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ]

Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.

' <<schema> ... </schema> >'

Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@<xml_entity>='<xml_entity_type>'

Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

Definition

The short form and long form parameters are comma-separated.

Command arguments

--debug

Display stack traces.

-e, --header-fix argument

Check SSTable serialization-headers and repair issues. Takes the following arguments:

  • validate-only

    Validate serialization-headers only. Do not attempt any repairs and do not continue with the scrub once the validation is complete.

  • validate

    Validate serialization-headers and continue with the scrub once the validation is complete. (Default)

  • fix-only

    Validate and repair only the serialization-headers. Do not continue with the scrub once serialization-header validation and repairs are complete.

  • fix

    Validate and repair serialization-headers and perform a normal scrub. Do not repair and do not continue with the scrub if serialization-header validation encounters errors.

  • off

    Do not perform serialization-header validation checks.
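As a rough illustration, a wrapper script could check the -e argument against the five values listed above before it calls the tool. The function name is_header_fix_mode is hypothetical and is not part of sstablescrub; the keyspace and table names are the ones used in the examples on this page.

```shell
# Hypothetical helper: succeeds only for the -e / --header-fix values listed above.
is_header_fix_mode() {
  case "$1" in
    validate-only|validate|fix-only|fix|off) return 0 ;;
    *) return 1 ;;
  esac
}

mode="validate-only"
if is_header_fix_mode "$mode"; then
  # Print the command instead of running it; remove "echo" to execute it.
  echo sstablescrub -e "$mode" cycling calendar
fi
```

Printing the command first gives a dry run, which is a cautious default for a tool that rewrites SSTables.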

-h, --help

Display the usage and listing of the commands.

-j, --jobs

Number of SSTables to scrub simultaneously. Defaults to the smaller of the number of available processors and 8.
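The default can be pictured as a simple minimum. The sketch below only mirrors the rule stated above; the function name default_jobs is hypothetical, and the processor count is passed in by hand rather than detected.

```shell
# Hypothetical helper: the smaller of a processor count and 8.
default_jobs() {
  if [ "$1" -lt 8 ]; then
    echo "$1"
  else
    echo 8
  fi
}

default_jobs 4    # prints 4
default_jobs 32   # prints 8
```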

keyspace_name

Keyspace name. Required.

-m, --manifest-check

Check and repair only the leveled manifest. Do not scrub the SSTables.

-n, --no-validate

Do not validate columns using the column validator.

-r, --reinsert-overflowed-ttl

Rewrite rows with overflowed expiration date affected by CASSANDRA-14092 with the maximum supported expiration date of 2038-01-19T03:14:06+00:00. Rows are rewritten with the original timestamp incremented by one millisecond to override/supersede any potential tombstone that might have been generated during compaction of the affected rows. See Recovering expired data caused by TTL year 2038 problem.
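For orientation, the maximum expiration date above corresponds to epoch second 2147483646, which is one second below the largest signed 32-bit value. The following sketch only checks that arithmetic and formats the result; the variable name is illustrative and the snippet does not interact with sstablescrub.

```shell
# One second below the largest signed 32-bit value (2^31 - 1).
max_expiration_epoch=$(( (1 << 31) - 2 ))
echo "$max_expiration_epoch"   # prints 2147483646

# Format as UTC. GNU date accepts -d @SECONDS; BSD/macOS date uses -r SECONDS.
date -u -d "@$max_expiration_epoch" '+%Y-%m-%dT%H:%M:%S+00:00' 2>/dev/null ||
  date -u -r "$max_expiration_epoch" '+%Y-%m-%dT%H:%M:%S+00:00'
```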

-s, --skip-corrupted

Skips corrupt rows in counter tables.

--sstable-files

Instead of processing all SSTables in the default data directories, process only the SSTables specified via this option. If a single SSTable file is specified, only that SSTable is processed. If a directory is specified, all SSTables within that directory are processed. Snapshots and backups are not supported with this option.

table_name

Table name. Required.

-t number of days

Takes a number of days from 1 to 1000 and examines all deletion times. Any deletion time that lies at least the specified number of days in the future has its timestamp and local-deletion-time reset to the current time.

Default: 0 - disables the flag

Command-line usage:

sstablescrub -v -t 1 <keyspace_name> <table_name>

The -v flag enables verbose logging so that you can see which partitions and clustering rows are updated.

This is a destructive operation and should only be used under the guidance of DataStax Support.

Use -t only when the system clock on a node has been set in the future, because tombstones written with future deletion times cannot be purged.
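The threshold test described for -t can be sketched as a comparison of epoch seconds. This is an illustration under stated assumptions, not the tool's implementation: the function name is_far_future_deletion is hypothetical, both times are assumed to be in seconds, and the value of now is an arbitrary sample.

```shell
# Hypothetical helper: succeeds when a deletion time ($1) is at least
# $3 days after the current time ($2). Both times are epoch seconds.
is_far_future_deletion() {
  [ $(( $1 - $2 )) -ge $(( $3 * 86400 )) ]
}

now=1700000000   # arbitrary sample value for "now"

# Two days ahead with a threshold of 1 day: would be reset.
is_far_future_deletion $(( now + 2 * 86400 )) "$now" 1 && echo "reset to now"

# One hour ahead with a threshold of 1 day: left unchanged.
is_far_future_deletion $(( now + 3600 )) "$now" 1 || echo "unchanged"
```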

-v, --verbose

Verbose output.

Examples

Verify that DataStax Enterprise is not running

nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  10.200.177.92  265.04 KiB  1            ?       980cab6a-2e5d-44c6-b897-0733dde580ac  rack1
DN  10.200.177.94  426.21 KiB  1            ?       7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3  rack1

Restriction: Stop DataStax Enterprise before you run this command.
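A script could read the status column from nodetool status output before scrubbing. The sketch below is one possible approach; the function name node_state is hypothetical, and the sample text is copied from the output shown above (with column spacing collapsed) so that the snippet runs without a cluster.

```shell
# Hypothetical helper: print the status/state code (for example UN or DN)
# that nodetool status reports for a given address. Reads the output on stdin.
node_state() {
  awk -v addr="$1" '$2 == addr { print $1 }'
}

# Sample rows copied from the nodetool status output above.
status_output='UN  10.200.177.92  265.04 KiB  1  ?  980cab6a-2e5d-44c6-b897-0733dde580ac  rack1
DN  10.200.177.94  426.21 KiB  1  ?  7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3  rack1'

state=$(printf '%s\n' "$status_output" | node_state 10.200.177.94)
if [ "$state" = "DN" ]; then
  echo "10.200.177.94 is reported as down"
fi
```

Against a live cluster, the same function could be fed with nodetool status run from another node, for example: nodetool status | node_state 10.200.177.94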

Scrub all SSTables for the calendar table

sstablescrub cycling calendar

Scrub only particular SSTables for the calendar table

sstablescrub cycling calendar --sstable-files /var/lib/cassandra/data/cycling/calendar-eebb/ac-1-bti-Data.db \
                                             /var/lib/cassandra/data/cycling/calendar-aacc/ac-2-bti-Data.db

© 2024 DataStax | Privacy policy | Terms of use
