sstablescrub

Scrubs the SSTable for the provided table.

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:
Package installations: /etc/dse/cassandra/cassandra.yaml
Tarball installations: installation_location/resources/cassandra/conf/cassandra.yaml


The sstablescrub utility is an offline version of nodetool scrub. It attempts to remove the corrupted parts while preserving non-corrupted data. Because sstablescrub runs offline, it can correct errors that nodetool scrub cannot. If an SSTable cannot be read due to corruption, it will be left on disk.

If scrubbing drops any rows, the new SSTables are marked as unrepaired. If no bad rows are detected, the SSTable keeps its original repairedAt field, which records the time of the repair.

Restriction: DataStax Enterprise must be stopped before you run this command.
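Because sstablescrub must run against a stopped node, a wrapper script can refuse to start while the local node is still reachable. The following is a minimal sketch, not a DataStax tool. The names `local_node_is_up`, `safe_scrub`, and the `DRY_RUN` variable are hypothetical. The sketch assumes that a failing `nodetool -h 127.0.0.1 info` means DSE is stopped on this host. If JMX requires authentication or listens on another address, nodetool also fails, so the check can wrongly report the node as stopped.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight wrapper; not part of DSE.
# Assumption: if nodetool cannot reach JMX on 127.0.0.1, DSE is stopped on this host.

# Succeeds when the local node answers nodetool over JMX.
local_node_is_up() {
  nodetool -h 127.0.0.1 info >/dev/null 2>&1
}

# safe_scrub KEYSPACE TABLE [sstablescrub options...]
# Refuses to scrub while the local node is up. When DRY_RUN is non-empty,
# it prints the sstablescrub command instead of running it.
safe_scrub() {
  if local_node_is_up; then
    echo "DSE appears to be running on this node; stop it first." >&2
    return 1
  fi
  local keyspace="$1" table="$2"
  shift 2
  # Options go before the keyspace and table names, as in the synopsis.
  ${DRY_RUN:+echo} sstablescrub "$@" "$keyspace" "$table"
}
```

For example, on a stopped node `DRY_RUN=1 safe_scrub cycling calendar -v` prints `sstablescrub -v cycling calendar` instead of running it.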

Synopsis

sstablescrub
[--debug] [-e arg] [-h] [-j arg] [-m] [-n] [-r] [-s] [-v] 
keyspace_name table_name
Table 1. Legend
UPPERCASE
Literal keyword.
Lowercase
Not literal.
Italics
Variable value. Replace with a valid option or user-defined value.
[ ]
Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.
( )
Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.
|
Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.
...
Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.
'Literal string'
Single quotation marks ( ' ) must surround literal strings in CQL statements. Use single quotation marks to preserve uppercase.
{ key:value }
Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the key and the value.
<datatype1,datatype2>
Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.
cql_statement;
End CQL statement. A semicolon ( ; ) terminates all CQL statements.
[ -- ]
Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.
' <schema> ... </schema> '
Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.
@xml_entity='xml_entity_type'
Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

Definition

The short form and long form parameters are comma-separated.

Command arguments

--debug
Display stack traces.
-e, --header-fix argument
Check SSTable serialization-headers and repair issues. Takes the following arguments:
validate-only
Validate serialization-headers only. Do not attempt any repairs and do not continue with the scrub once the validation is complete.
validate
Validate serialization-headers and continue with the scrub once the validation is complete. (Default)
fix-only
Validate and repair only the serialization-headers. Do not continue with the scrub once serialization-header validation and repairs are complete.
fix
Validate and repair serialization-headers, and then perform a normal scrub. If serialization-header validation encounters errors, do not repair them and do not continue with the scrub.
off
Do not perform serialization-header validation checks.
-h, --help
Display the usage and listing of the commands.
-j, --jobs argument
Number of SSTables to scrub simultaneously. Defaults to the lesser of the number of available processors and 8.
keyspace_name
Keyspace name. Required.
-m, --manifest-check
Check and repair only the leveled manifest. Do not scrub the SSTables.
-n, --no-validate
Do not validate columns using the column validator.
-r, --reinsert-overflowed-ttl
Rewrite rows with an overflowed expiration date affected by CASSANDRA-14092 using the maximum supported expiration date of 2038-01-19T03:14:06+00:00. Rows are rewritten with the original timestamp incremented by one millisecond to supersede any tombstone that might have been generated during compaction of the affected rows. See /en/dse-trblshoot/doc/troubleshooting/recoveringTtlYear2038Problem.html.
-s, --skip-corrupted
Skip corrupted rows in counter tables.
table_name
Table name. Required.
-v, --verbose
Verbose output.
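The -e and -j descriptions above can be restated as small shell functions. This sketch uses the hypothetical helpers `header_fix_plan` and `default_scrub_jobs`, which are not part of DSE. `nproc` only approximates the processor count that sstablescrub itself detects. Treating `off` as a normal scrub without header checks is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of DSE) that restate two option descriptions.

# header_fix_plan MODE: prints the phases that -e/--header-fix MODE runs.
# Per the description above, "fix" stops without repairing or scrubbing
# if header validation encounters errors.
# For "off", it is assumed that the normal scrub still runs.
header_fix_plan() {
  case "$1" in
    validate-only) echo "validate" ;;
    validate)      echo "validate scrub" ;;
    fix-only)      echo "validate repair" ;;
    fix)           echo "validate repair scrub" ;;
    off)           echo "scrub" ;;
    *)
      echo "unknown --header-fix value: $1" >&2
      return 1
      ;;
  esac
}

# default_scrub_jobs [CPUS]: prints the documented -j/--jobs default, the lesser
# of the processor count (CPUS, or nproc when omitted) and 8.
default_scrub_jobs() {
  local cpus="${1:-$(nproc)}"
  if [ "$cpus" -lt 8 ]; then
    echo "$cpus"
  else
    echo 8
  fi
}
```

For example, `header_fix_plan fix-only` prints `validate repair`, which shows that no scrub follows.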

Examples

Verify that DataStax Enterprise is not running

nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  10.200.177.92  265.04 KiB  1            ?       980cab6a-2e5d-44c6-b897-0733dde580ac  rack1
DN  10.200.177.94  426.21 KiB  1            ?       7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3  rack1
Run sstablescrub only on a stopped node. In this output, the node at 10.200.177.94 reports DN (Down, Normal), which indicates that DSE is stopped on that node.
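To script the check above, a hypothetical `node_state` helper (not part of DSE) can extract the status/state code for one address from `nodetool status` output. This sketch assumes the column layout shown above, with the two-letter code in the first column and the address in the second.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of DSE.
# node_state ADDRESS: reads `nodetool status` output on stdin and prints the
# two-letter status/state code (such as UN or DN) for ADDRESS.
# Exits non-zero when ADDRESS does not appear in the output.
node_state() {
  awk -v addr="$1" '$2 == addr { print $1; found = 1 } END { exit !found }'
}
```

For example, for the output above, `nodetool status | node_state 10.200.177.94` prints `DN`. Run nodetool from a node that is still up, because a stopped node cannot answer it.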

Scrub all SSTables for the calendar table

sstablescrub cycling calendar
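The command above scrubs one table. To scrub every table in a keyspace, a loop can call sstablescrub once per table. In this sketch, `keyspace_tables` and `scrub_keyspace` are hypothetical helpers. The table names are derived from directory names, assuming the DATA_DIR/KEYSPACE/TABLE-ID layout seen in the paths on this page (for example, /var/lib/cassandra/data/cycling/calendar-eebb).

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of DSE.
# Assumption: tables live in DATA_DIR/KEYSPACE/TABLE-ID directories.

# keyspace_tables DATA_DIR KEYSPACE: prints each table name once, sorted.
keyspace_tables() {
  local ks_dir="$1/$2" dir name
  for dir in "$ks_dir"/*/; do
    [ -d "$dir" ] || continue   # no subdirectories: the glob stays unexpanded
    name="${dir%/}"             # drop the trailing slash
    name="${name##*/}"          # keep the last path component, e.g. calendar-eebb
    echo "${name%-*}"           # strip the table ID after the last hyphen
  done | sort -u
}

# scrub_keyspace DATA_DIR KEYSPACE [sstablescrub options...]
# Runs sstablescrub once per table and stops at the first failure.
# When DRY_RUN is non-empty, it only prints the commands.
scrub_keyspace() {
  local data_dir="$1" keyspace="$2" table
  shift 2
  for table in $(keyspace_tables "$data_dir" "$keyspace"); do
    ${DRY_RUN:+echo} sstablescrub "$@" "$keyspace" "$table" || return
  done
}
```

For example, `DRY_RUN=1 scrub_keyspace /var/lib/cassandra/data cycling` prints one sstablescrub command per table without running any of them. Cassandra table names contain only letters, digits, and underscores, so stripping the text after the last hyphen leaves only the table name.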

Scrub only specific SSTables for the calendar table

sstablescrub cycling calendar --stable-files /var/lib/cassandra/data/cycling/calendar-eebb/ac-1-bti-Data.db \
                                               /var/lib/cassandra/data/cycling/calendar-aacc/ac-2-bti-Data.db
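To choose the files for the command above, a hypothetical `table_data_files` helper (not part of DSE) can list a table's Data.db components. This sketch assumes the same DATA_DIR/KEYSPACE/TABLE-ID layout as the example paths. It looks only one level inside each table directory, so files under subdirectories such as snapshots are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of DSE.
# table_data_files DATA_DIR KEYSPACE TABLE: lists the *-Data.db files directly
# inside the table's directories, sorted, skipping deeper subdirectories.
table_data_files() {
  local data_dir="$1" keyspace="$2" table="$3"
  find "$data_dir/$keyspace" -mindepth 2 -maxdepth 2 -type f \
    -path "*/$keyspace/$table-*/*-Data.db" | sort
}
```

For example, `table_data_files /var/lib/cassandra/data cycling calendar` lists candidate paths to pass as the SSTable file arguments in the example above.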