sstablesplit

Splits SSTable files offline into multiple SSTables, each no larger than a designated maximum size.

For example, if a major compaction under SizeTieredCompactionStrategy produces an excessively large SSTable, split that SSTable to ensure that compaction occurs before the next huge compaction.

Restriction: Stop DataStax Enterprise before you run this command.

The default location of this SSTable tool depends on the type of installation:

  • Package installations: /usr/bin/

  • Tarball installations: <installation_location>/resources/cassandra/tools/bin
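The default locations listed above can be captured in a small helper, for example in a maintenance script. This is a minimal sketch: the function name sstablesplit_path and its package and tarball arguments are hypothetical, and the caller must supply the tarball installation location.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the default sstablesplit path for an installation type.
# $1: "package" or "tarball"; $2: installation location (tarball installs only).
sstablesplit_path() {
  case "${1:-}" in
    package)
      echo /usr/bin/sstablesplit ;;
    tarball)
      # Refuse to guess the installation location.
      if [ -z "${2:-}" ]; then
        echo "installation location required for tarball installs" >&2
        return 1
      fi
      echo "$2/resources/cassandra/tools/bin/sstablesplit" ;;
    *)
      echo "unknown installation type: ${1:-}" >&2
      return 1 ;;
  esac
}
```

For example, sstablesplit_path tarball /opt/dse prints /opt/dse/resources/cassandra/tools/bin/sstablesplit. Here /opt/dse is only an illustrative location.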

Synopsis

sstablesplit
[--debug] [-h] [--no-snapshot]
[-s <max_size_in_MB>]
<sstable_filepath> [<sstable_filepath> ...]

Syntax conventions

UPPERCASE

Literal keyword.

Lowercase

Not literal.

<Italics>

Variable value. Replace with a valid option or user-defined value.

[ ]

Optional. Square brackets ( [ ] ) surround optional command arguments. Do not type the square brackets.

( )

Group. Parentheses ( ( ) ) identify a group to choose from. Do not type the parentheses.

|

Or. A vertical bar ( | ) separates alternative elements. Type any one of the elements. Do not type the vertical bar.

...

Repeatable. An ellipsis ( ... ) indicates that you can repeat the syntax element as often as required.

'<Literal string>'

Single quotation ( ' ) marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case.

{ <key>:<value> }

Map collection. Braces ( { } ) enclose map collections or key value pairs. A colon separates the key and the value.

<<datatype1>,<datatype2>>

Set, list, map, or tuple. Angle brackets ( < > ) enclose data types in a set, list, map, or tuple. Separate the data types with a comma.

cql_statement;

End CQL statement. A semicolon ( ; ) terminates all CQL statements.

[ -- ]

Separate the command line options from the command arguments with two hyphens ( -- ). This syntax is useful when arguments might be mistaken for command line options.

' <schema> ... </schema> '

Search CQL only: Single quotation marks ( ' ) surround an entire XML schema declaration.

@<xml_entity>='<xml_entity_type>'

Search CQL only: Identify the entity and literal value to overwrite the XML element in the schema and solrconfig files.

Definition

The short form and long form parameters are comma-separated.

Command arguments

--debug

Display stack traces.

-h, --help

Display usage information and a list of the available options.

--no-snapshot

Do not snapshot SSTables before splitting.

-s, --size max_size_in_MB

Maximum size in MB for output SSTables. Default: 50.

sstable_filepath

Path to an SSTable file. You can specify more than one path, separated by spaces.
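Before splitting, it can help to see which SSTables actually exceed the split size, because sstablesplit skips smaller files. This is a minimal sketch. It assumes the one-directory-per-table layout and the *-Data.db file naming seen in the examples on this page, along with a find that supports the M size suffix (GNU or BSD). The function name oversized_sstables is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list *-Data.db files in one table directory whose size
# exceeds the given split size in MB (default 50, matching the -s default).
oversized_sstables() {
  local table_dir="$1"
  local max_mb="${2:-50}"
  # -maxdepth 1: do not descend into subdirectories such as snapshots/.
  # -size +NM: larger than N mebibytes (find rounds sizes up to whole units).
  find "$table_dir" -maxdepth 1 -type f -name '*-Data.db' -size +"${max_mb}M"
}
```

For example, oversized_sstables <table_directory> 10 lists the Data.db files to pass to sstablesplit -s 10.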

Examples

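Verify that DataStax Enterprise is not running on the node. In the following nodetool status output, the node at 10.200.177.94 has status DN (Down, Normal):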

nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens       Owns    Host ID                               Rack
UN  10.200.177.92  265.04 KiB  1            ?       980cab6a-2e5d-44c6-b897-0733dde580ac  rack1
DN  10.200.177.94  426.21 KiB  1            ?       7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3  rack1
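The check above can be scripted. This is a minimal sketch that assumes nodetool status output in the format shown above, with the status code in the first column and the node address in the second. The function name node_is_down is hypothetical. It succeeds only when the given address appears with a status that begins with D (Down).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nodetool status` output on stdin and succeed only
# if the row for the given address has a status code starting with D (Down).
node_is_down() {
  local address="$1"
  awk -v addr="$address" '
    # Rows look like: DN  10.200.177.94  426.21 KiB  1  ?  <host id>  rack1
    $2 == addr { found = 1; if (substr($1, 1, 1) == "D") down = 1 }
    END { exit !(found && down) }
  '
}
```

For example: nodetool status | node_is_down 10.200.177.94 && echo "node is down". Because nodetool needs a running node to connect to, run it from another node that is still up.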

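Split SSTables to 10 MB

In the following command, 10 is passed as a positional argument instead of as the value of -s. As a result, sstablesplit treats 10 as an SSTable file path, reports that the file does not exist, and applies the default split size of 50 MB:

sstablesplit /var/lib/cassandra/data/cycling/cyclist_category-e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Statistics.db 10
Skipping inexisting file 10
        Skipping /var/lib/cassandra/data/cycling/cyclist_category-e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Data.db: it's size (0.000 MB) is less than the split size (50 MB)
        No sstables needed splitting.

To split to 10 MB, specify the size with -s. Because this SSTable is smaller than 10 MB, sstablesplit would still skip it.

sstablesplit -s 10 /var/lib/cassandra/data/cycling/cyclist_category-e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Statistics.db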
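A size passed without -s is treated as a file path, as the Skipping inexisting file 10 line in the output above shows. To guard against this, a small wrapper can always pass the size through -s. This is a minimal sketch that assumes the size is a whole number of MB. The function name split_to_size and the SSTABLESPLIT override variable are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: split_to_size <max_size_in_MB> <sstable_filepath> [...]
# Always passes the size with -s so it cannot be mistaken for a file path.
# SSTABLESPLIT (hypothetical) overrides the command, for example for a dry run.
split_to_size() {
  local size_mb="${1:-}"
  [ "$#" -gt 0 ] && shift
  case "$size_mb" in
    ''|*[!0-9]*)
      echo "size must be a whole number of MB: '$size_mb'" >&2
      return 2 ;;
  esac
  if [ "$size_mb" -le 0 ]; then
    echo "size must be greater than 0" >&2
    return 2
  fi
  if [ "$#" -eq 0 ]; then
    echo "at least one sstable_filepath is required" >&2
    return 2
  fi
  "${SSTABLESPLIT:-sstablesplit}" -s "$size_mb" "$@"
}
```

For example, split_to_size 10 <sstable_filepath> runs sstablesplit -s 10 on that file. SSTABLESPLIT=echo split_to_size 10 <sstable_filepath> prints the command instead of running it.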

Some tools require a larger heap size. For example, if sstablesplit fails with an OutOfMemoryError exception, increasing its heap size may resolve the error.

To increase the heap size for sstablesplit to 8 GB, change the following line in the tools/bin/sstablesplit shell script from:

MAX_HEAP_SIZE="256M"

to:

MAX_HEAP_SIZE="8G"
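The edit above can also be made with sed. This is a minimal sketch that assumes the MAX_HEAP_SIZE assignment sits on its own line in the script, possibly indented. The function name set_sstablesplit_heap is hypothetical. It saves a copy of the original script with a .bak suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_sstablesplit_heap <path to sstablesplit script> <heap, e.g. 8G>
# Rewrites MAX_HEAP_SIZE="..." assignment lines, preserving leading whitespace,
# and saves the original file as <script>.bak.
set_sstablesplit_heap() {
  local script="$1" heap="$2"
  sed -i.bak "s/^\([[:space:]]*\)MAX_HEAP_SIZE=\"[^\"]*\"/\1MAX_HEAP_SIZE=\"${heap}\"/" "$script"
}
```

For a tarball installation, for example, run set_sstablesplit_heap <installation_location>/resources/cassandra/tools/bin/sstablesplit 8G.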

© 2024 DataStax

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.
