sstablesplit

Splits SSTable files offline into multiple SSTables, each no larger than a designated maximum size.

For example, if a major compaction with SizeTieredCompactionStrategy produces an excessively large SSTable, split that SSTable so that the resulting smaller SSTables can be compacted before the next huge compaction.

Stop DSE before you run this command.

The default location of this SSTable tool depends on the type of installation:

  • Package installations: /usr/bin/

  • Tarball installations: INSTALL_DIRECTORY/resources/cassandra/tools/bin
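The two default locations above can be checked in order with a small helper. This is a minimal sketch: the function name `find_sstablesplit` is hypothetical, and `INSTALL_DIRECTORY` is the placeholder from the list above, not a variable that DSE sets.

```shell
# Print the first candidate directory that contains an executable
# sstablesplit. Candidate directories are passed as arguments.
# (find_sstablesplit is an illustrative name, not part of DSE.)
find_sstablesplit() {
    for dir in "$@"; do
        if [ -x "$dir/sstablesplit" ]; then
            printf '%s\n' "$dir/sstablesplit"
            return 0
        fi
    done
    return 1    # not found in any candidate directory
}

# Example call; replace INSTALL_DIRECTORY with your tarball install root:
# find_sstablesplit /usr/bin "$INSTALL_DIRECTORY/resources/cassandra/tools/bin"
```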

Synopsis

sstablesplit
[--debug] [-h] [--no-snapshot]
[-s <max_size_in_MB>]
<sstable_filepath> [<sstable_filepath> ...]

Syntax legend

Each syntax convention is followed by its description.

Italic, bold, or < >

Syntax diagrams and code samples use one or more of these styles to mark placeholders for variable values. Replace placeholders with a valid option or your own user-defined value.

In CQL statements, angle brackets are required to enclose data types in a set, list, map, or tuple. Separate the data types with a comma. For example: <datatype2

In Search CQL statements, angle brackets are used to identify the entity and literal value to overwrite the XML element in the schema and solrconfig files, such as @<xml_entity>='<xml_entity_type>'.

[ ]

Square brackets surround optional command arguments. Do not type the square brackets.

( )

Parentheses identify a group to choose from. Do not type the parentheses.

|

A pipe separates alternative elements. Type any one of the elements. Do not type the pipe.

...

Indicates that you can repeat the syntax element as often as required.

'

Single quotation marks must surround literal strings in CQL statements. Use single quotation marks to preserve upper case. For Search CQL only: single quotation marks surround an entire XML schema declaration, such as '<<schema> ... </schema>>'.

{ }

Map collection. Curly braces enclose maps ({ <key_datatype>:<value_datatype> }) or key value pairs ({ <key>:<value> }). A colon separates the key and the value.

;

Ends a CQL statement.

--

Separate command line options from command arguments with two hyphens. This syntax is useful when arguments might be mistaken for command line options.

Options

If an option has a short and long form, both forms are given, separated by a comma.

--debug

Display stack traces.

-h, --help

Display usage information and the list of options.

--no-snapshot

Do not snapshot SSTables before splitting.

-s, --size max_size_in_MB

Maximum size in MB for output SSTables. Default: 50

sstable_filepath

Filepath to an SSTable.
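As a rough guide to choosing a value for -s, the number of output SSTables can be estimated by dividing the input size by the maximum size and rounding up. The sketch below is simple arithmetic under that assumption, not the tool's own logic; the function name `estimate_split_count` is hypothetical, and the actual number of files may differ.

```shell
# Estimate how many SSTables a split leaves behind.
# $1 = input SSTable size in MB
# $2 = maximum output size in MB (defaults to 50, the documented default)
# This is an approximation; the tool may produce a different number.
estimate_split_count() {
    size_mb=$1
    max_mb=${2:-50}
    if [ "$size_mb" -lt "$max_mb" ]; then
        # SSTables smaller than the split size are skipped, so one file remains.
        echo 1
    else
        # Integer ceiling of size_mb / max_mb.
        echo $(( (size_mb + max_mb - 1) / max_mb ))
    fi
}
```

For instance, `estimate_split_count 1024 50` prints 21, because 1024 / 50 rounds up to 21.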

Examples

Verify that DataStax Enterprise is not running on the node. A stopped node is reported as DN (Down, Normal):

nodetool status
Datacenter: Graph
================================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load        Tokens  Owns  Host ID                               Rack
UN  10.200.177.92  265.04 KiB  1       ?     980cab6a-2e5d-44c6-b897-0733dde580ac  rack1
DN  10.200.177.94  426.21 KiB  1       ?     7ecbbc0c-627d-403e-b8cc-a2daa93d9ad3  rack1
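The status check can be scripted by reading the two-letter code in the first column of the report. The following is a minimal sketch that assumes the column layout shown above; `node_state` is a hypothetical helper name.

```shell
# Print the status/state code (for example UN or DN) that a
# nodetool status report lists for one address.
# Reads the report on standard input; $1 = node address.
# (node_state is an illustrative helper, not a DSE command.)
node_state() {
    awk -v addr="$1" '$2 == addr { print $1 }'
}

# Example call, run from a node that is still up:
# nodetool status | node_state 10.200.177.94
```

With the report above as input, the helper prints DN for 10.200.177.94. Because nodetool connects to a running node, run the check from a node that is still up.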

Split SSTables to 10 MB

sstablesplit -s 10 /var/lib/cassandra/data/cycling/cyclist_category-e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Data.db
Skipping /var/lib/cassandra/data/cycling/cyclist_category-e1f76e21ce4311e8949e33016bf887c0/aa-1-bti-Data.db: it's size (0.000 MB) is less than the split size (10 MB)
No sstables needed splitting.
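Because the tool skips SSTables that are smaller than the split size, as the output above shows, it can help to list only the data files that exceed the threshold before running the command. This is a sketch under assumptions: the helper name `list_large_sstables` is hypothetical, and the `*-Data.db` pattern is inferred from the file name in the example.

```shell
# List SSTable data files larger than a threshold under a directory.
# $1 = table data directory
# $2 = threshold in MB (defaults to 50, the documented default split size)
# (list_large_sstables is an illustrative helper, not a DSE command.)
list_large_sstables() {
    dir=$1
    threshold_mb=${2:-50}
    # -size +Nc matches files larger than N bytes.
    find "$dir" -type f -name '*-Data.db' \
        -size +"$(( threshold_mb * 1024 * 1024 ))"c
}

# Example call:
# list_large_sstables /var/lib/cassandra/data/cycling 50
```

Each path that the helper prints can then be passed to sstablesplit as an sstable_filepath argument.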

Some tools require a larger heap size. For example, if sstablesplit fails with an OutOfMemoryError exception, increasing its heap size may resolve the error.

To increase the heap size for sstablesplit to 8GB, change the following line in the tools/bin/sstablesplit shell script from:

MAX_HEAP_SIZE="256M"

to:

MAX_HEAP_SIZE="8G"
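The same change can be scripted instead of made by hand. The sketch below assumes that the script contains the assignment exactly as `MAX_HEAP_SIZE="<value>"`; the helper name `set_max_heap` is hypothetical.

```shell
# Rewrite the MAX_HEAP_SIZE assignment in a shell script.
# $1 = path to the script, $2 = new heap size such as 8G
# Assumes the assignment is written as MAX_HEAP_SIZE="<value>".
# (set_max_heap is an illustrative helper, not a DSE command.)
set_max_heap() {
    script=$1
    new_size=$2
    tmp="$script.tmp.$$"
    sed "s/MAX_HEAP_SIZE=\"[^\"]*\"/MAX_HEAP_SIZE=\"$new_size\"/" "$script" > "$tmp" &&
        cat "$tmp" > "$script" &&   # overwrite in place to keep file permissions
        rm -f "$tmp"
}

# Example call; keep a backup copy before editing:
# cp tools/bin/sstablesplit tools/bin/sstablesplit.bak
# set_max_heap tools/bin/sstablesplit 8G
```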

© Copyright IBM Corporation 2025
