Configuring compaction
Steps for configuring compaction.
As discussed in the Compaction topic, the compaction process merges keys, combines columns, evicts tombstones, consolidates SSTables, and creates a new index in the merged SSTable.
In the cassandra.yaml file, you configure global compaction parameters such as snapshot_before_compaction, concurrent_compactors, and compaction_throughput_mb_per_sec.
The compaction_throughput_mb_per_sec parameter throttles compaction to the specified total throughput across the entire system, rather than per compaction, which makes it suitable for use with large partitions. Setting it to 0 disables throttling.
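Because the throttle caps the combined throughput of all compactions on a node rather than each compaction separately, the lower bound on compaction time depends on the total amount of data being compacted. The following minimal Python sketch shows that arithmetic. The function name min_compaction_seconds is hypothetical, and the sketch assumes the throttle is the only bottleneck (disk and CPU never slow compaction further). The example uses 16 MB/s, the value in the default cassandra.yaml.

```python
# Minimal sketch (assumption: the throttle is the only bottleneck).
# min_compaction_seconds is a hypothetical helper, not a Cassandra API.

def min_compaction_seconds(compaction_sizes_mb, throughput_mb_per_sec):
    """Return a lower bound, in seconds, for finishing every compaction.

    compaction_sizes_mb: amount of data (MB) each running compaction must
        process; the throttle applies to their sum, not to each one.
    throughput_mb_per_sec: the cassandra.yaml value; 0 disables throttling.
    """
    if throughput_mb_per_sec < 0:
        raise ValueError("throughput must be >= 0")
    if throughput_mb_per_sec == 0:
        return 0.0  # unthrottled: the throttle imposes no lower bound
    total_mb = sum(compaction_sizes_mb)
    return total_mb / throughput_mb_per_sec


if __name__ == "__main__":
    # One 1,600 MB compaction at 16 MB/s needs at least 100 seconds.
    print(min_compaction_seconds([1600], 16))        # 100.0
    # Two concurrent 800 MB compactions share the same budget,
    # so together they still need at least 100 seconds.
    print(min_compaction_seconds([800, 800], 16))    # 100.0
```

Running two compactions in parallel does not double the allowed throughput, which is the practical consequence of the limit being system-wide.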
Cassandra provides a start-up option, write survey mode (-Dcassandra.write_survey=true), for testing compaction strategies without affecting the production workload.
Using CQL, you configure a compaction strategy:
- SizeTieredCompactionStrategy (STCS): The default compaction strategy. This strategy triggers a minor compaction when the number of similar-sized SSTables on disk reaches the value of the table subproperty min_threshold. A minor compaction does not involve all the SSTables in a table. Also see STCS compaction subproperties.
- DateTieredCompactionStrategy (DTCS): This strategy is particularly useful for time series data. DateTieredCompactionStrategy stores data written within a certain period of time in the same SSTable. For example, Cassandra can store your last hour of data in one SSTable time window, the next 4 hours of data in another time window, and so on. Compactions are triggered when the number of SSTables in one of those windows reaches min_threshold (4 by default). The most common queries for time series workloads retrieve the last hour, day, or month of data, so Cassandra can limit the SSTables it reads to those that contain the relevant data. Also, Cassandra can store data that has been set to expire using TTL in an SSTable with other data scheduled to expire at approximately the same time, and can then drop that SSTable without doing any compaction. Also see DTCS compaction subproperties and DateTieredCompactionStrategy: Compaction for Time Series Data. Note: Disabling read repair when using DTCS is recommended. Use full repair as necessary.
- LeveledCompactionStrategy (LCS): The leveled compaction strategy creates SSTables of a fixed, relatively small size (160 MB by default) that are grouped into levels. Within each level above L0, where newly flushed SSTables land, SSTables are guaranteed to be non-overlapping. Each level (L0, L1, L2, and so on) is 10 times as large as the previous one. Disk I/O is more uniform and predictable on higher than on lower levels as SSTables are continuously being compacted into progressively larger levels. At each level, row keys are merged into non-overlapping SSTables. This can improve read performance, because Cassandra can determine which SSTables in each level to check for the existence of row key data. This compaction strategy is modeled after Google's LevelDB implementation. Also see LCS compaction subproperties.
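The STCS trigger (enough similar-sized SSTables to reach min_threshold) can be sketched as a bucketing pass over SSTable sizes. This simplified Python sketch is loosely modeled on the STCS subproperty defaults (bucket_low 0.5, bucket_high 1.5, min_sstable_size 50 MB, min_threshold 4, max_threshold 32). The function names are hypothetical, and the real strategy also weighs read hotness and tombstones when choosing what to compact.

```python
# Simplified sketch of size-tiered bucketing (not the Cassandra source).
# Hypothetical helpers; read hotness and tombstone checks are omitted.

def size_tiered_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5,
                        min_sstable_size_mb=50):
    """Group SSTable sizes (MB) into buckets of similar size."""
    buckets = []  # each entry is [average size, list of member sizes]
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg, members = bucket
            similar = avg * bucket_low <= size <= avg * bucket_high
            # Very small SSTables are grouped together regardless of ratio.
            both_small = size < min_sstable_size_mb and avg < min_sstable_size_mb
            if similar or both_small:
                members.append(size)
                bucket[0] = sum(members) / len(members)  # refresh the average
                break
        else:
            buckets.append([size, [size]])  # no similar bucket: start a new one
    return [members for _, members in buckets]


def minor_compaction_candidates(sizes_mb, min_threshold=4, max_threshold=32):
    """Return buckets large enough to trigger a minor compaction."""
    return [bucket[:max_threshold]
            for bucket in size_tiered_buckets(sizes_mb)
            if len(bucket) >= min_threshold]


if __name__ == "__main__":
    sizes = [10, 12, 11, 9, 200, 210, 190, 1000]
    print(minor_compaction_candidates(sizes))                   # [[9, 10, 11, 12]]
    print(minor_compaction_candidates(sizes, min_threshold=6))  # []
```

With these sizes, only the four small SSTables form a bucket large enough to compact; raising min_threshold to 6, as the second procedure step does, leaves no candidates.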
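The DTCS time windows from the example (the last hour, then the next 4 hours, and so on) can be sketched as windows that grow by a constant factor going back in time. This Python sketch is a simplification, not the DTCS implementation: it assumes a 1-hour base window, uses min_threshold (4) as the growth factor to match the example, and represents each SSTable by a single data age in seconds. The function names are hypothetical.

```python
# Simplified sketch of date-tiered windows (not the Cassandra source).
# Assumptions: 1-hour base window, growth factor = min_threshold,
# one age value (seconds) per SSTable. Helper names are hypothetical.

HOUR = 3600

def window_index(age_seconds, base_seconds=HOUR, factor=4):
    """Return the window (0 = newest) that an age, in seconds, falls into."""
    index, start, size = 0, 0, base_seconds
    while age_seconds >= start + size:
        start += size   # move past the current window
        size *= factor  # each older window is `factor` times larger
        index += 1
    return index


def windows_to_compact(sstable_ages_seconds, min_threshold=4):
    """Return {window index: ages} for windows holding >= min_threshold SSTables."""
    windows = {}
    for age in sstable_ages_seconds:
        windows.setdefault(window_index(age, factor=min_threshold), []).append(age)
    return {i: ages for i, ages in windows.items() if len(ages) >= min_threshold}


if __name__ == "__main__":
    # Four SSTables in the last hour, two in the next 4 hours, one older.
    ages = [360, 720, 1800, 3240, 7200, 10800, 36000]
    print(windows_to_compact(ages))  # {0: [360, 720, 1800, 3240]}
```

Only the newest window reaches min_threshold, so only the SSTables written in the last hour would be compacted together; older data stays in its own, larger windows.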
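The LCS level sizes follow from the 160 MB SSTable size and the factor of 10 between levels. This Python sketch, with hypothetical function names and ignoring L0, estimates the capacity of each level and how many levels a given amount of data needs. Because SSTables within a level above L0 do not overlap, a read of one row key consults at most one SSTable per level in that range, so the level count also bounds how many leveled SSTables such a read touches.

```python
# Sketch of leveled-compaction size arithmetic.
# Assumptions: 160 MB SSTables and a factor of 10 per level, as in the text;
# L0 (newly flushed SSTables) is not modeled. Helper names are hypothetical.

SSTABLE_MB = 160
FANOUT = 10

def level_capacity_mb(level, sstable_mb=SSTABLE_MB, fanout=FANOUT):
    """Approximate maximum data (MB) held by a level >= 1."""
    if level < 1:
        raise ValueError("capacity is modeled here only for L1 and above")
    return sstable_mb * fanout ** level


def levels_needed(data_mb, sstable_mb=SSTABLE_MB, fanout=FANOUT):
    """Smallest number of levels (L1..Ln) whose combined capacity holds data_mb."""
    levels, total = 0, 0
    while total < data_mb:
        levels += 1
        total += level_capacity_mb(levels, sstable_mb, fanout)
    return levels


if __name__ == "__main__":
    print(level_capacity_mb(1), level_capacity_mb(2), level_capacity_mb(3))  # 1600 16000 160000
    print(levels_needed(100_000))  # 3
```

About 100 GB of data fits in three levels, so apart from any SSTables still in L0, a read checks at most three leveled SSTables.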
To configure the compaction strategy property and CQL compaction subproperties, such as the maximum number of SSTables to compact and minimum SSTable size, use CREATE TABLE or ALTER TABLE.
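The location of the cassandra.yaml file depends on the type of installation:
Installation type | cassandra.yaml location |
Package installations | /etc/cassandra/cassandra.yaml |
Tarball installations | install_location/resources/cassandra/conf/cassandra.yaml |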
Procedure
- Update a table to set the compaction strategy using the ALTER TABLE statement.
ALTER TABLE users WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
- Change the compaction strategy property to SizeTieredCompactionStrategy and specify the minimum number of SSTables that triggers a compaction using the CQL min_threshold subproperty.
ALTER TABLE users WITH compaction = { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 6 };
Results
You can monitor the results of your configuration using compaction metrics. See Compaction metrics.