Choosing a compaction strategy
How to choose the best compaction strategy.
To implement a compaction strategy, follow these steps:
- Read how data is maintained to understand the compaction strategies.
- Answer the questions below to determine the appropriate compaction strategy for each table.
- Configure each table to use the appropriate compaction strategy.
- Test the compaction strategy with your data.
Which compaction strategy is best?
The following questions are based on developer and user experience with the compaction strategies.
- Does your table process time series data?
- If the answer is yes, use TWCS (TimeWindowCompactionStrategy). If the answer is no, read the following questions.
- Does your table handle more reads than writes, or more writes than reads?
- LCS (LeveledCompactionStrategy) is appropriate if reads outnumber writes by at least two to one, especially when the reads are random. If reads and writes are approximately equal, the performance penalty from LCS may not be worth the benefit. Be aware that LCS can be overwhelmed by a high number of writes. One advantage of LCS is that it keeps related data in a small set of SSTables.
- Does the data in your table change often?
- If your data is immutable or there are few upserts, use STCS (SizeTieredCompactionStrategy), which does not have the write performance penalty of LCS.
- Do you require predictable levels of read and write activity?
- LCS keeps SSTables within predictable sizes and numbers. For example, if your table's read-to-write ratio is small but read activity must conform to a Service Level Agreement (SLA), the LCS write performance penalty may be worth paying to keep read rates and latency at predictable levels. You may also be able to offset the LCS write penalty by adding more nodes.
- Will your table be populated by a batch process?
- For batched reads and writes, STCS performs better than LCS. The batch process causes little or no fragmentation, so the benefits of LCS are not realized; batch processes can overwhelm tables that use LCS.
- Does your system have limited disk space?
- LCS handles disk space more efficiently than STCS: LCS requires about 10% headroom in addition to the space occupied by the data. In some cases, STCS and DTCS (DateTieredCompactionStrategy) require as much as 50% more headroom than the data space. (DTCS is deprecated.)
- Is your system reaching its limits for input and output?
- LCS is significantly more input and output intensive than DTCS or STCS. Switching to LCS may introduce extra input and output load that offsets the advantages.
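The questions above can be read as a rough decision procedure. The following Python sketch is one possible encoding of it, not an official algorithm: the `TableProfile` fields, the order in which the answers are weighed, and the fallback to STCS are all assumptions made for illustration. The headroom helper uses only the 10% and 50% figures quoted above.

```python
from dataclasses import dataclass

TWCS = "TimeWindowCompactionStrategy"
LCS = "LeveledCompactionStrategy"
STCS = "SizeTieredCompactionStrategy"


@dataclass
class TableProfile:
    """Hypothetical answers to the questions above for a single table."""
    time_series: bool = False
    reads_per_write: float = 1.0          # read operations per write operation
    mostly_immutable: bool = False        # immutable data or few upserts
    needs_predictable_latency: bool = False
    batch_populated: bool = False
    disk_constrained: bool = False
    io_constrained: bool = False


def suggest_strategy(p: TableProfile) -> str:
    # Time series data is the first question and settles the choice.
    if p.time_series:
        return TWCS
    # Assumed ordering: situations where LCS can be overwhelmed, or where its
    # extra I/O offsets its advantages, are checked before the reasons for LCS.
    if p.batch_populated or p.io_constrained:
        return STCS
    if p.needs_predictable_latency or p.disk_constrained:
        return LCS
    if p.mostly_immutable:
        return STCS
    if p.reads_per_write >= 2.0:
        return LCS
    # Assumption: fall back to STCS when no answer points to LCS.
    return STCS


# Headroom percentages quoted in the text; other strategies are not covered.
HEADROOM_PERCENT = {LCS: 10, STCS: 50}


def headroom_gb(data_gb: float, strategy: str) -> float:
    """Rough free space to reserve on top of the data size."""
    return data_gb * HEADROOM_PERCENT[strategy] / 100
```

For example, `suggest_strategy(TableProfile(reads_per_write=3.0))` suggests LCS, and `headroom_gb(200, LCS)` estimates 20 GB of headroom for 200 GB of data. Treat the result as a starting point for testing rather than a final answer.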
Configuring and running compaction
Set the table compaction strategy in the CREATE TABLE or ALTER TABLE statement parameters.
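As a minimal sketch of what such a statement looks like, the Python helper below assembles an ALTER TABLE statement that sets the `compaction` property. The helper name and the keyspace and table names (`shop`, `events`) are hypothetical; `compaction_window_unit` and `compaction_window_size` are TWCS options. The function only builds the text and does not validate identifiers or connect to a cluster.

```python
def alter_compaction_cql(keyspace: str, table: str, strategy: str, **options) -> str:
    """Build an ALTER TABLE statement that sets the compaction strategy.

    Extra keyword arguments become strategy options in the compaction map.
    """
    settings = {"class": strategy, **options}
    # CQL map literal: every key and value is written as a quoted string.
    body = ", ".join(f"'{key}': '{value}'" for key, value in settings.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


# Hypothetical keyspace and table, switched to TWCS with one-day windows.
statement = alter_compaction_cql(
    "shop",
    "events",
    "TimeWindowCompactionStrategy",
    compaction_window_unit="DAYS",
    compaction_window_size=1,
)
print(statement)
```

The resulting statement can be run in cqlsh or through a driver session.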
You can start compaction manually using the nodetool compact command.
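If compaction is triggered from a script, the command line can be assembled as in the sketch below. The helper and the keyspace and table names are hypothetical; the sketch assumes `nodetool` is on the PATH of the node where the script runs.

```python
def compact_command(keyspace=None, *tables):
    """Build a nodetool compact invocation.

    With no arguments the command is not limited to a keyspace; with a
    keyspace and optional table names it is limited to those.
    """
    command = ["nodetool", "compact"]
    if keyspace:
        command.append(keyspace)
        command.extend(tables)
    return command


# Hypothetical keyspace and table names.
command = compact_command("shop", "events")
print(" ".join(command))
# To execute it on a node: subprocess.run(command, check=True)
```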
Testing compaction strategies
To test the compaction strategy:
- Create a three-node cluster using one of the compaction strategies, then stress test the cluster using the cassandra-stress utility and measure the results.
- Set up a node on your existing cluster and enable the write survey mode option on the node to analyze live data.
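Both test approaches can be scripted. The sketch below builds a cassandra-stress command line for a mixed read/write workload and records the JVM system property used to start a node in write survey mode. The node addresses, operation count, and read/write ratio are placeholder assumptions; choose values that resemble your own workload.

```python
import shlex


def stress_command(nodes, operations=1_000_000, reads_per_write=3):
    """Build a cassandra-stress invocation for a mixed read/write workload."""
    return [
        "cassandra-stress",
        "mixed",
        f"ratio(write=1,read={reads_per_write})",
        f"n={operations}",
        "-node",
        ",".join(nodes),
    ]


# JVM system property that starts a node in write survey mode.
WRITE_SURVEY_JVM_OPT = "-Dcassandra.write_survey=true"

# Placeholder addresses for a three-node test cluster.
command = stress_command(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(shlex.join(command))  # shell-quoted form, ready to paste into a terminal
```

A mixed run generally expects the test table to have been populated by an earlier write run, so it may be necessary to run a `write` workload first. Repeat the same run against a table configured with each candidate strategy and compare the reported latencies and throughput.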