Choosing a compaction strategy
To implement a compaction strategy, follow these steps:
- Read how data is maintained to understand the compaction strategies.
- Answer the questions below to determine the appropriate compaction strategy for each table.
- Configure each table to use the appropriate compaction strategy.
- Test the compaction strategy with your data.
Which compaction strategy is best?
The following questions are based on developer and user experience with the compaction strategies.
- Does your table reside in a DSE 6.9 cluster that uses UnifiedCompactionStrategy (UCS)?
  If the answer is yes, or if you would like it to, refer to UCS (UnifiedCompactionStrategy). Contact your DataStax account team to discuss using UCS and potentially vertically scaling your nodes. If the answer is no, read the following questions.
- Does your table process time series data?
  If the answer is yes, use TWCS (TimeWindowCompactionStrategy). If the answer is no, read the following questions.
- Does your table handle more reads than writes, or more writes than reads?
  LCS (LeveledCompactionStrategy) is appropriate if there are at least twice as many reads as writes, especially if the reads are random. If reads and writes are approximately equal, the performance penalty from LCS may not be worth the benefit. Be aware that LCS can be overwhelmed by a high number of writes. One advantage of LCS is that it keeps related data in a small set of SSTables. However, using a relatively large sstable_max_size (~2GB) for SAI indexed tables can degrade SAI query performance when the workload experiences increased write amplification and is not dominated by reads. In this instance, consider using UCS (UnifiedCompactionStrategy).
- Does the data in your table change often?
  If your data is immutable or there are few upserts, use STCS (SizeTieredCompactionStrategy), which does not have the write performance penalty of LCS.
- Do you require predictable levels of read and write activity?
  LCS keeps the SSTables within predictable sizes and numbers. For example, if your table’s ratio of reads to writes is small, but the read activity is expected to conform to a Service Level Agreement (SLA), it may be worth the LCS write performance penalty to keep read rates and latency at predictable levels. You may also be able to overcome the LCS write penalty by adding more nodes.
- Will your table be populated by a batch process?
  For batched reads and writes, STCS performs better than LCS. The batch process causes little or no fragmentation, so the benefits of LCS are not realized; batch processes can overwhelm tables that use LCS.
- Does your system have limited disk space?
  LCS handles disk space more efficiently than STCS: LCS requires about 10% headroom in addition to the space occupied by the data. In some cases, STCS and DTCS (DateTieredCompactionStrategy) require as much as 50% more headroom than the data space. (DTCS is deprecated.)
- Is your system reaching its limits for input and output?
  LCS is significantly more I/O intensive than DTCS or STCS. Switching to LCS may introduce extra I/O load that offsets its advantages.
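The questions above can be read as a rough decision procedure. The following Python sketch is one possible encoding and not an official selection tool: the `TableProfile` fields and the `suggest_strategy` function are hypothetical names, and the precedence among the later questions is an assumption, because the text does not rank them. Only the "at least twice as many reads as writes" threshold comes from the guidance above.

```python
from dataclasses import dataclass


@dataclass
class TableProfile:
    """Hypothetical workload description; every field name is illustrative."""
    uses_ucs: bool = False                   # DSE 6.9 cluster that uses (or should use) UCS
    time_series: bool = False                # table processes time series data
    reads_per_write: float = 1.0             # observed ratio of reads to writes
    mostly_immutable: bool = False           # immutable data or few upserts
    batch_loaded: bool = False               # populated by a batch process
    io_constrained: bool = False             # system is near its I/O limits
    limited_disk: bool = False               # little free disk space for headroom
    needs_predictable_latency: bool = False  # for example, reads bound by an SLA


def suggest_strategy(p: TableProfile) -> str:
    """Return a strategy acronym by walking the questions in list order."""
    if p.uses_ucs:
        return "UCS"
    if p.time_series:
        return "TWCS"
    # Assumption: batch loads, immutable data, and I/O pressure rule out LCS
    # before the read-heavy check is considered.
    if p.batch_loaded or p.mostly_immutable or p.io_constrained:
        return "STCS"
    # LCS suits read-heavy tables, predictable-latency needs, or tight disk space.
    if p.reads_per_write >= 2.0 or p.needs_predictable_latency or p.limited_disk:
        return "LCS"
    return "STCS"


if __name__ == "__main__":
    print(suggest_strategy(TableProfile(time_series=True)))
    print(suggest_strategy(TableProfile(reads_per_write=3.0)))
```

Treat the result as a starting point for testing with your own data, not as a final answer.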
Configuring and running compaction
Set the table compaction strategy in the CREATE TABLE or ALTER TABLE statement parameters. See CQL table properties.
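As an illustration, the compaction strategy is passed as a map in the `compaction` table property. The Python helper below is a minimal sketch that only renders the ALTER TABLE statement as text so that you can review it before running it in cqlsh. The function name `compaction_cql` and the `cycling.events` keyspace and table are hypothetical; `compaction_window_unit` and `compaction_window_size` are TWCS subproperties, shown here with placeholder values.

```python
def compaction_cql(keyspace: str, table: str, strategy_class: str, **options) -> str:
    """Render an ALTER TABLE statement that sets the table's compaction map."""
    settings = {"class": strategy_class, **options}
    # Keys and values in the compaction map are written as quoted strings.
    body = ", ".join(f"'{key}': '{value}'" for key, value in settings.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"


if __name__ == "__main__":
    # Hypothetical time series table with one-day compaction windows.
    print(compaction_cql(
        "cycling", "events", "TimeWindowCompactionStrategy",
        compaction_window_unit="DAYS", compaction_window_size=1,
    ))
    # Hypothetical read-heavy table switched to LCS with default subproperties.
    print(compaction_cql("cycling", "events", "LeveledCompactionStrategy"))
```

The same `compaction` map can be supplied in the WITH clause of a CREATE TABLE statement.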
You can start compaction manually using the nodetool compact command.
Testing compaction strategies
To test a compaction strategy, use one of the following approaches:
- Create a three-node cluster using one of the compaction strategies, then stress test the cluster using the cassandra-stress utility and measure the results.
- Set up a node on your existing cluster and enable the write survey mode option on the node to analyze live data.
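For the first approach, a stress run typically has a write phase followed by a mixed read/write phase. The sketch below only assembles and prints example cassandra-stress command lines; it does not run them or measure anything. The helper name `stress_commands`, the node address, the operation count, the thread count, and the 1:3 write-to-read ratio are all placeholder assumptions to adapt to your own workload.

```python
import shlex


def stress_commands(node: str, ops: int = 1_000_000, threads: int = 50) -> list:
    """Build argument lists for a write phase and a mixed phase (placeholder values)."""
    common = ["-rate", f"threads={threads}", "-node", node]
    # Phase 1: populate the test table with `ops` writes.
    write = ["cassandra-stress", "write", f"n={ops}"] + common
    # Phase 2: mixed workload; the 1:3 write-to-read ratio is an assumption.
    mixed = ["cassandra-stress", "mixed", "ratio(write=1,read=3)", f"n={ops}"] + common
    return [write, mixed]


if __name__ == "__main__":
    for command in stress_commands("10.0.0.1"):
        # shlex.join quotes the parentheses so the line can be pasted into a shell.
        print(shlex.join(command))
```

Run the same commands against each candidate strategy and compare the throughput and latency figures that cassandra-stress reports.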