Choosing a compaction strategy
To choose a compaction strategy for a table, work through the following questions, which are based on developer and user experience with the compaction strategies.
- Does your table reside in a DSE 6.8.25+ cluster that uses UnifiedCompactionStrategy (UCS)?
If the answer is yes, or if you would like it to, refer to UCS (UnifiedCompactionStrategy). Contact your DataStax account team to discuss using UCS and potentially vertically scaling your nodes. If the answer is no, read the following questions.
- Does your table process time series data?
If the answer is yes, use TWCS (TimeWindowCompactionStrategy). If the answer is no, read the following questions.
- Does your table handle more reads than writes, or more writes than reads?
LCS (LeveledCompactionStrategy) is appropriate if the table handles at least twice as many reads as writes, especially randomized reads. If reads and writes are approximately equal, the performance penalty from LCS may not be worth the benefit. Be aware that LCS can be overwhelmed by a high number of writes. One advantage of LCS is that it keeps related data in a small set of SSTables. However, using a relatively large sstable_max_size (~2GB) for SAI indexed tables can degrade SAI query performance when the workload experiences increased write amplification and is not dominated by reads. In that case, consider using UCS (UnifiedCompactionStrategy).
- Does the data in your table change often?
- Do you require predictable levels of read and write activity?
LCS keeps the SSTables within predictable sizes and numbers. For example, if your table's read-to-write ratio is small, and the read activity is expected to conform to a Service Level Agreement (SLA), it may be worth the LCS write performance penalty to keep read rates and latency at predictable levels. You may also be able to overcome the LCS write penalty by adding more nodes.
- Will your table be populated by a batch process?
For batched reads and writes, STCS performs better than LCS. The batch process causes little or no fragmentation, so the benefits of LCS are not realized; batch processes can overwhelm tables that use LCS.
- Does your system have limited disk space?
LCS handles disk space more efficiently than STCS: LCS requires about 10% headroom in addition to the space occupied by the data. In some cases, STCS and DTCS (DateTieredCompactionStrategy) require as much as 50% more headroom than the data space. (DTCS is deprecated.)
- Is your system reaching its I/O limits?
LCS is significantly more I/O intensive than DTCS or STCS. Switching to LCS may introduce extra I/O load that offsets the advantages.
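The questions above can be sketched as a small decision helper. This is only an illustration under stated assumptions: the `Workload` fields, the `choose_strategy` and `compaction_cql` names, and the order in which the later questions are checked are hypothetical and not part of any DataStax tool. The guidance above does not rank those questions against each other, and falling back to STCS when nothing else applies is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical answers to the questions above for one table."""
    ucs_available: bool = False       # DSE 6.8.25+ cluster that uses (or should use) UCS
    time_series: bool = False         # table holds time series data
    reads_per_write: float = 1.0      # approximate ratio of reads to writes
    needs_predictable_latency: bool = False  # reads must meet an SLA
    batch_populated: bool = False     # table is filled by a batch process
    disk_constrained: bool = False    # limited disk headroom
    io_constrained: bool = False      # system is near its I/O limits


def choose_strategy(w: Workload) -> str:
    """Return a compaction strategy class name for the given workload."""
    if w.ucs_available:
        return "UnifiedCompactionStrategy"
    if w.time_series:
        return "TimeWindowCompactionStrategy"
    # Assumption: batch loads and I/O pressure rule out LCS before its
    # benefits are considered, because both can overwhelm LCS.
    if w.batch_populated or w.io_constrained:
        return "SizeTieredCompactionStrategy"
    # LCS suits read-heavy tables (at least 2 reads per write), tables
    # that need predictable read latency, and nodes with little headroom.
    if (w.reads_per_write >= 2.0
            or w.needs_predictable_latency
            or w.disk_constrained):
        return "LeveledCompactionStrategy"
    # Assumption: fall back to STCS when no question points elsewhere.
    return "SizeTieredCompactionStrategy"


def compaction_cql(keyspace: str, table: str, strategy: str) -> str:
    """Build a CQL statement that sets only the compaction class."""
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


if __name__ == "__main__":
    # Hypothetical read-heavy table with an SLA on read latency.
    workload = Workload(reads_per_write=4.0, needs_predictable_latency=True)
    print(compaction_cql("ks", "orders", choose_strategy(workload)))
```

The generated statement sets only the `class` option. Strategy-specific subproperties would still need to be chosen for the table, and the result should be validated against real data before it is applied in production.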
To test the compaction strategy: