Choosing a compaction strategy

Information on how to select the best compaction strategy.

To choose and implement a compaction strategy:
  1. To understand how compaction and compaction strategies work, read How is data maintained?
  2. Review your application's requirements and use this information to answer the questions below.
  3. Configure the table to use the most appropriate strategy.
  4. Test the compaction strategies against your data.

Which compaction strategy is best?

The following questions are based on the experiences of developers and users with the strategies.

Does your table process time series data?
If so, your best choice is TimeWindowCompactionStrategy (TWCS). If not, the following questions introduce other considerations to guide your choice.
Does your table handle more reads than writes, or more writes than reads?
LeveledCompactionStrategy (LCS) is a good choice if your table processes at least twice as many reads as writes, especially randomized reads. If the read/write ratio is closer to even, the performance hit exacted by LCS may not be worth the benefit. Be aware that LCS can be quickly overwhelmed by a high volume of writes.
Does the data in your table change often?
One advantage of LCS is that it keeps related data in a small set of SSTables. If your data is immutable or not subject to frequent upserts, SizeTieredCompactionStrategy (STCS) accomplishes the same type of grouping without the LCS performance hit.
Do you require predictable levels of read and write activity?
LCS keeps the SSTables within predictable sizes and numbers. For example, if your table's read/write ratio is small but the table must meet a Service Level Agreement (SLA) for reads, it may be worth taking the write performance penalty of LCS to keep read rates and latency at predictable levels. You may be able to offset this write penalty through horizontal scaling (adding more nodes).
Will your table be populated by a batch process?
On both batch reads and batch writes, STCS performs better than LCS. A batch process causes little or no fragmentation, so the benefits of LCS are not realized. In addition, batch processes can overwhelm LCS-configured tables.
Does your system have limited disk space?
LCS handles disk space more efficiently than STCS: it requires about 10% headroom in addition to the space occupied by the data it handles. STCS and DTCS require more headroom, in some cases as much as 50% more than the data space. For example, 500 GB of data needs roughly 550 GB of disk under LCS but can need up to 750 GB under STCS. (DateTieredCompactionStrategy (DTCS) is deprecated.)
Is your system reaching its limits for I/O?
LCS is significantly more I/O intensive than DTCS or STCS. Switching to LCS may introduce extra I/O load that offsets the advantages.
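The questions above can be sketched as a small decision helper. This is only an illustration of the heuristics on this page, not an official selection algorithm: the TableProfile fields and the suggest_strategy function are hypothetical names, the order of the checks is an assumption, and disk space and SLA considerations are deliberately left out. Treat the result as a starting point to confirm by testing.

```python
from dataclasses import dataclass


@dataclass
class TableProfile:
    """Hypothetical answers to the questions above for one table."""
    time_series: bool = False
    reads_per_write: float = 1.0    # 2.0 means twice as many reads as writes
    frequent_updates: bool = False  # rows are often upserted
    batch_loaded: bool = False      # table is populated by a batch process
    io_constrained: bool = False    # node is close to its I/O limits


def suggest_strategy(p: TableProfile) -> str:
    """Return a starting-point strategy name based on the profile."""
    if p.time_series:
        return "TimeWindowCompactionStrategy"
    # Batch loads and I/O-bound nodes are where LCS is most easily overwhelmed.
    if p.batch_loaded or p.io_constrained:
        return "SizeTieredCompactionStrategy"
    # LCS pays off for read-heavy tables whose rows change often.
    if p.reads_per_write >= 2.0 and p.frequent_updates:
        return "LeveledCompactionStrategy"
    return "SizeTieredCompactionStrategy"


profile = TableProfile(reads_per_write=3.0, frequent_updates=True)
print(suggest_strategy(profile))
```

A read-heavy, frequently updated table falls through to the LCS branch; changing batch_loaded to True in the same profile steers the suggestion back to STCS.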

Configuring and running compaction

Set the compaction strategy for a table in the parameters for the CREATE TABLE or ALTER TABLE command. For details, see Table properties.
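As a minimal sketch of what such a statement looks like, the following helper builds an ALTER TABLE command that sets only the compaction class. The function name alter_compaction_cql and the keyspace and table names (demo_ks, events) are hypothetical; the helper does no escaping, so it assumes trusted identifiers, and any strategy-specific subproperties would need to be added to the map.

```python
def alter_compaction_cql(keyspace: str, table: str, strategy: str) -> str:
    """Build a CQL statement that sets the compaction class for a table.

    Assumes keyspace, table, and strategy are trusted identifiers;
    no quoting or escaping is applied.
    """
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH compaction = {{'class': '{strategy}'}};"
    )


# Print the statement so it can be pasted into cqlsh or passed to a driver.
print(alter_compaction_cql("demo_ks", "events", "LeveledCompactionStrategy"))
```

The same compaction map can be supplied in the WITH clause of CREATE TABLE.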

You can start compaction manually using the nodetool compact command.
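If you script this step, a thin wrapper along the following lines may help. It is a sketch only: compact_command and run_compaction are hypothetical helper names, the keyspace and table names are placeholders, and the code assumes nodetool is on the PATH of the node where it runs.

```python
import subprocess


def compact_command(keyspace=None, *tables):
    """Build the argument list for nodetool compact.

    With no arguments the command targets all keyspaces; tables are
    only meaningful when a keyspace is given.
    """
    cmd = ["nodetool", "compact"]
    if keyspace:
        cmd.append(keyspace)
        cmd.extend(tables)
    return cmd


def run_compaction(keyspace=None, *tables):
    """Run the compaction and raise CalledProcessError if nodetool fails."""
    subprocess.run(compact_command(keyspace, *tables), check=True)


# Show the command without executing it.
print(" ".join(compact_command("demo_ks", "events")))
```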

Testing compaction strategies

Suggestions for determining which compaction strategy is best for your system:
  • Create a three-node cluster using one of the compaction strategies, stress test the cluster using cassandra-stress, and measure the results.
  • Set up a node on your existing cluster and use write survey mode to sample live data.
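To compare strategies under the same load, one approach is to generate one cassandra-stress invocation per strategy and run each against a fresh test cluster. The sketch below only builds and prints the command lines. The stress_command helper, the node address, and the operation count are hypothetical, and the -schema "compaction(strategy=...)" form is an assumption to verify against the cassandra-stress help output for your version.

```python
import shlex

STRATEGIES = [
    "SizeTieredCompactionStrategy",
    "LeveledCompactionStrategy",
    "TimeWindowCompactionStrategy",
]


def stress_command(strategy, node, operations=1_000_000):
    """Build a cassandra-stress write run for one compaction strategy.

    The -schema compaction(...) option is assumed here; check it with
    the tool's built-in help before relying on it.
    """
    return [
        "cassandra-stress", "write", f"n={operations}",
        "-node", node,
        "-schema", f"compaction(strategy={strategy})",
    ]


# Print shell-quoted commands; run each one against a clean cluster.
for strategy in STRATEGIES:
    print(shlex.join(stress_command(strategy, "10.0.0.1")))
```

Keeping the workload arguments identical across runs means that differences in the measured results can be attributed to the compaction strategy rather than to the load.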