Configuring compaction 

Steps for configuring compaction in DataStax Enterprise. The compaction process merges keys, combines columns, evicts tombstones, consolidates SSTables, and creates a new index in the merged SSTable.

For background on how the compaction process works, see How is data maintained?.

You configure global compaction parameters in the cassandra.yaml file.

The compaction_throughput_mb_per_sec parameter is designed for use with large partitions. The database throttles compaction to this rate across the entire system.

DataStax Enterprise provides a start-up option for testing compaction strategies without affecting the production workload.

DataStax Enterprise supports the following compaction strategies, which you can configure using CQL:

  • LeveledCompactionStrategy (LCS): The leveled compaction strategy creates SSTables of a fixed, relatively small size (160 MB by default) that are grouped into levels. Within each level, SSTables are guaranteed to be non-overlapping. Each level (L0, L1, L2 and so on) is 10 times as large as the previous. Disk I/O is more uniform and predictable on higher than on lower levels as SSTables are continuously being compacted into progressively larger levels. At each level, row keys are merged into non-overlapping SSTables in the next level. This process can improve performance for reads, because the database can determine which SSTables in each level to check for the existence of row key data. This compaction strategy is modeled after Google's LevelDB implementation. Also see LCS compaction subproperties.
  • SizeTieredCompactionStrategy (STCS): The default compaction strategy. This strategy triggers a minor compaction when the number of similar-sized SSTables on disk reaches the value of the table subproperty min_threshold. A minor compaction does not involve all the tables in a keyspace. Also see STCS compaction subproperties.
  • TimeWindowCompactionStrategy (TWCS): This strategy is an alternative for time series data. TWCS compacts SSTables using a series of time windows. Within a time window, TWCS compacts all SSTables flushed from memory into larger SSTables using STCS. At the end of the time window, all of these SSTables are compacted into a single SSTable. Then the next time window starts and the process repeats. The duration of the time window is the only setting required. See TWCS compaction subproperties. For more information about TWCS, see How is data maintained?.
  • DateTieredCompactionStrategy (DTCS) (deprecated).
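As a rough sketch of how the strategies above are selected per table, the statements below set LCS and TWCS with their main subproperties. The table names events_by_user and sensor_readings are hypothetical, and the values are illustrative rather than recommendations. The 160 MB SSTable size mirrors the default mentioned above, and the one-day window is only an assumption about a typical time series workload.

```cql
-- Hypothetical table: LCS with the SSTable size stated explicitly.
ALTER TABLE events_by_user
  WITH compaction = {
    'class' : 'LeveledCompactionStrategy',
    'sstable_size_in_mb' : 160
  };

-- Hypothetical time series table: TWCS with a one-day time window.
-- The window duration is expressed as a unit plus a size.
ALTER TABLE sensor_readings
  WITH compaction = {
    'class' : 'TimeWindowCompactionStrategy',
    'compaction_window_unit' : 'DAYS',
    'compaction_window_size' : 1
  };
```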

To configure the compaction strategy property and CQL compaction subproperties, such as the maximum number of SSTables to compact and minimum SSTable size, use CREATE TABLE or ALTER TABLE.
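For instance, the compaction strategy and its subproperties can be set when the table is created. The following is a minimal sketch, assuming a hypothetical table named user_events and a keyspace already selected with USE. The threshold and size values are illustrative only and should be checked against the STCS compaction subproperties reference.

```cql
-- Hypothetical table; column names are for illustration only.
CREATE TABLE user_events (
  user_id uuid,
  event_time timestamp,
  payload text,
  PRIMARY KEY (user_id, event_time)
) WITH compaction = {
    'class' : 'SizeTieredCompactionStrategy',
    'min_threshold' : 4,              -- minimum number of SSTables that triggers a minor compaction
    'max_threshold' : 32,             -- maximum number of SSTables compacted at once
    'min_sstable_size' : 52428800     -- in bytes (50 MB); smaller SSTables are bucketed together
  };
```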

The location of the cassandra.yaml file depends on the type of installation:

Package installations
Installer-Services installations

/etc/dse/cassandra/cassandra.yaml

Tarball installations
Installer-No Services installations

installation_location/resources/cassandra/conf/cassandra.yaml

Procedure

  1. Update a table to set the compaction strategy using the ALTER TABLE statement.
    ALTER TABLE users WITH
      compaction = { 'class' : 'LeveledCompactionStrategy' };
  2. Change the compaction strategy property to SizeTieredCompactionStrategy and specify the minimum number of SSTables to trigger a compaction using the CQL min_threshold attribute.
    ALTER TABLE users
      WITH compaction =
      { 'class' : 'SizeTieredCompactionStrategy', 'min_threshold' : 6 };
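One way to confirm that the change took effect is to read the table's compaction settings back from the schema tables. This is a sketch that assumes a release exposing the system_schema keyspace and uses my_keyspace as a placeholder for the keyspace that contains the users table.

```cql
-- my_keyspace is a placeholder; substitute the keyspace that owns the users table.
SELECT compaction
  FROM system_schema.tables
  WHERE keyspace_name = 'my_keyspace'
    AND table_name = 'users';
```

The returned map should list the strategy class and any subproperties that were set, such as min_threshold.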

Results

You can monitor the results of your configuration using compaction metrics. See Compaction metrics.

What's next

DataStax Enterprise supports extended logging for compaction. Extended logging must be enabled as part of the table configuration, and the extended compaction logs are stored in a separate file. For details, see Enabling extended compaction logging.