Compaction subproperties 

Constructing a map of the compaction property and its subproperties.

Using CQL, you can configure a table to use SizeTieredCompactionStrategy (STCS), DateTieredCompactionStrategy (DTCS), or LeveledCompactionStrategy (LCS). You can specify a compaction strategy for a new table using the CREATE TABLE command, or change or reconfigure an existing table's strategy using ALTER TABLE. To configure the compaction strategy, construct a map of the compaction property and some of the following subproperties:

CQL compaction subproperties for STCS
Compaction subproperty | Default | Description
bucket_high | 1.5 | Size-tiered compaction merges sets of SSTables that are approximately the same size. Cassandra compares each SSTable size to the average of all SSTable sizes on the node. It merges SSTables whose sizes in KB are between [average-size × bucket_low] and [average-size × bucket_high].
bucket_low | 0.5 | See bucket_high.
enabled | true | True enables background compaction. See Enabling and disabling background compaction.
log_all | false | True activates advanced logging for the entire cluster.
max_threshold | 32 | The maximum number of SSTables to allow in a minor compaction.
min_threshold | 4 | The minimum number of SSTables to trigger a minor compaction.
min_sstable_size | 50MB | STCS groups SSTables into buckets. The bucketing process groups SSTables that differ in size by less than 50%. This bucketing process is too fine grained for small SSTables. If your SSTables are small, use min_sstable_size to define a size threshold (in bytes) below which all SSTables belong to one unique bucket.
only_purge_repaired_tombstones | false | In Apache Cassandra™ 3.0 and later: true allows purging tombstones only from repaired SSTables. The purpose is to prevent data from resurrecting if repair is not run within gc_grace_seconds. If you do not run repair for a long time, Cassandra keeps all tombstones, which may cause problems.
tombstone_compaction_interval | 86400 (one day) | The minimum number of seconds after an SSTable is created before Cassandra considers the SSTable for tombstone compaction. Cassandra performs tombstone compaction on an SSTable if the SSTable exceeds the tombstone_threshold ratio.
tombstone_threshold | 0.2 | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, Cassandra starts compaction on that SSTable alone, to purge the tombstones.
unchecked_tombstone_compaction | false | True allows Cassandra to run tombstone compaction without pre-checking which SSTables are eligible for this operation. Even without this pre-check, Cassandra checks an SSTable to make sure it is safe to drop tombstones.
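
As a rough illustration of how bucket_low, bucket_high, min_sstable_size, min_threshold, and max_threshold interact, the following Python sketch groups SSTable sizes into buckets and then picks the buckets that are large enough for a minor compaction. It is a simplified model, not Cassandra's implementation: the function names are hypothetical, and it assumes each SSTable is compared against the running average size of the bucket it might join.

```python
MB = 1024 * 1024


def bucket_sstables(sizes, bucket_low=0.5, bucket_high=1.5,
                    min_sstable_size=50 * MB):
    """Group SSTable sizes (in bytes) into buckets of similar size.

    Simplified model (an assumption, not Cassandra's code): an SSTable joins
    the first bucket whose running average it matches, otherwise it starts
    a new bucket.
    """
    buckets = []  # each entry is [average_size, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            average = bucket[0]
            # Size lies between average * bucket_low and average * bucket_high.
            in_range = average * bucket_low < size < average * bucket_high
            # Everything below min_sstable_size shares one bucket.
            both_small = size < min_sstable_size and average < min_sstable_size
            if in_range or both_small:
                bucket[1].append(size)
                bucket[0] = sum(bucket[1]) / len(bucket[1])
                break
        else:
            buckets.append([size, [size]])
    return [members for _, members in buckets]


def compaction_candidates(buckets, min_threshold=4, max_threshold=32):
    """Keep buckets with at least min_threshold SSTables, capped at max_threshold."""
    return [b[:max_threshold] for b in buckets if len(b) >= min_threshold]


sizes = [1 * MB, 2 * MB, 10 * MB, 40 * MB,
         100 * MB, 110 * MB, 120 * MB, 130 * MB, 400 * MB]
buckets = bucket_sstables(sizes)
candidates = compaction_candidates(buckets)
```

In this model, the four SSTables below 50 MB share one bucket because of min_sstable_size, the four SSTables between 100 MB and 130 MB form a second bucket, and the single 400 MB SSTable stays alone until similarly sized SSTables appear.
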
CQL compaction subproperties for DTCS
Compaction subproperty | Default | Description
base_time_seconds | 3600 (1 hour) | The size of the first time window.
enabled | true | True enables background compaction. See Enabling and disabling background compaction.
log_all | false | True activates advanced logging for the entire cluster.
max_sstable_age_days | 1000 | Cassandra stops considering an SSTable for compaction if all of its data is older than the specified number of days. The value can be a decimal number. This parameter is deprecated.
max_window_size_seconds | 86400 (24 hours) | The maximum window size in seconds.
max_threshold | 32 | The maximum number of SSTables allowed in a minor compaction.
min_threshold | 4 | The minimum number of SSTables that trigger a minor compaction.
timestamp_resolution | MICROSECONDS | Set to MICROSECONDS or MILLISECONDS, to match the timestamp unit of the data you insert.
tombstone_compaction_interval | 864000 (ten days) | The minimum number of seconds after an SSTable is created before Cassandra considers the SSTable for tombstone compaction. Cassandra starts tombstone compaction if the SSTable exceeds the tombstone_threshold.
tombstone_threshold | 0.2 | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, Cassandra starts compaction on that SSTable alone, to purge the tombstones.
unchecked_tombstone_compaction | false | True allows Cassandra to run tombstone compaction without pre-checking which SSTables are eligible for this operation. Even without this pre-check, Cassandra checks an SSTable to make sure it is safe to drop tombstones.
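
The relationship between max_sstable_age_days and timestamp_resolution can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and the newest_write_ts argument (the most recent write timestamp in an SSTable) are hypothetical and are not part of any Cassandra API.

```python
from datetime import datetime, timedelta, timezone

# How many timestamp units make up one second for each timestamp_resolution.
UNITS_PER_SECOND = {"MICROSECONDS": 1_000_000, "MILLISECONDS": 1_000}


def past_max_age(newest_write_ts, now, max_sstable_age_days=1000.0,
                 timestamp_resolution="MICROSECONDS"):
    """Return True if even the newest data is older than max_sstable_age_days.

    newest_write_ts is a hypothetical input: the latest write timestamp in the
    SSTable, expressed in the unit named by timestamp_resolution.
    """
    seconds = newest_write_ts / UNITS_PER_SECOND[timestamp_resolution]
    newest = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # max_sstable_age_days may be a decimal number, which timedelta accepts.
    return now - newest > timedelta(days=max_sstable_age_days)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
two_days_ago_us = int((now - timedelta(days=2)).timestamp() * 1_000_000)
too_old = past_max_age(two_days_ago_us, now, max_sstable_age_days=1.5)
```

The sketch also shows why timestamp_resolution must match the data: interpreting microsecond timestamps as milliseconds (or the reverse) places the data at the wrong point in time, so the age comparison is no longer meaningful.
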
CQL compaction subproperties for LCS
Compaction subproperty | Default | Description
enabled | true | True enables background compaction. See Enabling and disabling background compaction below.
log_all | false | True activates advanced logging for the entire cluster.
sstable_size_in_mb | 160 (MB) | The target size for SSTables that use the Leveled Compaction Strategy. Although SSTable sizes should be less than or equal to sstable_size_in_mb, compaction may produce a larger SSTable. This occurs when data for a given partition key is exceptionally large. Cassandra does not split the data into two SSTables.
tombstone_compaction_interval | 864000 (ten days) | The minimum number of seconds after an SSTable is created before Cassandra considers the SSTable for tombstone compaction. Cassandra begins tombstone compaction if the SSTable's tombstone ratio exceeds the value of the following property, tombstone_threshold.
tombstone_threshold | 0.2 | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, Cassandra starts compaction on that SSTable alone, to purge the tombstones.
unchecked_tombstone_compaction | false | True allows Cassandra to run tombstone compaction without pre-checking which SSTables are eligible for this operation. Even without this pre-check, Cassandra checks an SSTable to make sure it is safe to drop tombstones.
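
The two tombstone settings that appear in every table above work together: an SSTable must be old enough and must contain a high enough share of droppable tombstones. A minimal Python sketch of that combined check, with a hypothetical function name and with both settings passed explicitly rather than taken from any default:

```python
def eligible_for_tombstone_compaction(sstable_age_seconds,
                                      droppable_tombstone_ratio,
                                      tombstone_compaction_interval,
                                      tombstone_threshold):
    """Combine the age and ratio conditions described in the tables.

    Illustrative only: the further safety check that Cassandra performs
    before dropping tombstones is not modeled here.
    """
    old_enough = sstable_age_seconds >= tombstone_compaction_interval
    enough_tombstones = droppable_tombstone_ratio > tombstone_threshold
    return old_enough and enough_tombstones


# An SSTable created two days ago in which 25% of the columns are
# garbage-collectable tombstones.
result = eligible_for_tombstone_compaction(
    sstable_age_seconds=2 * 86400,
    droppable_tombstone_ratio=0.25,
    tombstone_compaction_interval=86400,
    tombstone_threshold=0.2,
)
```
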

Enabling and disabling background compaction 

The following example sets the enabled property to false to disable background compaction:
ALTER TABLE mytable WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false'};

Disabling background compaction can be harmful: without it, Cassandra does not reclaim disk space, and may allow zombies (deleted data that reappears) to propagate. Although compaction uses I/O, it is better to leave it enabled in most cases.
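
If you generate such statements from a script, the compaction map can be assembled from a strategy name and a set of subproperties. The helper below is a hypothetical Python sketch, not part of any Cassandra driver. It assumes every map value is written as a quoted string, as in the example above, and it performs no escaping, so it is suitable only for trusted input.

```python
def compaction_statement(table, strategy, **subproperties):
    """Render an ALTER TABLE statement that sets the compaction map.

    Hypothetical helper for illustration; values are emitted as quoted
    strings and are not escaped.
    """
    options = {"class": strategy}
    for name, value in subproperties.items():
        if isinstance(value, bool):
            # Python's True/False become the CQL strings 'true'/'false'.
            value = str(value).lower()
        options[name] = str(value)
    body = ", ".join(f"'{key}': '{val}'" for key, val in options.items())
    return f"ALTER TABLE {table} WITH compaction = {{{body}}};"


disable = compaction_statement("mytable", "SizeTieredCompactionStrategy",
                               enabled=False)
enable = compaction_statement("mytable", "SizeTieredCompactionStrategy",
                              enabled=True)
```

Here disable reproduces the statement shown above, and enable is the matching statement that turns background compaction back on.
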

Enabling extended compaction logging 

You can configure Cassandra to collect in-depth information about compaction activity on a node and write it to a dedicated log file. To enable extended compaction logging, add 'log_all': 'true' to the compaction configuration map for any table.

Important: If you enable extended logging for any table on any node, Cassandra enables it for all tables on all nodes in the cluster.

When extended compaction logging is enabled, Cassandra creates a file named compaction-%d.log (where %d is a sequential number) in $CASSANDRA_HOME/logs.

The compaction logging service logs detailed information about four types of compaction events:
  • type: enable

    Lists SSTables that have been flushed previously.

    {"type":"enable","keyspace":"test","table":"t","time":1470071098866,"strategies":
      [
        {"strategyId":"0","type":"LeveledCompactionStrategy","tables":[],"repaired":true,"folders":
          ["/home/carl/oss/cassandra/bin/../data/data"]},
        {"strategyId":"1","type":"LeveledCompactionStrategy","tables":[],"repaired":false,"folders":
          ["/home/carl/oss/cassandra/bin/../data/data"]
        }
      ]
    }
  • type: flush

    Logs a flush event from a memtable to an SSTable on disk, including the CompactionStrategy for each table.

    {"type":"flush","keyspace":"test","table":"t","time":1470083335639,"tables":
      [
        {"strategyId":"1","table":
          {"generation":1,"version":"mb","size":106846362,"details":
            {"level":0,"min_token":"-9221834874718566760","max_token":"9221396997139245178"}
          }
        }
      ]
    }
    
  • type: compaction

    Logs a compaction event.

    {"type":"compaction","keyspace":"test","table":"t","time":1470083660267,"start":"1470083660188","end":"1470083660267","input":
      [
        {"strategyId":"1","table":
          {"generation":1372,"version":"mb","size":1064979,"details":
            {"level":1,"min_token":"7199305267944662291","max_token":"7323434447996777057"}
          }
        }
      ],"output":
      [
        {"strategyId":"1","table":
          {"generation":1404,"version":"mb","size":1064306,"details":
            {"level":2,"min_token":"7199305267944662291","max_token":"7323434447996777057"}
          }
        }
      ]
    }
    
  • type: pending

    Lists the number of pending tasks for a compaction strategy.

    {"type":"pending","keyspace":"test","table":"t","time":1470083447967,"strategyId":"1","pending":200}
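
The events above are JSON objects, so they can be summarized with a short script. The following Python sketch counts events by type, totals the input and output bytes of compaction events, and records the latest pending count for each strategy. It assumes that each event occupies a single line in the log file (the samples above are wrapped for readability); the function name is hypothetical.

```python
import json
from collections import Counter


def summarize_compaction_log(lines):
    """Summarize compaction log events, assuming one JSON object per line."""
    events = Counter()
    bytes_in = 0
    bytes_out = 0
    latest_pending = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        events[event["type"]] += 1
        if event["type"] == "compaction":
            # Each input/output entry wraps one SSTable under the "table" key.
            bytes_in += sum(e["table"]["size"] for e in event["input"])
            bytes_out += sum(e["table"]["size"] for e in event["output"])
        elif event["type"] == "pending":
            latest_pending[event["strategyId"]] = event["pending"]
    return {
        "events": dict(events),
        "bytes_in": bytes_in,
        "bytes_out": bytes_out,
        "pending": latest_pending,
    }
```

To use it, pass an open log file, for example `summarize_compaction_log(open(path))`, where path points to one of the compaction-%d.log files.
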