Compaction subproperties
Constructing a map of the compaction property and its subproperties.
Using CQL, you can configure a table to use SizeTieredCompactionStrategy (STCS), DateTieredCompactionStrategy (DTCS), or LeveledCompactionStrategy (LCS). You can specify a compaction strategy for a new table using the CREATE TABLE command, or change or reconfigure an existing table's strategy using ALTER TABLE. To configure the compaction strategy, construct a map of the compaction property and some of the following subproperties:
STCS compaction subproperties | Default | Description |
---|---|---|
bucket_high | 1.5 | Size-tiered compaction merges sets of SSTables that are approximately the same size. Cassandra compares each SSTable size to the average of all SSTable sizes on the node. It merges SSTables whose sizes in KB fall between [average-size × bucket_low] and [average-size × bucket_high]. |
bucket_low | 0.5 | See above. |
enabled | true | True enables background compaction. See Enabling and disabling background compaction. |
log_all | false | True activates advanced logging for the entire cluster. |
max_threshold | 32 | The maximum number of SSTables to allow in a minor compaction. |
min_threshold | 4 | The minimum number of SSTables to trigger a minor compaction. |
min_sstable_size | 50 MB | STCS groups SSTables into buckets. The bucketing process groups SSTables that differ in size by less than 50%. This bucketing process is too fine-grained for small SSTables. If your SSTables are small, use min_sstable_size to define a size threshold (in bytes) below which all SSTables belong to a single bucket. |
tombstone_compaction_interval | 86400 (one day) | The minimum number of seconds after an SSTable is created before Cassandra considers the SSTable for tombstone compaction. Cassandra performs tombstone compaction on an SSTable if the SSTable exceeds the tombstone_threshold ratio. |
tombstone_threshold | 0.2 | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, Cassandra starts compaction on that table alone, to purge the tombstones. |
unchecked_tombstone_compaction | false | True allows Cassandra to run tombstone compaction without pre-checking which tables are eligible for this operation. Even without this pre-check, Cassandra checks an SSTable to make sure it is safe to drop tombstones. |
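As an illustration of how the subproperties above fit into a compaction map, the following sketch creates a table that uses STCS. The keyspace, table, and column names (cycling.race_times and its columns) are hypothetical, and the values only restate the defaults listed in the table, with min_sstable_size written in bytes (50 MB = 52428800). With bucket_low at 0.5 and bucket_high at 1.5, an average SSTable size of 100 MB would give a bucket range of 50 MB to 150 MB.

```cql
-- Hypothetical keyspace, table, and columns, used for illustration only.
-- The subproperty values restate the defaults listed in the table above.
CREATE TABLE cycling.race_times (
  race_id uuid,
  rider_id uuid,
  finish_time timestamp,
  PRIMARY KEY (race_id, rider_id)
) WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'bucket_low': '0.5',
  'bucket_high': '1.5',
  'min_threshold': '4',
  'max_threshold': '32',
  'min_sstable_size': '52428800'
};
```

Subproperties left out of the map keep their default values, so in practice you would list only the ones you want to change.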
DTCS compaction subproperties | Default | Description |
---|---|---|
base_time_seconds | 3600 (1 hour) | The size of the first time window. |
enabled | true | True enables background compaction. See Enabling and disabling background compaction. |
log_all | false | True activates advanced logging for the entire cluster. |
max_sstable_age_days | 1000 | Cassandra stops considering an SSTable for compaction if all of its data is older than the specified number of days. The value can be a decimal number. This parameter is deprecated. |
max_window_size_seconds | 86400 (one day) | The maximum window size in seconds. |
max_threshold | 32 | The maximum number of SSTables allowed in a minor compaction. |
min_threshold | 4 | The minimum number of SSTables that trigger a minor compaction. |
timestamp_resolution | MICROSECONDS | Set to MICROSECONDS or MILLISECONDS to match the timestamp unit of the data you insert. |
tombstone_compaction_interval | 864000 (ten days) | The minimum number of seconds after an SSTable is created before Cassandra considers the SSTable for tombstone compaction. Cassandra starts tombstone compaction if the SSTable exceeds the tombstone_threshold. |
tombstone_threshold | 0.2 | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, Cassandra starts compaction on that table alone, to purge the tombstones. |
unchecked_tombstone_compaction | false | True allows Cassandra to run tombstone compaction without pre-checking which tables are eligible for this operation. Even without this pre-check, Cassandra checks an SSTable to make sure it is safe to drop tombstones. |
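A compaction map for DTCS might look like the following sketch, which switches an existing table to DateTieredCompactionStrategy. The table name sensors.readings is hypothetical, and the values restate the defaults from the table above. The deprecated max_sstable_age_days subproperty is deliberately left out.

```cql
-- Hypothetical table name; values restate the defaults listed above.
ALTER TABLE sensors.readings WITH compaction = {
  'class': 'DateTieredCompactionStrategy',
  'base_time_seconds': '3600',
  'max_window_size_seconds': '86400',
  'timestamp_resolution': 'MICROSECONDS'
};
```

If your application writes timestamps in milliseconds, set timestamp_resolution to MILLISECONDS so that the time windows line up with the data.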
LCS compaction subproperties | Default | Description |
---|---|---|
enabled | true | True enables background compaction. See Enabling and disabling background compaction below. |
log_all | false | True activates advanced logging for the entire cluster. |
sstable_size_in_mb | 160 | The target size, in MB, for SSTables that use the Leveled Compaction Strategy. Although SSTable sizes should be less than or equal to sstable_size_in_mb, compaction may occasionally produce a larger SSTable. This occurs when the data for a given partition key is exceptionally large, because Cassandra does not split the data for one partition across two SSTables. |
tombstone_compaction_interval | 864000 (ten days) | The minimum number of seconds after an SSTable is created before Cassandra considers the SSTable for tombstone compaction. Cassandra begins tombstone compaction if the SSTable's tombstone ratio exceeds the value of the following property, tombstone_threshold. |
tombstone_threshold | 0.2 | The ratio of garbage-collectable tombstones to all contained columns. If the ratio exceeds this limit, Cassandra starts compaction on that table alone, to purge the tombstones. |
unchecked_tombstone_compaction | false | True allows Cassandra to run tombstone compaction without pre-checking which tables are eligible for this operation. Even without this pre-check, Cassandra checks an SSTable to make sure it is safe to drop tombstones. |
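The following sketch shows how the LCS subproperties above could be combined with the tombstone settings in one map. The table name store.orders is hypothetical. The sstable_size_in_mb value restates the default, while the lower tombstone_threshold of 0.1 is an illustrative choice, not a recommendation.

```cql
-- Hypothetical table name. 'sstable_size_in_mb' restates the default;
-- 'tombstone_threshold' is lowered from 0.2 to 0.1 purely as an illustration.
ALTER TABLE store.orders WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'sstable_size_in_mb': '160',
  'tombstone_threshold': '0.1',
  'unchecked_tombstone_compaction': 'false'
};
```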
Enabling and disabling background compaction
The following example sets the enabled subproperty to false to disable background compaction:
ALTER TABLE mytable WITH COMPACTION = {'class': 'SizeTieredCompactionStrategy', 'enabled': 'false' };
Disabling background compaction can be harmful: without it, Cassandra does not regain disk space, and may allow zombies to propagate. Although compaction uses I/O, it is better to leave it enabled in most cases.
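To turn background compaction back on, the same map can be written with enabled set to true. This is a minimal sketch that reuses the mytable name from the example above and assumes the table still uses SizeTieredCompactionStrategy.

```cql
-- Re-enable background compaction on the table from the example above.
-- Assumes the table uses SizeTieredCompactionStrategy.
ALTER TABLE mytable WITH COMPACTION = {
  'class': 'SizeTieredCompactionStrategy',
  'enabled': 'true'
};
```

Because the statement supplies the whole compaction map, it is safest to restate any non-default subproperties you want the table to keep.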
Enabling extended compaction logging
You can configure Cassandra to collect in-depth information about compaction activity on a node and write it to a dedicated log file. To enable extended compaction logging, add log_all: true to the configuration map for any table.
When extended compaction logging is enabled, Cassandra creates a file named compaction-%d.log (where %d is a sequential number) in $CASSANDRA_HOME/logs.
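As a sketch of what enabling extended compaction logging could look like, the following statement sets the log_all subproperty listed in the tables above. The keyspace test, the table t, and the LeveledCompactionStrategy class are taken from the sample log entries on this page and are otherwise arbitrary.

```cql
-- Keyspace 'test' and table 't' mirror the sample log entries on this page.
-- Setting 'log_all' to 'true' turns on extended compaction logging.
ALTER TABLE test.t WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'log_all': 'true'
};
```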
type: enable
Lists SSTables that have been flushed previously.
{"type":"enable","keyspace":"test","table":"t","time":1470071098866,"strategies": [ {"strategyId":"0","type":"LeveledCompactionStrategy","tables":[],"repaired":true,"folders": ["/home/carl/oss/cassandra/bin/../data/data"]}, {"strategyId":"1","type":"LeveledCompactionStrategy","tables":[],"repaired":false,"folders": ["/home/carl/oss/cassandra/bin/../data/data"] } ] }
type: flush
Logs a flush event from a memtable to an SSTable on disk, including the CompactionStrategy for each table.
{"type":"flush","keyspace":"test","table":"t","time":1470083335639,"tables": [ {"strategyId":"1","table": {"generation":1,"version":"mb","size":106846362,"details": {"level":0,"min_token":"-9221834874718566760","max_token":"9221396997139245178"} } } ] }
type: compaction
Logs a compaction event.
{"type":"compaction","keyspace":"test","table":"t","time":1470083660267,"start":"1470083660188","end":"1470083660267","input": [ {"strategyId":"1","table": {"generation":1372,"version":"mb","size":1064979,"details": {"level":1,"min_token":"7199305267944662291","max_token":"7323434447996777057"} } } ],"output": [ {"strategyId":"1","table": {"generation":1404,"version":"mb","size":1064306,"details": {"level":2,"min_token":"7199305267944662291","max_token":"7323434447996777057"} } } ] }
type: pending
Lists the number of pending tasks for a compaction strategy.
{"type":"pending","keyspace":"test","table":"t","time":1470083447967,"strategyId":"1","pending":200}