When to compress data
Compression is most effective on a table with many rows, where each row contains the same set of columns (or the same number of columns) as all other rows. For example, a table containing user data such as <username>, <email> and <state> is a good candidate for compression. The greater the similarity of the data across rows, the greater the compression ratio and gain in read performance.
A table whose rows contain differing sets of columns is not well-suited for compression.
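The effect of row similarity can be illustrated with a small sketch. It does not reproduce how DataStax Enterprise compresses SSTables: it uses Python's zlib on JSON-serialized rows purely as a stand-in, and the row layouts, the `compression_ratio` helper, and the sample values are assumptions made up for illustration.

```python
import json
import random
import zlib


def compression_ratio(rows):
    """Return uncompressed size divided by zlib-compressed size."""
    raw = "\n".join(json.dumps(r, sort_keys=True) for r in rows).encode("utf-8")
    return len(raw) / len(zlib.compress(raw))


rng = random.Random(42)  # fixed seed so repeated runs use the same data
states = ["CA", "NY", "TX", "WA"]

# Every row has the same three columns, as in the user-data example.
uniform_rows = [
    {"username": f"user{i}", "email": f"user{i}@example.com", "state": rng.choice(states)}
    for i in range(2000)
]


def random_name(length=8):
    """Return a random lowercase string (hypothetical column name or value)."""
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(length))


# Every row has its own randomly named set of columns.
mixed_rows = [
    {random_name(): random_name() for _ in range(3)}
    for _ in range(2000)
]

uniform_ratio = compression_ratio(uniform_rows)
mixed_ratio = compression_ratio(mixed_rows)
print(f"uniform rows: {uniform_ratio:.2f}x")
print(f"mixed rows:   {mixed_ratio:.2f}x")
```

The uniform rows should report a noticeably higher ratio than the mixed rows, because the repeated column names and value patterns give the compressor more to deduplicate.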
Depending on the data characteristics of the table, compressing its data can result in:
- 25-33% reduction in data size
- 25-35% performance improvement on reads
- 5-10% performance improvement on writes
After configuring compression on an existing table, subsequently created SSTables are compressed. Existing SSTables on disk are not compressed immediately. DataStax Enterprise compresses existing SSTables when the normal database compaction process occurs. You can force existing SSTables to be rewritten and compressed by using nodetool upgradesstables or nodetool scrub.
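As a rough sketch of the forced rewrite, the helper below builds the nodetool command line for one table. The keyspace and table names are hypothetical, and so is the `rewrite_command` helper. The `-a` (`--include-all-sstables`) option is an addition not mentioned above: it is assumed to be needed so that upgradesstables also rewrites SSTables already on the current format version, so check it against the nodetool help for your installed version.

```python
import shlex


def rewrite_command(keyspace, table, tool="upgradesstables"):
    """Build the nodetool argument list that rewrites a table's SSTables."""
    if tool == "upgradesstables":
        # Assumption: -a (--include-all-sstables) makes nodetool rewrite
        # SSTables that are already on the current format version too.
        return ["nodetool", "upgradesstables", "-a", keyspace, table]
    if tool == "scrub":
        return ["nodetool", "scrub", keyspace, table]
    raise ValueError(f"unsupported tool: {tool}")


# Hypothetical keyspace and table names, for illustration only.
cmd = rewrite_command("cycling", "users")
print(shlex.join(cmd))
# To run it on a node: import subprocess; subprocess.run(cmd, check=True)
```

The sketch only prints the command. Rewriting SSTables is I/O intensive, so on a real cluster it is usually run one node at a time.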