Compression
Compression maximizes the storage capacity of DataStax Enterprise (DSE) nodes by reducing the volume of data on disk. It also reduces disk I/O, particularly for read-dominated workloads. To serve a read, the database finds the location of rows in the SSTable index and decompresses the relevant row chunks. DSE also uses a storage engine that automatically reduces disk volume. See Putting some structure in the storage engine.
Unlike in traditional databases, compression in DataStax Enterprise does not degrade write performance. In a traditional relational database, a write overwrites existing data files on disk: the database must locate the relevant pages, decompress them, overwrite the relevant data, and finally recompress them. Compression there is expensive in both CPU cycles and disk I/O. Because SSTable data files are immutable (they are not written to again after they have been flushed to disk), processing writes requires no recompression cycle. SSTables are compressed only once, when they are written to disk. Writes on compressed tables can show up to a 10 percent performance improvement.
In DSE, the commit log can also be compressed, which can improve write performance by 6-12%. See the Updates to Cassandra’s Commit Log in 2.2 blog post.
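The SSTable behavior described above (compress once at write time, then decompress only the chunks a read needs) can be illustrated with a minimal Python sketch. This is not DSE code: `write_compressed`, `read_range`, the chunk size, and the use of `zlib` are all assumptions chosen for illustration, and the real on-disk format differs.

```python
import zlib

CHUNK_SIZE = 4096  # hypothetical chunk length, chosen only for this sketch


def write_compressed(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Compress data chunk by chunk, once, and record where each chunk lands."""
    blob = bytearray()
    offsets = []  # offsets[i] = (start, length) of compressed chunk i in blob
    for start in range(0, len(data), chunk_size):
        compressed = zlib.compress(data[start:start + chunk_size])
        offsets.append((len(blob), len(compressed)))
        blob.extend(compressed)
    # From here on the result is treated as immutable, like a flushed SSTable.
    return bytes(blob), offsets


def read_range(blob: bytes, offsets, position: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return `length` uncompressed bytes starting at `position`,
    decompressing only the chunks that overlap the requested range."""
    if length <= 0:
        return b""
    first = position // chunk_size
    last = min((position + length - 1) // chunk_size, len(offsets) - 1)
    parts = []
    for i in range(first, last + 1):
        start, size = offsets[i]
        parts.append(zlib.decompress(blob[start:start + size]))
    joined = b"".join(parts)
    skip = position - first * chunk_size
    return joined[skip:skip + length]


# Build some row-like data, compress it once, then read a small slice back.
rows = b"".join(b"id=%06d,name=user,city=Austin\n" % i for i in range(10000))
blob, offsets = write_compressed(rows)
slice_back = read_range(blob, offsets, 5000, 100)
```

Because the compressed blob is never modified after it is written, no step in this sketch ever decompresses, edits, and recompresses existing data, which is the cycle that makes compression costly for in-place updates.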
- When to compress data
Compression is best suited for tables that have many rows, where each row has the same columns as the other rows, or at least as many columns.
- Configuring compression
Steps for configuring compression.
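To make the guidance under "When to compress data" concrete, the following Python sketch compares how well two byte strings compress: one built from many rows that share the same columns, and one with no shared structure. It is a rough illustration only. `zlib` stands in for whatever compressor a table actually uses, the JSON-like row encoding is hypothetical and does not reflect the SSTable layout, and `compression_ratio` is a helper invented for this example.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data)) / len(data)


# Many rows that all have the same columns; only the values differ.
uniform_rows = b"".join(
    b'{"id": %d, "status": "active", "region": "us-east"}\n' % i
    for i in range(5000)
)

# Stand-in for data with no shared structure: seeded pseudo-random bytes.
rng = random.Random(0)
unstructured = bytes(rng.getrandbits(8) for _ in range(len(uniform_rows)))

uniform_ratio = compression_ratio(uniform_rows)
unstructured_ratio = compression_ratio(unstructured)
print(f"uniform rows: {uniform_ratio:.2f}, unstructured: {unstructured_ratio:.2f}")
```

The repeated structure in the uniform rows gives the compressor many recurring byte sequences to exploit, while the pseudo-random data offers almost none, so its ratio stays close to 1.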