Compression

Compression maximizes the storage capacity of DataStax Enterprise (DSE) nodes by reducing the volume of data on disk, and it also reduces disk I/O, particularly for read-dominated workloads. To serve a read, the database locates the row through the SSTable index and decompresses only the relevant chunks. The DSE storage engine also reduces disk volume automatically; see Putting some structure in the storage engine.
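As a rough check on how much space compression is saving for a given table, the effective compression ratio reported by nodetool tablestats can be inspected. A minimal sketch follows; the keyspace and table names (cycling.cyclist_name) and the ratio shown are illustrative, not output from a real cluster:

    $ nodetool tablestats cycling.cyclist_name | grep -i compression
    SSTable Compression Ratio: 0.32

A ratio of 0.32 would mean the compressed SSTable data occupies roughly a third of its uncompressed size.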

Unlike in traditional relational databases, compression does not degrade write performance in DSE. In a relational database, writes overwrite existing data files on disk: the database must locate the relevant pages, decompress them, overwrite the data, and recompress, making compression an expensive operation in CPU cycles and disk I/O. Because SSTable data files are immutable (they are never written to again after being flushed to disk), no recompression cycle is needed to process writes; SSTables are compressed only once, when they are written to disk. Writes to compressed tables can show up to a 10 percent performance improvement.
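Compression is configured per table through the compression option in CQL. The following is a minimal sketch that enables LZ4 compression on an existing table; the table name and chunk size are illustrative, and the set of available compressor classes depends on the DSE version:

    ALTER TABLE cycling.cyclist_name
      WITH compression = {
        'class': 'LZ4Compressor',
        'chunk_length_in_kb': 64
      };

Because SSTables are immutable, the new settings apply only to SSTables written after the change; existing SSTables are rewritten with the new settings during compaction or an explicit nodetool upgradesstables.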

In DSE, the commit log can also be compressed, which can improve write performance by 6 to 12 percent. See the Updates to Cassandra's Commit Log in 2.2 blog.
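Commit log compression is enabled in cassandra.yaml rather than per table. A minimal sketch, assuming the LZ4 compressor that ships with the database (other compressor classes may also be available):

    # cassandra.yaml
    commitlog_compression:
      - class_name: LZ4Compressor

Changes to cassandra.yaml take effect only after the node is restarted.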
