DSEFS compression

DSEFS can compress files to save storage space and bandwidth. DSE compresses a file during upload, and only when the user explicitly requests it. Decompression is transparent: the server always decompresses data before returning it to the client.

Compression is performed within block boundaries. The unit of compression (the chunk of data that is compressed individually) is called a frame, and its size can be specified during file upload.
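
As a rough illustration of frame-based compression, the following Python sketch cuts a file into blocks, cuts each block into frames, and compresses every frame on its own, so no frame ever spans two blocks. This is not DSEFS code. zlib stands in for the lz4 encoder because it ships with Python, and the function names and the sizes used are hypothetical.

```python
import zlib


def split_frames(block: bytes, frame_size: int) -> list[bytes]:
    """Cut one block into frames of at most frame_size bytes."""
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    return [block[i:i + frame_size] for i in range(0, len(block), frame_size)]


def compress_file(data: bytes, block_size: int, frame_size: int) -> list[list[bytes]]:
    """Return one list of compressed frames per block.

    Frames are cut inside each block, so a frame never crosses a block boundary.
    zlib is only a stand-in for the real encoder.
    """
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    return [[zlib.compress(frame) for frame in split_frames(block, frame_size)]
            for block in blocks]


def decompress_file(blocks: list[list[bytes]]) -> bytes:
    """Decompress every frame and join the results in their original order."""
    return b"".join(zlib.decompress(frame) for frames in blocks for frame in frames)


# Hypothetical sizes chosen to keep the example small.
data = b"abc" * 1000
compressed = compress_file(data, block_size=1024, frame_size=400)
assert decompress_file(compressed) == data
```

Because each frame is compressed independently, a reader only has to decompress the frames that cover the requested byte range, which is one plausible reason for making the frame size tunable.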

Encoders

DSEFS ships with the lz4 encoder, which works out of the box.

Compression

To compress files, use the -c or --compression-encoder parameter with the put or cp command. The parameter specifies the compression encoder to use for the file being uploaded.

dsefs / > put -c lz4 file /path/to/file

The frame size can optionally be set with the -f or --compression-frame-size option.

The maximum frame size in bytes is set in the compression_frame_max_size option in dse.yaml. If a user sets the frame size to a value greater than compression_frame_max_size when using put -f, the command fails with an error. Modify the compression_frame_max_size setting based on the available memory of the node.
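
The limit check described above can be pictured with a short Python sketch. It is an illustration only and does not reflect how DSE implements the check: the function name check_frame_size is hypothetical, and the limit of 1048576 bytes is an arbitrary example value, not a documented default.

```python
def check_frame_size(requested: int, max_frame_size: int) -> int:
    """Return the requested frame size in bytes, or raise if it is not allowed."""
    if requested <= 0:
        raise ValueError("frame size must be a positive number of bytes")
    if requested > max_frame_size:
        raise ValueError(
            f"frame size {requested} exceeds compression_frame_max_size "
            f"({max_frame_size})"
        )
    return requested


# Arbitrary example limit; the real value comes from dse.yaml.
MAX_FRAME_SIZE = 1048576

print(check_frame_size(65536, MAX_FRAME_SIZE))
```

Since the limit is tied to the memory available on the node, a larger limit allows larger frames at the cost of larger buffers during compression and decompression.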

Compressed files can be appended to in the same way as uncompressed files. If the file is compressed, the appended data is transparently compressed with the encoder that was specified for the file's initial put operation.

Directories can have a default compression encoder, specified during directory creation with the mkdir command. Files newly added with the put command inherit the default compression encoder from the containing directory. You can override the default compression encoder with the -c parameter during put operations.

dsefs / > mkdir -c lz4 /some/path

Decompression

Decompression is performed automatically for all commands that transport data to the client. There is no need for additional configuration to retrieve the original, decompressed file content.

Storage space

Enabling compression creates a distinction between the logical and physical file size.

The logical size is the size of a file before it is uploaded to DSEFS and compressed. The logical size is shown by the stat command under Size.

dsefs dsefs://10.0.0.1:5598/ > stat /tmp/wikipedia-sample.bz2
FILE dsefs://10.0.0.1:5598/tmp/wikipedia-sample.bz2:
Owner           none
Group           none
Permission      rwxrwxrwx
Created         2017-04-06 20:06:21+0000
Modified        2017-04-06 20:06:21+0000
Accessed        2017-04-06 20:06:21+0000
Size            7723180
Block size      67108864
Redundancy      3
Compressed      true
Encrypted       false
Comment
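
If the stat output needs to be read by a script, a minimal Python sketch along these lines could extract the logical size and the compression flag. It assumes the layout shown in the sample above, where a field name and its value are separated by two or more spaces. The function name parse_stat is hypothetical, and the layout may differ between DSE versions.

```python
import re

SAMPLE = """\
FILE dsefs://10.0.0.1:5598/tmp/wikipedia-sample.bz2:
Owner           none
Group           none
Permission      rwxrwxrwx
Created         2017-04-06 20:06:21+0000
Modified        2017-04-06 20:06:21+0000
Accessed        2017-04-06 20:06:21+0000
Size            7723180
Block size      67108864
Redundancy      3
Compressed      true
Encrypted       false
Comment
"""


def parse_stat(output: str) -> dict[str, str]:
    """Map each field name in the stat output to its value as a string."""
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "FILE <url>:" header line.
        if not line or line.startswith("FILE "):
            continue
        # Split on the first run of two or more spaces, so names such as
        # "Block size" and values such as timestamps stay in one piece.
        parts = re.split(r"\s{2,}", line, maxsplit=1)
        fields[parts[0]] = parts[1] if len(parts) > 1 else ""
    return fields


fields = parse_stat(SAMPLE)
logical_size = int(fields["Size"])
is_compressed = fields["Compressed"] == "true"
print(logical_size, is_compressed)
```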

The physical size is the actual size of the data stored on the storage device. The physical size is shown by the df command and by the stat -v command for each block separately, under the Compressed length column.
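
The difference between the two sizes can be reproduced locally with a small Python sketch. It does not talk to DSEFS: zlib stands in for the lz4 encoder, the block size is a hypothetical small value, and the input is generated text, so the numbers it prints say nothing about real DSEFS compression ratios.

```python
import zlib


def block_sizes(data: bytes, block_size: int) -> list[tuple[int, int]]:
    """Return (logical length, compressed length) for each block of the data."""
    sizes = []
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        # zlib is only a stand-in for the real encoder.
        sizes.append((len(block), len(zlib.compress(block))))
    return sizes


# Generated, highly repetitive input so that compression has a visible effect.
data = b"DSEFS stores repetitive text compactly. " * 2000
sizes = block_sizes(data, block_size=32 * 1024)

logical_size = sum(logical for logical, _ in sizes)
physical_size = sum(compressed for _, compressed in sizes)
print(f"logical: {logical_size} bytes, physical: {physical_size} bytes")
```

Summing the per-block compressed lengths mirrors how the physical size is reported for each block separately, while the logical size is simply the length of the original data.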

Limitations

Truncating compressed files is not possible.

