DSEFS compression
DSEFS can compress files to save storage space and bandwidth. DSE compresses a file during upload, and only when the user explicitly requests it. Decompression is transparent: the server always decompresses data before returning it to the client.
Compression is performed within block boundaries. The unit of compression, that is, the chunk of data that is compressed individually, is called a frame. The frame size can be specified during file upload.
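The frame concept can be illustrated with a minimal Python sketch. It is not DSEFS code: the function names are hypothetical, and zlib from the Python standard library stands in for lz4, which the standard library does not provide. The sketch only shows that each frame is compressed on its own, so any frame can be decompressed without reading the others.

```python
import zlib
from typing import List


def compress_in_frames(data: bytes, frame_size: int) -> List[bytes]:
    """Compress each frame_size-byte slice of data independently.

    Hypothetical helper: zlib is a stand-in for the lz4 encoder.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    return [
        zlib.compress(data[start:start + frame_size])
        for start in range(0, len(data), frame_size)
    ]


def decompress_frames(frames: List[bytes]) -> bytes:
    """Decompress every frame and concatenate the results in order."""
    return b"".join(zlib.decompress(frame) for frame in frames)


# One "block" of 30000 bytes, split into frames of at most 8192 bytes.
block = b"abc" * 10_000
frames = compress_in_frames(block, frame_size=8192)
restored = decompress_frames(frames)
```

A smaller frame size means less data has to be decompressed to read a small range, while a larger frame size generally gives the encoder more context and a better compression ratio.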
Encoders
DSEFS ships with the lz4 encoder, which works out of the box.
Compression
To compress files, use the -c or --compression-encoder parameter with the put or cp command.
The parameter specifies the compression encoder to use for the file being uploaded.
dsefs / > put -c lz4 file /path/to/file
The frame size can optionally be set with the -f or --compression-frame-size option.
The maximum frame size in bytes is set by the compression_frame_max_size option in dse.yaml.
If a user sets the frame size to a value greater than compression_frame_max_size when using put -f, an error is thrown and the command fails.
Modify the compression_frame_max_size setting based on the available memory of the node.
Compressed files can be appended to in the same way as uncompressed files.
If the file is compressed, the appended data is transparently compressed with the encoder that was specified for the file in the initial put operation.
Directories can have a default compression encoder, which is specified during directory creation with the mkdir command.
Files newly added with the put command inherit the default compression encoder from the containing directory.
You can override the default compression encoder with the -c parameter during put operations.
dsefs / > mkdir -c lz4 /some/path
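The precedence between the -c parameter and the directory default can be summarized in a short Python sketch. The function name and the use of None for "no encoder" are assumptions made for illustration; this is not how DSEFS is implemented, only a restatement of the rule described above.

```python
from typing import Optional


def resolve_encoder(explicit: Optional[str],
                    directory_default: Optional[str]) -> Optional[str]:
    """Pick the encoder for a newly uploaded file.

    explicit: encoder passed with -c on put, or None if -c was omitted.
    directory_default: encoder set with mkdir -c on the containing
        directory, or None if the directory has no default.
    Returns None when the file is stored uncompressed.
    """
    if explicit is not None:
        return explicit          # -c on put overrides the directory default
    return directory_default     # otherwise inherit from the directory


inherited = resolve_encoder(None, "lz4")   # put without -c into /some/path
uncompressed = resolve_encoder(None, None)  # no -c, no directory default
```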
Decompression
Decompression is performed automatically for all commands that transport data to the client. There is no need for additional configuration to retrieve the original, decompressed file content.
Storage space
Enabling compression creates a distinction between the logical and physical file size.
The logical size is the size of a file before uploading it to DSEFS, where it is then compressed.
The logical size is shown by the stat command under Size.
dsefs dsefs://10.0.0.1:5598/ > stat /tmp/wikipedia-sample.bz2
FILE dsefs://10.0.0.1:5598/tmp/wikipedia-sample.bz2:
Owner none
Group none
Permission rwxrwxrwx
Created 2017-04-06 20:06:21+0000
Modified 2017-04-06 20:06:21+0000
Accessed 2017-04-06 20:06:21+0000
Size 7723180
Block size 67108864
Redundancy 3
Compressed true
Encrypted false
Comment
The physical size is the actual size of the data stored on the storage device.
The physical size is shown by the df command and, for each block separately, by the stat -v command under the Compressed length column.
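The difference between the two sizes can be sketched in Python. As an assumption for illustration, zlib stands in for lz4 and the function name is hypothetical. The sketch also ignores any per-frame or per-block metadata that a real file system stores, so it approximates the idea rather than the numbers DSEFS reports.

```python
import zlib
from typing import Tuple


def logical_and_physical_size(data: bytes, frame_size: int) -> Tuple[int, int]:
    """Return (logical, physical) sizes in bytes for frame-compressed data.

    logical: size of the original, uncompressed content.
    physical: sum of the sizes of the individually compressed frames.
    """
    logical = len(data)
    physical = sum(
        len(zlib.compress(data[start:start + frame_size]))
        for start in range(0, len(data), frame_size)
    )
    return logical, physical


# Highly repetitive content compresses well, so physical < logical here.
logical, physical = logical_and_physical_size(b"abc" * 10_000, frame_size=8192)
```

Content that is already compressed, such as a .bz2 archive, usually gains little from a second compression pass, so its physical size can end up close to its logical size.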
Limitations
Truncating compressed files is not possible.