DSEFS compression
DSEFS can compress files to save storage space and bandwidth. Compression is performed during file upload, and only when the user explicitly requests it. Decompression is transparent: the server always decompresses data before returning it to the client.
Compression is performed within block boundaries. The unit of compression is the frame: a chunk of data that is compressed individually. Specify the frame size during file upload.
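The relationship between blocks and frames can be illustrated with a small sketch. This is not the DSEFS implementation: it uses Python's zlib as a stand-in for the lz4 encoder, and the function names and the tiny block and frame sizes are hypothetical, chosen only to make the splitting visible.

```python
import zlib
from typing import List


def compress_block(block: bytes, frame_size: int) -> List[bytes]:
    """Split one block into frames and compress each frame on its own."""
    return [
        zlib.compress(block[i:i + frame_size])  # zlib stands in for lz4 here
        for i in range(0, len(block), frame_size)
    ]


def compress_file(data: bytes, block_size: int, frame_size: int) -> List[List[bytes]]:
    """Compress data block by block, so that no frame spans two blocks."""
    return [
        compress_block(data[i:i + block_size], frame_size)
        for i in range(0, len(data), block_size)
    ]


def decompress_file(blocks: List[List[bytes]]) -> bytes:
    """Decompress every frame in order and join the results."""
    return b"".join(zlib.decompress(frame) for block in blocks for frame in block)


data = b"DSEFS " * 1000  # 6000 bytes of sample data
blocks = compress_file(data, block_size=4096, frame_size=1024)

# 6000 bytes become two blocks (4096 and 1904 bytes),
# holding 4 and 2 independently compressed frames.
print([len(block) for block in blocks])
print(decompress_file(blocks) == data)
```

Because each frame is compressed independently, a reader in this sketch only needs to decompress the frames that cover the requested range, at the cost of a somewhat lower compression ratio for small frames.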
Encoders
DSEFS ships with the lz4 encoder, which works out of the box.
Compression
To compress a file, use the -c or --compression-encoder parameter of the put command.
The parameter specifies the compression encoder to use for the file being uploaded.
dsefs / > put -c lz4 file /path/to/file
Optionally, set the frame size with the -f or --compression-frame-size option.
Set the maximum frame size in bytes with the compression_frame_max_size option in dse.yaml.
If the frame size passed to put -f is greater than compression_frame_max_size, the command fails with an error.
Modify the compression_frame_max_size setting based on the available memory of the node.
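The frame size check described above amounts to a simple comparison. The following sketch is an illustration only: the function name is hypothetical, and the limit of 1048576 bytes is an assumed example value, not a documented default. The actual limit is whatever compression_frame_max_size is set to in dse.yaml.

```python
# Assumed example limit in bytes; the real value comes from
# compression_frame_max_size in dse.yaml.
COMPRESSION_FRAME_MAX_SIZE = 1048576


def validate_frame_size(requested: int, max_size: int = COMPRESSION_FRAME_MAX_SIZE) -> int:
    """Return the requested frame size, or raise if it exceeds the maximum."""
    if requested > max_size:
        raise ValueError(
            f"frame size {requested} exceeds compression_frame_max_size {max_size}"
        )
    return requested


validate_frame_size(65536)        # accepted: below the limit
validate_frame_size(1048576)      # accepted: equal to the limit
try:
    validate_frame_size(2097152)  # rejected: greater than the limit
except ValueError as err:
    print(err)
```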
Append to compressed files just as you would to uncompressed files.
Appended data is compressed transparently with the same encoder that was specified for the initial put operation.
A default compression encoder can be specified for a directory when it is created with the mkdir command.
Files added to the directory with the put command inherit the default compression encoder from the containing directory.
You can override the default compression encoder with the -c parameter during put operations.
dsefs / > mkdir -c lz4 /some/path
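The way an encoder is chosen for a new file can be summarized as a precedence rule: an encoder given explicitly to put wins over the directory default. The sketch below only models that rule. The function name is hypothetical, and the encoder name "other" is a placeholder used for illustration, not an encoder that ships with DSEFS.

```python
from typing import Optional


def resolve_encoder(explicit: Optional[str], directory_default: Optional[str]) -> Optional[str]:
    """Pick the encoder for a newly uploaded file.

    explicit: encoder passed with -c on put, or None if omitted.
    directory_default: encoder set with mkdir -c, or None if the
    directory has no default.
    Returns None when the file is stored uncompressed.
    """
    if explicit is not None:
        return explicit       # -c on put overrides the directory default
    return directory_default  # otherwise inherit from the containing directory


print(resolve_encoder(None, "lz4"))     # inherits the directory default
print(resolve_encoder("other", "lz4"))  # explicit encoder takes precedence
print(resolve_encoder(None, None))      # no encoder: stored uncompressed
```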
Decompression
Automatic decompression occurs for all commands that transport data to the client. There is no need for additional configuration to retrieve the original, decompressed file content.
Storage space
Enabling compression creates a distinction between the logical and physical file size.
The logical size is the size of the file before it is uploaded to DSEFS and compressed.
Show the logical size with the stat command.
Look for the result in the Size field.
dsefs dsefs://10.0.0.1:5598/ > stat /tmp/wikipedia-sample.bz2
FILE dsefs://10.0.0.1:5598/tmp/wikipedia-sample.bz2:
Owner none
Group none
Permission rwxrwxrwx
Created 2017-04-06 20:06:21+0000
Modified 2017-04-06 20:06:21+0000
Accessed 2017-04-06 20:06:21+0000
Size 7723180
Block size 67108864
Redundancy 3
Compressed true
Encrypted false
Comment
The physical size is the actual size of the data as stored on the storage device.
Show the physical size with the df command, or with the stat -v command, which reports each block separately.
Look for results in the Compressed length column.
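The difference between the two sizes can be reproduced with a short sketch. It is an approximation under stated assumptions: zlib stands in for the lz4 encoder, the function name and frame size are hypothetical, and the physical size computed here ignores replication and any per-frame metadata that DSEFS may store.

```python
import zlib
from typing import Tuple


def logical_and_physical_size(data: bytes, frame_size: int) -> Tuple[int, int]:
    """Return (logical size, physical size) for data compressed frame by frame."""
    logical = len(data)  # size of the content before compression
    physical = sum(
        len(zlib.compress(data[i:i + frame_size]))  # zlib stands in for lz4
        for i in range(0, len(data), frame_size)
    )
    return logical, physical


sample = b"DSEFS " * 1000  # highly repetitive, so it compresses well
logical, physical = logical_and_physical_size(sample, frame_size=1024)
print(f"logical size: {logical} bytes, physical size: {physical} bytes")
```

For data that is already compressed, such as the .bz2 file in the stat example, the physical size may be close to, or even slightly larger than, the logical size.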
Limitations
Truncating compressed files is not possible.