DSEFS compression
DSEFS is able to compress files to save storage space and bandwidth.
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations | /etc/dse/dse.yaml
Tarball installations | installation_location/resources/dse/conf/dse.yaml
Compression is performed by DSE during upload, at the user's explicit request. Decompression is transparent: the server always decompresses data before returning it to the client.
Compression is performed within block boundaries. The unit of compression, that is, the chunk of data that is compressed individually, is called a frame. Its size can be specified during file upload.
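The idea of frames can be illustrated with a minimal Python sketch. It is not DSEFS code: the function names are hypothetical, and zlib from the standard library stands in for the lz4 encoder purely so the example runs without extra packages. It only shows what "compressing each frame individually within one block" means.

```python
import zlib
from typing import List


def compress_block(block: bytes, frame_size: int) -> List[bytes]:
    """Split one block into fixed-size frames and compress each frame on its own.

    zlib is a stand-in here; DSEFS itself ships with the lz4 encoder.
    """
    if frame_size <= 0:
        raise ValueError("frame_size must be positive")
    return [
        zlib.compress(block[offset:offset + frame_size])
        for offset in range(0, len(block), frame_size)
    ]


def decompress_block(frames: List[bytes]) -> bytes:
    """Decompress every frame independently and join the results."""
    return b"".join(zlib.decompress(frame) for frame in frames)


block = b"DSEFS " * 1000                       # 6000 bytes of sample data
frames = compress_block(block, frame_size=1024)
restored = decompress_block(frames)
```

Because every frame is compressed independently, a reader only has to decompress the frames that cover the requested byte range, at the cost of holding one full frame in memory at a time.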
Encoders
DSEFS is shipped with the lz4 encoder which works out of the box.
Compression
To compress files, use the -c or --compression-encoder parameter with the put or cp command. The parameter specifies the compression encoder to use for the file being uploaded.
dsefs / > put -c lz4 file /path/to/file
The frame size can optionally be set with the -f or --compression-frame-size option.
The maximum frame size in bytes is set by the compression_frame_max_size option in dse.yaml. If a user passes put -f a frame size greater than compression_frame_max_size, an error is thrown and the command fails. Adjust the compression_frame_max_size setting based on the available memory of the node.
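The frame size check described above can be modeled in a few lines of Python. This is a hedged sketch of the documented behavior, not the DSE implementation: check_frame_size is a hypothetical helper, and the limit of 1048576 bytes is an assumed example value rather than a setting read from dse.yaml.

```python
def check_frame_size(requested: int, max_size: int) -> int:
    """Return the requested frame size, or raise if it exceeds the configured limit.

    Models the rule that a frame size greater than compression_frame_max_size
    makes the put command fail.
    """
    if requested <= 0:
        raise ValueError("frame size must be a positive number of bytes")
    if requested > max_size:
        raise ValueError(
            f"frame size {requested} exceeds compression_frame_max_size {max_size}"
        )
    return requested


ASSUMED_MAX = 1048576  # assumed example limit in bytes, not taken from a real dse.yaml

accepted = check_frame_size(65536, ASSUMED_MAX)

try:
    check_frame_size(2 * ASSUMED_MAX, ASSUMED_MAX)
    rejected = False
except ValueError:
    rejected = True
```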
Compressed files can be appended to in the same way as uncompressed files. If the file is compressed, the appended data is transparently compressed with the encoder specified for the file in the initial put operation.
Directories can have a default compression encoder, specified at creation time with the mkdir command. Files newly added with the put command inherit the default compression encoder from the containing directory. You can override the default compression encoder with the -c parameter during put operations.
dsefs / > mkdir -c lz4 /some/path
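The inheritance rule can be summarized as a small decision function. The sketch below is illustrative only: effective_encoder is a hypothetical name, and None is used here to mean "no encoder specified", which is an assumption of this example rather than DSEFS behavior.

```python
from typing import Optional


def effective_encoder(
    directory_default: Optional[str],
    put_encoder: Optional[str] = None,
) -> Optional[str]:
    """Pick the encoder for a newly uploaded file.

    An encoder given explicitly on put (the -c parameter) wins; otherwise the
    file inherits the default encoder of its containing directory, if any.
    """
    if put_encoder is not None:
        return put_encoder
    return directory_default


# Directory created with "mkdir -c lz4", plain put: the file inherits lz4.
inherited = effective_encoder("lz4")

# Directory without a default, plain put: the file stays uncompressed.
uncompressed = effective_encoder(None)

# Directory without a default, put with "-c lz4": the explicit encoder is used.
explicit = effective_encoder(None, "lz4")
```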
Decompression
Decompression is performed automatically for all commands that transport data to the client. There is no need for additional configuration to retrieve the original, decompressed file content.
Storage space
Enabling compression creates a distinction between the logical and physical file size. The logical size is the size of the file before it is uploaded to DSEFS and compressed. The logical size is shown by the stat command under Size.
dsefs dsefs://10.0.0.1:5598/ > stat /tmp/wikipedia-sample.bz2
FILE dsefs://10.0.0.1:5598/tmp/wikipedia-sample.bz2:
Owner           none
Group           none
Permission      rwxrwxrwx
Created         2017-04-06 20:06:21+0000
Modified        2017-04-06 20:06:21+0000
Accessed        2017-04-06 20:06:21+0000
Size            7723180
Block size      67108864
Redundancy      3
Compressed      true
Encrypted       false
Comment
The physical size is the actual size of the data stored on the storage device. The physical size is shown by the df command, and by the stat -v command for each block separately under the Compressed length column.
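The relationship between the two sizes can be expressed as simple arithmetic. In the sketch below the logical size is the Size value from the stat output shown earlier, while the per-block compressed length is a hypothetical number chosen for illustration, since the stat -v output is not reproduced here. The helper names are also assumptions of this example.

```python
from typing import List


def physical_size(compressed_lengths: List[int]) -> int:
    """Sum the per-block compressed lengths, as listed by stat -v."""
    return sum(compressed_lengths)


def compression_ratio(logical_size: int, compressed_lengths: List[int]) -> float:
    """Physical size divided by logical size; values below 1.0 mean space was saved."""
    if logical_size <= 0:
        raise ValueError("logical_size must be positive")
    return physical_size(compressed_lengths) / logical_size


logical = 7723180        # Size reported by stat for /tmp/wikipedia-sample.bz2
blocks = [7700000]       # hypothetical Compressed length for the file's single block

ratio = compression_ratio(logical, blocks)
```

A file that is already compressed, such as a .bz2 archive, would be expected to show a ratio close to 1.0, because a second compression pass has little redundancy left to remove.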
Limitations
Truncating compressed files is not possible.