DSEFS command line tool

Options and command arguments for the DSE File System (DSEFS).

The DSEFS functionality supports operations including uploading, downloading, moving, and deleting files, creating directories, and verifying the DSEFS status.

DSEFS commands are available only in the logical datacenter. DSEFS works with secured and unsecured clusters; see DSEFS authentication.

You can interact with the DSEFS file system in several modes: interactive command line shell, as part of dse commands, or with a REST API.

Interactive DSEFS command line shell

To use the interactive command line shell:
Action Command line
Launch DSEFS shell
dse fs
dsefs / >
The DSEFS prompt shows the current working directory on DSEFS. The current local working directory that you launch DSEFS from is the default directory that is used for searching local files.
Launch DSEFS shell with precedence given to the specified hosts
dse fs --prefer-contact-points -h 10.0.0.2,10.0.0.5

The --prefer-contact-points option is used in conjunction with the -h option to give precedence to the specified hosts, regardless of proximity, when issuing DSEFS commands. As long as the specified hosts are available, DSEFS does not switch to other DSEFS nodes in the cluster.

Without the --prefer-contact-points option, DSEFS will switch to the closest available DSEFS node automatically, even if the -h option is used to specify contact points.
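As a rough illustration of the option pair described above, the following shell sketch joins a list of contact points with commas and launches the DSEFS shell pinned to them. The function name dsefs_pinned is hypothetical; only dse fs, --prefer-contact-points, and -h come from the text above.

```shell
#!/bin/sh
# Hypothetical helper: launch the DSEFS shell with precedence given to the
# hosts passed as arguments. At least one host is required.
dsefs_pinned() {
    if [ "$#" -eq 0 ]; then
        echo "usage: dsefs_pinned host [host ...]" >&2
        return 2
    fi
    # Join the host arguments with commas, the form that -h takes above.
    hosts=$(IFS=,; echo "$*")
    dse fs --prefer-contact-points -h "$hosts"
}
```

Calling dsefs_pinned 10.0.0.2 10.0.0.5 runs the same command line as the example above.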

View entire DSEFS command list
dsefs / > help
View help for any DSEFS command
dsefs / > help dsefs_command
Add a comment to a DSEFS shell command
Use the # character. Everything after the # character is ignored.
dsefs / > get archive.tgz local_archive.tgz #retrieve the archive
Exit DSEFS shell
Press Ctrl+D or type exit.

Configuring DSEFS shell logging

The default location of the DSEFS shell log file .dsefs-shell.log is the user home directory. The default log level is INFO. To configure DSEFS shell logging, edit the installation_location/resources/dse/conf/logback-dsefs-shell.xml file.

Using with the dse command line

Precede the DSEFS command with dse fs:
dse [dse_auth_credentials] fs dsefs_command  [options]
For example, to list the file system status and disk space usage in human-readable format:
dse -u user1 -p mypassword fs "df -h"

Optional command arguments are enclosed in square brackets. For example, [dse_auth_credentials] and [-R].

Variable values are italicized. For example, directory and [subcommand].

Working with the local file system in the DSEFS shell

You can refer to files in the local file system by prefixing paths with file:. For example, the following command lists the files in the local root directory:

dsefs dsefs://127.0.0.1:5598/ > ls file:/
bin   cdrom  dev  home  lib32  lost+found  mnt  proc  run   srv  tmp  var         initrd.img.old  vmlinuz.old  
boot  data   etc  lib   lib64  media       opt  root  sbin  sys  usr  initrd.img  vmlinuz         

If you need to perform many subsequent operations on the local file system, first change the current working directory to file: or any local file system path:

dsefs dsefs://127.0.0.1:5598/ > cd file:
dsefs file:/home/user1/path/to/local/files > ls
conf  src  target  build.sbt  
dsefs file:/home/user1/path/to/local/files > cd ..
dsefs file:/home/user1/path/to/local > 

DSEFS shell remembers the last working directory of each file system separately. To go back to the previous DSEFS directory, enter:

dsefs file:/home/user1/path/to/local/files > cd dsefs:
dsefs dsefs://127.0.0.1:5598/ > 

To go back again to the previous local directory:

dsefs dsefs://127.0.0.1:5598/ > cd file:
dsefs file:/home/user1/path/to/local/files >

To refer to a path relative to the last working directory of the file system, prefix a relative path with either dsefs: or file:. The following session will create a directory new_directory in the directory /home/user1:

dsefs dsefs://127.0.0.1:5598/ > cd file:/home/user1
dsefs file:/home/user1 > cd dsefs:
dsefs dsefs://127.0.0.1:5598/ > mkdir file:new_directory
dsefs dsefs://127.0.0.1:5598/ > realpath file:new_directory
file:/home/user1/new_directory
dsefs dsefs://127.0.0.1:5598/ > stat file:new_directory
DIRECTORY file:/home/user1/new_directory:
Owner           user1
Group           user1
Permission      rwxr-xr-x
Created         2017-01-15 13:10:06+0200
Modified        2017-01-15 13:10:06+0200
Accessed        2017-01-15 13:10:06+0200
Size            4096

To copy a file between two different file systems, you can also use the cp command with explicit file system prefixes in the paths:

dsefs file:/home/user1/test > cp dsefs:archive.tgz another-archive-copy.tgz
dsefs file:/home/user1/test > ls
another-archive-copy.tgz archive-copy.tgz  archive.tgz  
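The explicit prefixes are also convenient from a script, where no working directory has been set in a DSEFS session. The sketch below is a hypothetical wrapper (dsefs_fetch is not a DSEFS command) that copies a DSEFS file to the local file system with one dse fs call. It assumes that absolute paths can be written as dsefs:/path, by analogy with the file:/path form shown above, and that the paths contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: copy a DSEFS file to the local file system using
# explicit dsefs: and file: prefixes, so the result does not depend on
# which file system is current.
# Assumption: absolute paths without spaces; quoting rules inside a DSEFS
# command are not handled here.
dsefs_fetch() {
    src=$1   # absolute path on DSEFS, for example /archive.tgz
    dst=$2   # absolute local path, for example /home/user1/test/copy.tgz
    case $src in
        /*) ;;
        *) echo "source must be an absolute path" >&2; return 2 ;;
    esac
    case $dst in
        /*) ;;
        *) echo "destination must be an absolute path" >&2; return 2 ;;
    esac
    # The whole DSEFS command is passed to dse fs as one quoted argument.
    dse fs "cp dsefs:$src file:$dst"
}
```

For example, dsefs_fetch /archive.tgz /home/user1/test/copy.tgz passes the single command cp dsefs:/archive.tgz file:/home/user1/test/copy.tgz to dse fs.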

Authentication

For dse dse_auth_credentials, you can provide user credentials in several ways; see Providing credentials from DSE tools. For authentication with DSEFS, see DSEFS authentication.

Wildcard support

Some DSEFS commands support wildcard pattern expansion in the path argument. Path arguments containing wildcards are expanded before method invocation into a set of paths matching the wildcard pattern, then the given method is invoked for each expanded path.

For example in the following directory tree:

dirA
|--dirB
|--file1
|--file2

The stat dirA/* command is transparently translated into three invocations: stat dirA/dirB, stat dirA/file1, and stat dirA/file2.

DSEFS supports the following wildcard patterns:

  • * matches any file system entry (file or directory) name, as in the example of stat dirA/*.
  • ? matches any single character in the file system entry name. For example, stat dirA/dir? matches dirA/dirB.
  • [] matches any single character enclosed within the brackets. For example, stat dirA/file[0123] matches dirA/file1 and dirA/file2.
  • {} matches any of the comma-separated character sequences enclosed within the braces. For example, stat dirA/{dirB,file2} matches dirA/dirB and dirA/file2.

There are no limitations on the number of wildcard patterns in a single path.
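The expansion can be pictured with a local analogy. The sketch below does not call DSEFS at all: it recreates the example tree in a temporary local directory and lets a POSIX shell expand the *, ? and [] patterns, printing the stat invocation each match would produce. The {} pattern is left out because plain sh has no brace expansion. The names demo_root and expand_stat are made up for the illustration.

```shell
#!/bin/sh
# Local analogy only: a POSIX shell expands *, ?, and [] against the local
# file system in much the same way as described above for DSEFS paths.
LC_ALL=C; export LC_ALL        # fixed sort order for the expanded names
demo_root=$(mktemp -d)
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/dirA/dirB"
: > "$demo_root/dirA/file1"
: > "$demo_root/dirA/file2"

# Print one "stat <path>" line for each path that matches the pattern in $1.
expand_stat() {
    (
        cd "$demo_root" || exit 1
        # $1 is left unquoted on purpose so the shell expands the pattern.
        for entry in $1; do
            echo "stat $entry"
        done
    )
}

expand_stat 'dirA/*'            # three lines: dirB, file1, file2
expand_stat 'dirA/dir?'         # one line: dirB
expand_stat 'dirA/file[0123]'   # two lines: file1, file2
```

The same behavior is the reason to quote a DSEFS command that contains wildcards when passing it to dse fs from a local shell: unquoted, the local shell would try to expand the pattern against local files before DSEFS sees it.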


Executing multiple commands

DSEFS can execute multiple commands on one line. Enclose each command, together with its arguments, in quotes. DSEFS executes each quoted command separately.

dse fs 'cat file1 file2 file3 file4' 'ls dir1'
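When the list of commands lives in a file, the quoting can be built programmatically. The following sketch is a hypothetical helper (dsefs_batch is not part of DSE) that reads one DSEFS command per line from standard input and passes each line to a single dse fs invocation as its own quoted argument.

```shell
#!/bin/sh
# Hypothetical helper: read DSEFS commands from standard input, one per
# line, and run them all with a single "dse fs" invocation.
dsefs_batch() {
    set --                      # start with an empty argument list
    while IFS= read -r line; do
        # Skip empty lines; every other line becomes one quoted argument.
        if [ -n "$line" ]; then
            set -- "$@" "$line"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "dsefs_batch: no commands given" >&2
        return 2
    fi
    dse fs "$@"
}
```

For example, printf 'cat file1 file2 file3 file4\nls dir1\n' | dsefs_batch is equivalent to the command line shown above.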

Forcing synchronization

Before confirming writing a file, DSEFS by default forces all blocks of the file to be written to the storage devices. This behavior can be controlled with --no-force-sync and --force-fsync flags when creating files or directories in the DSEFS shell with mkdir, put, and cp commands. The force/no-force behavior is inherited from the parent directory, if not specified. For example, if a directory is created with --no-force-sync, then all files are created with --no-force-sync unless --force-fsync is explicitly set during file creation.

Turning off forced synchronization improves latency and performance at a cost of durability. For example, if a power loss occurs before writing the data to the storage device, you may lose data. Turn off forced synchronization only if you have a reliable backup power supply in your datacenter and failure of all replicas is unlikely, or if you can afford losing file data.

The Hadoop SYNC_BLOCK flag has the same effect as --force-sync in DSEFS. The Hadoop LAZY_PERSIST flag has the same effect as --no-force-sync in DSEFS.
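The inheritance rule described above can be modeled in a few lines. This is only a sketch of the decision logic, not DSEFS code: the function name and the values force, no-force, true, and false are placeholders chosen for the illustration.

```shell
#!/bin/sh
# Model of the inheritance rule: an explicit setting on the new file or
# directory wins; otherwise the parent directory's setting is inherited.
#   $1  parent directory setting: true or false
#   $2  explicit setting: "force", "no-force", or "" when not specified
effective_force_sync() {
    parent=$1
    explicit=$2
    case $explicit in
        force)    echo true ;;
        no-force) echo false ;;
        *)        echo "$parent" ;;   # nothing specified: inherit
    esac
}
```

With a parent created without forced synchronization, effective_force_sync false "" yields false for a new file, while effective_force_sync false force yields true.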

DSEFS command options

The following DSEFS commands and arguments are supported:
DSEFS command Description and command arguments
append source destination Append a local file to a remote file.
  • source is the path to the local file to read data from.
  • destination is the path to the remote file to append the file to.
cat file_or_files Concatenate files and print on the standard output.
  • file_or_files is the file or files in DSEFS to print to standard output. Separate files with a space. The path may contain wildcards.
cd directory Change the remote working directory in DSEFS.
  • directory is the remote directory to change to. The path may contain wildcards.
  • .. is the parent directory.
The DSEFS prompt identifies the current working directory in DSEFS:
  • dsefs / > is the default directory
  • dsefs /dir2 > is the current working directory dir2
chgrp [options] group path Change file or directory group ownership.
  • -r, -R recursively changes the file and directory group ownership.
  • -v explains in more detail what is being done.
  • group the new group name.
  • path the file or directory whose group will be changed. The path may contain wildcards.
chmod [options] octal permission mode path Change the permissions of a file or directory.
  • -r, -R recursively changes the file and directory permissions.
  • -v explains in more detail what is being done.
  • octal permission mode octal representation of permission mode for owner, group, and others.
  • path the file or directory whose permissions will be changed. The path may contain wildcards.
chown [options] path Change the ownership and/or group ownership of files or directories.
  • -r, -R recursively changes the file and directory ownership.
  • -v explains in more detail what is being done.
  • -u, --user username the new owner username.
  • -g, --group group the new group owner name.
  • path the file or directory whose ownership will be changed. The path may contain wildcards.
cp [options] source destination Copy a file within a file system or between two file systems. If the destination path points to a different file system than DSEFS, the block size and redundancy options are ignored.
  • -o, --overwrite overwrite the destination file if it exists.
  • -b, --block-size value is the preferred block size in bytes.
  • --force-fsync forces synchronization of the file data to the storage devices, regardless of the value of the --force-sync flag of its parent directory. The value of the --force-sync flag of a file or directory can be displayed with stat. This option applies to a directory's files, not the directory itself, similar to behavior of block size or compression options. It can be applied during directory creation so that files in the directory will inherit the option.
  • --no-force-fsync causes the blocks of data to be written only to the buffers of the operating system, and syncing them to the storage device is then left to the operating system. Using --no-force-fsync improves latency and performance at the cost of durability. If a power loss occurs before writing the data to the storage device, you may lose data. Use --no-force-fsync only if you have a reliable backup power supply in your datacenter and failure of all replicas is unlikely, or if you can afford losing file data.

    Using the --no-force-fsync flag on a directory causes new files created in that directory to behave, by default, as if the --no-force-fsync flag had been given when they were created.

  • -n, --redundancy-factor num_nodes is how many replicas of the file data to create in DSEFS. This redundancy factor is similar to the replication factor in the database keyspaces, but is more granular. Set this value to one greater than the number of nodes that are allowed to fail before data loss occurs. For example, set this value to 3 to allow 2 nodes to fail. For simple replication, you can use a value that is equivalent to the replication factor.
  • source the source file to be copied. The path may contain wildcards.
  • destination the destination file to be created.
df [options] List the DSEFS file system status and disk space usage.
  • -h to list the sizes in human-readable format. Sizes are rounded to three significant digits and presented using units:
    • K (for a kilobyte = 1024 bytes),
    • M (for a megabyte = 1024K),
    • G (for a gigabyte = 1024M),
    • T (for a terabyte = 1024G)
    Without this option sizes are printed in bytes.
echo Output a line of text. To display a quoted string that is passed as a single argument:
echo "some text"
To display strings that are passed as multiple arguments:
echo Getting a file
exit Exit the DSEFS shell client. You can also type Ctrl+D to exit the shell.
fsck Perform a file system consistency check and repair file system errors.
get source destination Get a file from the DSEFS remote file system and copy the file to the local file system.
  • source is the path to the DSEFS remote file to copy.
  • destination is the path to the local file to create.
ls [options] [file_system_entry_or_entries] List the DSEFS file system entries (files or directories) in the current working directory or in the specified directories.
  • -R to list subdirectories recursively.
  • -l to use a long listing format with metadata.
  • -h to list the sizes in human-readable format. Sizes are rounded to three significant digits and presented using units:
    • K (for a kilobyte = 1024 bytes),
    • M (for a megabyte = 1024K),
    • G (for a gigabyte = 1024M),
    • T (for a terabyte = 1024G)
    Without this option sizes are printed in bytes.
  • -1 limits the number of printed columns to one, so one file is printed per line. This allows the output to more easily be parsed by external tools.
  • file_system_entry_or_entries is the directory or directories to list the contents of. The path may contain wildcards.
mkdir [options] dir_or_dirs Make a new directory or directories.
  • -p to make parent directories as needed.
  • -b bytes is the preferred block size for files stored in this directory.
  • --force-fsync forces synchronization of the file data to the storage devices, regardless of the value of the --force-sync flag of its parent directory. The value of the --force-sync flag of a file or directory can be displayed with stat. This option applies to a directory's files, not the directory itself, similar to behavior of block size or compression options. It can be applied during directory creation so that files in the directory will inherit the option.
  • --no-force-fsync causes the blocks of data to be written only to the buffers of the operating system, and syncing them to the storage device is then left to the operating system. Using --no-force-fsync improves latency and performance at the cost of durability. If a power loss occurs before writing the data to the storage device, you may lose data. Use --no-force-fsync only if you have a reliable backup power supply in your datacenter and failure of all replicas is unlikely, or if you can afford losing file data.

    Using the --no-force-fsync flag on a directory causes new files created in that directory to behave, by default, as if the --no-force-fsync flag had been given when they were created.

  • -c, --compression-encoder value the encoder name to use for compression. DSE ships with the lz4 compression encoder.
  • -n, --redundancy-factor num_nodes is how many replicas of the file data to create in DSEFS. This redundancy factor is similar to the replication factor in the database keyspaces, but is more granular. Set this value to one greater than the number of nodes that are allowed to fail before data loss occurs. For example, set this value to 3 to allow 2 nodes to fail. For simple replication, you can use a value that is equivalent to the replication factor.
  • -m, --permission-mode value octal representation of permission mode for owner, group and others.
  • dir_or_dirs is the directory or directories to create.
mv source destination Move or rename a file or directory.
  • source is the path to the DSEFS file system entry to be moved. The path may contain wildcards.
  • The destination path on DSEFS:
    • destination is the full destination path, including the name of the file or directory being moved.
    • destination/ is the full destination path. If the destination ends with a slash (/) the original file or directory name will be retained.
put [options] source destination Copy a local file to the DSEFS.
  • -o, --overwrite to overwrite the destination file if it exists.
  • -b, --block-size bytes is the preferred block size in bytes.
  • --force-fsync forces synchronization of the file data to the storage devices, regardless of the value of the --force-sync flag of its parent directory. The value of the --force-sync flag of a file or directory can be displayed with stat. This option applies to a directory's files, not the directory itself, similar to behavior of block size or compression options. It can be applied during directory creation so that files in the directory will inherit the option.
  • --no-force-fsync causes the blocks of data to be written only to the buffers of the operating system, and syncing them to the storage device is then left to the operating system. Using --no-force-fsync improves latency and performance at the cost of durability. If a power loss occurs before writing the data to the storage device, you may lose data. Use --no-force-fsync only if you have a reliable backup power supply in your datacenter and failure of all replicas is unlikely, or if you can afford losing file data.

    Using the --no-force-fsync flag on a directory causes new files created in that directory to behave, by default, as if the --no-force-fsync flag had been given when they were created.

  • -c, --compression-encoder value the encoder name to use for compression. DSE ships with the lz4 compression encoder.
  • -n, --redundancy-factor num_nodes is how many replicas of the file data to create in DSEFS. This redundancy factor is similar to the replication factor in the database keyspaces, but is more granular. Set this value to one greater than the number of nodes that are allowed to fail before data loss occurs. For example, set this value to 3 to allow 2 nodes to fail. For simple replication, you can use a value that is equivalent to the replication factor.
  • -f, --compression-frame-size value the preferred frame size in bytes. A frame is the unit of compression. The larger the frame, the better the chance of a high compression ratio. For most cases the default value of 131072 bytes is sufficient.
  • -m, --permission-mode value octal representation of permission mode for owner, group and others.
  • source is the path to the local source file.
  • destination is the path to the destination file to be created on DSEFS.
pwd [path] Print the working directory of the current file system or of the file system that the specified path belongs to.
  • path identifies the file system whose current working directory is printed.
realpath [options] path Print the resolved absolute path for a specified file or directory.
  • -e, --canonicalize-existing all components of the path must exist.
  • -m, --canonicalize-missing no path components need to exist or be a directory.
  • path the path to resolve.
rename path name Rename a file or directory in the current location.
  • path is the path to the file system entry to be renamed.
  • name is the new name of the file system entry.
rm [options] path Remove files or directories.
  • -r, -R recursively removes the files or directories.
  • -v explains in more detail what is being done.
  • path is the path to the file system entry to be removed. The path may contain wildcards.
rmdir path Remove an empty directory or directories.
  • path is the path to the directory to be removed.
stat file_or_dir [-v] Display the file system entry status.
  • file_or_dir is the path to the file system entry (file or directory). The path may contain wildcards.
  • -v to print verbose detailed information about the file status.
truncate file Truncate a file to 0 bytes. Useful for retaining the metadata for the file.
  • file is the file to truncate.
umount [-f] locations Unmount file system storage locations.
  • -f to force unmounting, even if the location is unavailable.
  • locations is the UUID (universally unique identifier) or UUIDs of the locations to unmount. Get the UUID from the df command.
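To make the units used by the -h option of df and ls concrete, here is a small sketch of the arithmetic: divide by 1024 until the value drops below 1024, then print three significant digits with the matching unit letter. The function name human_size is hypothetical, and the exact rounding and formatting of real DSEFS output may differ from this approximation.

```shell
#!/bin/sh
# Sketch of the -h unit arithmetic: 1K = 1024 bytes, 1M = 1024K,
# 1G = 1024M, 1T = 1024G. Values below 1024 are printed as plain bytes.
human_size() {
    awk -v bytes="$1" 'BEGIN {
        split("K M G T", unit, " ")
        value = bytes
        i = 0
        while (value >= 1024 && i < 4) {
            value /= 1024
            i++
        }
        if (i == 0)
            printf "%d\n", bytes
        else if (value >= 999.5)
            printf "%.0f%s\n", value, unit[i]   # avoid exponent notation
        else
            printf "%.3g%s\n", value, unit[i]   # three significant digits
    }'
}

human_size 67108864      # prints 64M
human_size 5368709120    # prints 5G
```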

Removing a DSEFS node

When removing a node running DSEFS from a DSE cluster, additional steps are needed to keep the DSEFS data set consistent.

  1. From a node in the same datacenter as the node to be removed, start the DSEFS shell.
    dse fs
  2. Show the current DSEFS nodes with the df command.
    dsefs > df
    Location                              Status  DC              Rack   Host               Address        Port  Directory            Used         Free    Reserved
    144e587c-11b1-4d74-80f7-dc5e0c744aca  up      GraphAnalytics  rack1  node1.example.com  10.200.179.38  5598  /var/lib/dsefs/data     0  29289783296  5368709120
    98ca0435-fb36-4344-b5b1-8d776d35c7d6  up      GraphAnalytics  rack1  node2.example.com  10.200.179.39  5598  /var/lib/dsefs/data     0  29302099968  5368709120
  3. Find the node to be removed in the list and note the UUID value for it under the Location column.
  4. If the node is up, unmount it from DSEFS with the command umount UUID.
  5. If the node is not up (for example, after a hardware failure), force unmount it from DSEFS with the command umount -f UUID.
  6. Continue with the normal steps for removing a node.
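Steps 3 through 5 can be sketched as a small filter over saved df output. The helper below is hypothetical (dsefs_umount_cmd is not a DSEFS command): it reads the table on standard input, finds the row whose Host column matches the given name, and prints the umount command to run in the DSEFS shell. It assumes the column order shown in step 2 and treats any status other than up as not up.

```shell
#!/bin/sh
# Hypothetical helper: read DSEFS "df" output on standard input and print
# the umount command for each location whose Host column equals $1.
# Assumed columns: 1 = Location (UUID), 2 = Status, 5 = Host.
dsefs_umount_cmd() {
    awk -v host="$1" '
        $1 == "Location" { next }         # skip the header row
        $5 == host {
            if ($2 == "up")
                print "umount " $1        # step 4: node is up
            else
                print "umount -f " $1     # step 5: node is not up
        }'
}
```

The printed command is only a suggestion to review; nothing is unmounted by the helper itself.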

Examples

Using the DSEFS shell, these commands list the current DSEFS directory and then copy the local file bluefile to the DSEFS file greenfile:
dsefs / >  ls -l 
dsefs / >  put file:/bluefile greenfile
To view the new file in the DSEFS directory:
dsefs / >  ls -l 
Type  Permission  Owner  Group  Length  Modified                  Name                        
file  rwxrwxrwx   none   none       17  2016-05-11 09:34:26+0000  greenfile  
Using the dse command, these commands create the test2 directory and upload the local README.md file to the new DSEFS directory.
dse fs "mkdir /test2" &&
dse fs "put README.md /test2/README.md"
To view the new directory listing:
dse fs "ls -l /test2"
Type Permission Owner Group Length Modified Name
file rwxrwxrwx none none 3382 2016-03-07 23:20:34+0000 README.md
You can pass two or more DSEFS commands to a single dse fs command line. This is faster because the JVM is launched, and the connection to DSEFS is opened and closed, only once. For example:
dse fs "mkdir /test2" "put README.md /test2/README.md"
The following example shows how to use the --no-force-sync flag on a directory, and how to check the state of the --force-sync flag using stat. These commands are run from within the DSEFS shell.
dsefs> mkdir --no-force-sync /tmp
dsefs> put file:some-file.dat /tmp/file.tmp
dsefs> stat /tmp/file.tmp
FILE dsefs://127.0.0.1:5598/tmp/file.tmp
Owner           none
Group           none
Permission      rwxrwxrwx
Created         2017-03-06 17:54:35+0100
Modified        2017-03-06 17:54:35+0100
Accessed        2017-03-06 17:54:35+0100
Size            1674
Block size      67108864
Redundancy      3
Compressed      false
Encrypted       false
Forces sync     false
Comment
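The Redundancy value in this output ties back to the rule for --redundancy-factor: the factor is one greater than the number of nodes allowed to fail. The following sketch (tolerated_failures is a made-up name) reads saved stat output on standard input and prints that number.

```shell
#!/bin/sh
# Hypothetical helper: read "stat" output on standard input and print how
# many nodes holding replicas may fail before data loss, using the rule
# redundancy factor = tolerated failures + 1.
tolerated_failures() {
    awk '$1 == "Redundancy" { print $2 - 1 }'
}
```

For the file above, with Redundancy 3, the helper prints 2.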