Using the cfs-archive to store huge files

The Cassandra File System (CFS) consists of two layers: cfs and cfs-archive. Using cfs-archive is recommended for long-term storage of huge files.

The Cassandra File System (CFS) consists of two layers, cfs and cfs-archive, that you access using these Hadoop shell commands and URIs:

  • cfs:// for the Cassandra layer
  • cfs-archive:// for the Cassandra archive layer
Note: You can create additional CFS file systems with different names. Any file system whose name ends with the -archive suffix is an archive file system.
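For example, with DataStax Enterprise installed on Linux, you can list the top level of each layer using the Hadoop shell from the installation directory (the same convention used in the examples later in this topic):

    bin/dse hadoop fs -ls cfs:///
    bin/dse hadoop fs -ls cfs-archive:///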

Using cfs-archive is highly recommended for long-term storage of huge files, including files with terabytes of data. The cfs layer, by contrast, is not recommended for such files because data on this layer periodically undergoes compaction, as it should: Hadoop uses the cfs layer for many small files and temporary data, which need to be cleaned up after deletions occur. If you store huge files on the cfs layer instead of the cfs-archive layer, compacting them can take a very long time, for example, days. Files stored on the cfs-archive layer, on the other hand, do not undergo compaction automatically. You can start compaction manually using the nodetool compact command.
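A manual compaction of the archive layer might look like the following sketch. The keyspace name cfs_archive is an assumption; confirm the actual CFS keyspace names in your cluster (for example, with nodetool cfstats) before running the command from the installation directory:

    # Manually compact the archive layer (keyspace name cfs_archive is an assumption)
    bin/nodetool compact cfs_archive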

Example: Store a file on cfs-archive 

This example shows how to store a file on cfs-archive using the Hadoop shell commands from the DataStax Enterprise installation directory on Linux:

  1. Create a directory on the cfs-archive layer. Note the additional forward slash, which is required when the URI does not specify a host:
    bin/dse hadoop fs -mkdir cfs-archive:///20140401
  2. Use the Hadoop shell put command with an absolute path name to store the file on the cfs-archive layer.
    bin/dse hadoop fs -put big_archive.csv cfs-archive:///20140401/big_archive.csv
  3. Verify that the file is stored on the cfs-archive layer.
    bin/dse hadoop fs -ls cfs-archive:///20140401/
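To check the size of the archived file, the standard Hadoop shell du command also works against the cfs-archive URI; for example, using the directory created above:

    bin/dse hadoop fs -du cfs-archive:///20140401/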

Example: Migrate data from MySQL to text on cfs-archive

This example shows how to migrate data from a MySQL table to text files in the archive directory cfs-archive/npa_nxx.

  1. Run the sqoop demo.
  2. Use the dse command in the bin directory to migrate the data from the MySQL table to text files in the npa_nxx directory of cfs-archive. Specify the IP address of the host in the --target-dir option.
    sudo ./dse sqoop import --connect \
        jdbc:mysql://127.0.0.1/npa_nxx_demo \
        --username root \
        --password password \
        --table npa_nxx \
        --target-dir cfs-archive://127.0.0.1/npa_nxx
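
After the import completes, you can verify the results from the same bin directory. Sqoop writes one text file per map task, so the part file names vary; part-m-00000 is shown here only as an assumption:

    ./dse hadoop fs -ls cfs-archive://127.0.0.1/npa_nxx
    ./dse hadoop fs -cat cfs-archive://127.0.0.1/npa_nxx/part-m-00000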