DSEFS (DataStax Enterprise file system) is a fault-tolerant, general-purpose, distributed file system within DataStax Enterprise. It is designed for use cases that need to leverage a distributed file system for data ingestion, data staging, and state management for Spark Streaming applications (such as checkpointing or write-ahead logging). DSEFS is similar to HDFS, but avoids the deployment complexity and single point of failure typical of HDFS. DSEFS is HDFS-compatible and is designed to work in place of HDFS in Spark and other systems.
DSEFS is the default distributed file system in DataStax Enterprise, and is automatically enabled on all analytics nodes.
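As a rough illustration of using DSEFS in place of HDFS for Spark Streaming checkpointing, the sketch below builds a DSEFS URI and hands it to a streaming context. It assumes that DSEFS paths are addressed with a dsefs:// URI scheme and that an empty authority (dsefs:///path) resolves to the local DSEFS instance. The helper names dsefs_uri and enable_checkpointing and the example path are hypothetical and are not part of any DataStax API.

```python
# Sketch only: the dsefs:// scheme and the empty-authority form are
# assumptions; dsefs_uri and enable_checkpointing are hypothetical helpers.

DSEFS_SCHEME = "dsefs"


def dsefs_uri(path, host=None, port=None):
    """Build a DSEFS URI for an absolute path.

    With no host, the result has an empty authority (dsefs:///path), which
    this sketch assumes resolves to the local DSEFS instance.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute, got %r" % path)
    if host is None:
        if port is not None:
            raise ValueError("a port requires a host")
        authority = ""
    elif port is None:
        authority = host
    else:
        authority = "%s:%d" % (host, port)
    return "%s://%s%s" % (DSEFS_SCHEME, authority, path)


def enable_checkpointing(streaming_context, path):
    """Point a Spark StreamingContext's checkpoint directory at DSEFS."""
    uri = dsefs_uri(path)
    # StreamingContext.checkpoint(directory) is the PySpark call that sets
    # the checkpoint directory on an HDFS-compatible file system.
    streaming_context.checkpoint(uri)
    return uri
```

In a real application, streaming_context would be a pyspark.streaming.StreamingContext, and a call such as enable_checkpointing(ssc, "/checkpoints/my_app") would store checkpoint state under that DSEFS directory.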
DSEFS stores file metadata (such as file path, ownership, permissions) and file contents separately:
Metadata is stored in the database.
File data blocks are stored locally on each node and are replicated onto multiple nodes.
The redundancy factor is set at the DSEFS directory or file level, which is more granular than the replication factor that is set at the keyspace level in the database.
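To make the difference in granularity concrete, the following toy model tracks a redundancy factor per path instead of one value for a whole keyspace. It is a conceptual sketch, not DSEFS code: the RedundancyTable class is hypothetical, the default value of 3 is an arbitrary placeholder, and the rule that a path without its own setting falls back to its nearest ancestor directory is an assumption made for illustration.

```python
import posixpath


class RedundancyTable:
    """Toy model: redundancy factors keyed by directory or file path."""

    def __init__(self, default=3):
        # Placeholder fallback used when no ancestor has an explicit setting.
        self._default = default
        self._explicit = {}

    @staticmethod
    def _normalize(path):
        if not path.startswith("/"):
            raise ValueError("path must be absolute, got %r" % path)
        # Collapse repeated leading slashes so the walk up always ends at "/".
        return posixpath.normpath("/" + path.lstrip("/"))

    def set(self, path, factor):
        """Record an explicit redundancy factor for one directory or file."""
        if factor < 1:
            raise ValueError("redundancy factor must be at least 1")
        self._explicit[self._normalize(path)] = factor

    def effective(self, path):
        """Return the factor for path, walking up to the nearest ancestor
        that has an explicit setting (assumed behavior in this sketch)."""
        current = self._normalize(path)
        while True:
            if current in self._explicit:
                return self._explicit[current]
            if current == "/":
                return self._default
            current = posixpath.dirname(current)


table = RedundancyTable(default=3)
table.set("/staging", 2)                # scratch data: fewer copies
table.set("/staging/critical.csv", 5)   # one file that needs more copies
```

With these settings, a file elsewhere under /staging resolves to 2, the single critical file resolves to 5, and paths outside /staging use the placeholder default. A keyspace-level replication factor cannot express this kind of per-path difference.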
For performance on production clusters, store the DSEFS data on physical devices that are separate from the database. For development and testing, you may store DSEFS data on the same physical device as the database.
The DSEFS server runs in the same JVM as DataStax Enterprise. Similar to the database, there is no master node. All nodes running DSEFS are equal.
A single DSEFS instance cannot span multiple datacenters. To deploy DSEFS in multiple datacenters, create a separate DSEFS instance for each datacenter.
You can use different keyspaces to configure multiple DSEFS file systems in a single datacenter.
For optimal performance, locate the local DSEFS data on a different physical drive than the database.
Encryption is not supported. Use operating system access controls to protect the local DSEFS data directories. Other limitations apply.
DSEFS uses the LOCAL_QUORUM consistency level to store file metadata. DSEFS writes each data block to replicated node locations. If a write to one node fails, DSEFS retries the write on another node before acknowledging it. DSEFS writes are therefore very similar to the ALL consistency level, but with additional failover to provide high availability. DSEFS reads are similar to the ONE consistency level.
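The write and read behavior described above can be pictured with a small simulation. This is a simplified model under stated assumptions, not the DSEFS implementation: write_block, read_block, the spare-node list, and the send and fetch callbacks are all hypothetical names. The write is acknowledged only after every replica slot has been stored somewhere, with failed nodes replaced by spares, while the read succeeds as soon as any one replica answers.

```python
from collections import deque


def write_block(block, targets, spares, send):
    """Store block once per target slot, failing over to spare nodes.

    send(node, block) returns True when the node stored the block.
    Returns the nodes that ended up holding the block. The write is
    acknowledged (returns) only when every slot has been filled.
    """
    spare_queue = deque(spares)
    stored_on = []
    for node in targets:
        candidate = node
        while not send(candidate, block):
            if not spare_queue:
                raise IOError("no node left to retry the write on")
            # Failover: retry this replica on the next spare node.
            candidate = spare_queue.popleft()
        stored_on.append(candidate)
    return stored_on


def read_block(replicas, fetch):
    """Return the block from the first replica that responds.

    fetch(node) returns the block bytes, or None if the node is unavailable.
    """
    for node in replicas:
        data = fetch(node)
        if data is not None:
            return data
    raise IOError("no replica could serve the block")
```

In this model, a write aimed at nodes n1 and n2 with n2 down and n3 available as a spare would finish with copies on n1 and n3, and a later read would succeed from whichever of those nodes responds first.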