About DSEFS
DSEFS (DataStax Enterprise file system) is a fault-tolerant, general-purpose, distributed file system within DataStax Enterprise. It is designed for use cases that need a distributed file system for data ingestion, data staging, and state management for Spark Streaming applications (such as checkpointing or write-ahead logging). DSEFS is similar to HDFS, but avoids the deployment complexity and single point of failure typical of HDFS. DSEFS is HDFS-compatible and is designed to work in place of HDFS in Spark and other systems.
DSEFS is the default distributed file system in DataStax Enterprise, and is automatically enabled on all analytics nodes.
- Metadata is stored in the database.
- File data blocks are stored locally on each node and are replicated onto multiple nodes.
The redundancy factor is set at the DSEFS directory or file level, which is more granular than the replication factor that is set at the keyspace level in the database.
Deployment overview
- The DSEFS server runs in the same JVM as DataStax Enterprise. Similar to the database, there is no master node. All nodes running DSEFS are equal.
- A single DSEFS instance cannot span multiple datacenters. To deploy DSEFS in multiple datacenters, create a separate instance of DSEFS for each datacenter.
- You can use different keyspaces to configure multiple DSEFS file systems in a single datacenter.
- For optimal performance, locate the local DSEFS data on a different physical drive than the database.
- Encryption is not supported. Use operating system access controls to protect the local DSEFS data directories. Other limitations apply.
- DSEFS uses the LOCAL_QUORUM consistency level to store file metadata. DSEFS always tries to write each data block to all of its replica node locations; if a write to one node fails, it retries on another node before acknowledging the write. DSEFS writes are therefore very similar to the ALL consistency level, but with additional failover to provide high availability. DSEFS reads are similar to the ONE consistency level.
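
The write and read behavior described in the last item can be illustrated with a small toy model. This is only a sketch of the general idea (write to every replica, fail over to another node on error, read from any single replica), not the DSEFS implementation. The `Node` class, the `write_block` and `read_block` functions, and the choice to raise an error when too few healthy nodes remain are all hypothetical names and assumptions made for illustration.

```python
class Node:
    """Hypothetical storage node that keeps data blocks in memory."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy
        self.blocks = {}

    def store(self, block_id, data):
        if not self.healthy:
            raise IOError(f"{self.name} is unavailable")
        self.blocks[block_id] = data

    def fetch(self, block_id):
        if not self.healthy:
            raise IOError(f"{self.name} is unavailable")
        return self.blocks[block_id]


def write_block(nodes, block_id, data, redundancy):
    """Store a block on `redundancy` nodes, failing over past broken nodes.

    The write is acknowledged (the function returns) only after the block
    is stored on the requested number of nodes, which is what makes it
    resemble ALL. Raising when too few nodes are healthy is an assumption
    of this sketch, not documented DSEFS behavior.
    """
    holders = []
    for node in nodes:
        if len(holders) == redundancy:
            break
        try:
            node.store(block_id, data)
        except IOError:
            continue  # failover: try the next candidate node
        holders.append(node.name)
    if len(holders) < redundancy:
        raise IOError("not enough healthy nodes to store the block")
    return holders


def read_block(nodes, holders, block_id):
    """Return the block from the first replica that responds (like ONE)."""
    by_name = {node.name: node for node in nodes}
    for name in holders:
        try:
            return by_name[name].fetch(block_id)
        except IOError:
            continue  # this replica is down; try the next one
    raise IOError("no replica of the block is available")


if __name__ == "__main__":
    cluster = [Node("a"), Node("b", healthy=False), Node("c"), Node("d")]
    replicas = write_block(cluster, "block-0", b"payload", redundancy=2)
    print("stored on:", replicas)
    cluster[0].healthy = False  # one replica fails after the write
    print("read back:", read_block(cluster, replicas, "block-0"))
```

In this model a write succeeds even though node `b` is down, because the writer moves on to the next node, and a later read succeeds as long as any one replica is still reachable.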