DSEFS (DataStax Enterprise file system) is a fault-tolerant, general-purpose, distributed file system within DataStax Enterprise. It is designed for use cases that need to leverage a distributed file system for data ingestion, data staging, and state management for Spark Streaming applications (such as checkpointing or write-ahead logging). DSEFS is similar to HDFS, but avoids the deployment complexity and single point of failure typical of HDFS. DSEFS is HDFS-compatible and is designed to work in place of HDFS in Spark and other systems.

DSEFS is the default distributed file system in DataStax Enterprise, and is automatically enabled on all analytics nodes.

DSEFS stores file metadata (such as filepath, ownership, permissions) and file contents separately:

  • Metadata is stored in the database.

  • File data blocks are stored locally on each node and are replicated onto multiples nodes.

    The redundancy factor is set at the DSEFS directory or file level, which is more granular than the replication factor that is set at the keyspace level in the database.

For performance on production clusters, store the DSEFS data on physical devices that are separate from the database. For development and testing you may store DSEFS data on the same physical device as the database.

Deployment overview

  • The DSEFS server runs in the same JVM as DataStax Enterprise. Similar to the database, there is no master node. All nodes running DSEFS are equal.

  • A single DSEFS cannot span multiple datacenters. To deploy DSEFS in multiple datacenters, you can create a separate instance of DSEFS for each datacenter.

  • You can use different keyspaces to configure multiple DSEFS file systems in a single datacenter.

  • For optimal performance, locate the local DSEFS data on a different physical drive than the database.

  • Encryption is not supported. Use operating system access controls to protect the local DSEFS data directories. Other limitations apply.

  • DSEFS uses the LOCAL_QUORUM consistency level to store file metadata. DSEFS will always try to write each data block to replicated node locations, and even if a write fails, it will retry to another node before acknowledging the write. DSEFS writes are very similar to the ALL consistency level, but with additional failover to provide high-availability. DSEFS reads are similar to the ONE consistency level.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com