DSE Analytics
Use DSE Analytics to analyze large volumes of data stored in DataStax Enterprise. DSE Analytics supports real-time, streaming, and batch analytics through built-in integration with Apache Spark™, a distributed, parallel data processing engine.
DSE Analytics features
- SparkR: DataStax Enterprise supports SparkR, the R front end to Spark, for analytic processing in R.
- No single point of failure: DSE Analytics runs Spark jobs on a peer-to-peer, distributed cluster. Because all nodes are peers, any node in the cluster can load data files, and any analytics node can take over the responsibilities of the Spark Master.
- Spark Master management: DSE Analytics manages the Spark Master automatically.
- Analytics without ETL: With DSE Analytics, you run Spark jobs directly against data in the database, with no separate extract, transform, and load (ETL) step. You can run real-time and analytics workloads at the same time without one workload affecting the performance of the other: start some cluster nodes as Analytics nodes and others as purely transactional real-time nodes, and data is replicated between them automatically.
- DataStax Enterprise file system (DSEFS): DSEFS is a fault-tolerant, general-purpose, distributed file system within DataStax Enterprise. It is designed for use cases that need a distributed file system for data ingestion, data staging, and state management for Spark Streaming applications (such as checkpointing or write-ahead logging). DSEFS is similar to HDFS but avoids the deployment complexity and single point of failure typical of HDFS. Because it is HDFS-compatible, DSEFS can be used in place of HDFS in Spark and other systems.
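As a rough sketch of how running Spark directly against database tables and using DSEFS as an HDFS-compatible store can fit together, the function below reads a table in place, aggregates it, and writes the result to a file system path. It assumes `spark` is a PySpark `SparkSession` on a DSE Analytics cluster where the Spark Cassandra Connector data source (`org.apache.spark.sql.cassandra`) is available. The function name and every argument value are hypothetical placeholders. The session is passed in as a parameter, so the function itself does not import PySpark.

```python
from typing import Any


def summarize_table(spark: Any, keyspace: str, table: str,
                    group_col: str, out_path: str) -> Any:
    """Count rows per value of `group_col` in a database table and save the result.

    Assumes `spark` is a pyspark.sql.SparkSession running on DSE Analytics;
    the keyspace, table, column, and output path are hypothetical placeholders.
    """
    # Read the table in place through the Spark Cassandra Connector data
    # source, so no export or separate ETL step is needed.
    df = (
        spark.read.format("org.apache.spark.sql.cassandra")
        .options(keyspace=keyspace, table=table)
        .load()
    )

    # Aggregate with ordinary Spark DataFrame operations.
    counts = df.groupBy(group_col).count()

    # DSEFS is HDFS-compatible, so a dsefs:// URI can be passed wherever
    # Spark accepts a Hadoop file system path.
    counts.write.mode("overwrite").parquet(out_path)
    return counts
```

For example, `summarize_table(spark, "shop", "orders", "status", "dsefs:///reports/order_status")` would write per-status counts to DSEFS. The keyspace and table names are invented, and the host and port in a DSEFS URI depend on how your cluster is configured.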
All analytics keyspaces are initially created with the SimpleStrategy replication strategy and a replication factor (RF) of 1. In production environments, change each of them to use NetworkTopologyStrategy with an appropriate replication factor to avoid data loss.
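A minimal sketch, assuming you want NetworkTopologyStrategy with a fixed replication factor per datacenter: the helper below generates one CQL `ALTER KEYSPACE` statement per analytics keyspace. The function name, the datacenter name `Analytics`, and the keyspace names in the example (`dse_leases`, `dsefs`) are assumptions for illustration. Check the documentation for your DSE version for the exact list of analytics keyspaces, and choose replication factors that suit your cluster.

```python
from typing import Dict, Iterable, List


def alter_keyspace_statements(keyspaces: Iterable[str],
                              rf_per_dc: Dict[str, int]) -> List[str]:
    """Build one ALTER KEYSPACE statement per keyspace using NetworkTopologyStrategy.

    `rf_per_dc` maps datacenter name -> replication factor (hypothetical values).
    """
    if not rf_per_dc:
        raise ValueError("at least one datacenter is required")
    for dc, rf in rf_per_dc.items():
        if rf < 1:
            raise ValueError(f"replication factor for {dc} must be at least 1")

    # Render the per-datacenter part of the replication map, e.g. 'DC1': 3, 'DC2': 3
    dc_map = ", ".join(f"'{dc}': {rf}" for dc, rf in rf_per_dc.items())

    # Doubled braces in the f-string emit the literal braces of the CQL map.
    return [
        f"ALTER KEYSPACE {ks} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {dc_map}}};"
        for ks in keyspaces
    ]


if __name__ == "__main__":
    # Hypothetical keyspace and datacenter names; adjust for your cluster.
    for stmt in alter_keyspace_statements(["dse_leases", "dsefs"], {"Analytics": 3}):
        print(stmt)
```

Each generated statement can be run in `cqlsh` or through a driver session. After you increase the replication factor, run `nodetool repair` on the affected keyspaces so that existing data is copied to the new replicas.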