About DataStax Enterprise

DataStax Enterprise is a big data platform built on Apache Cassandra that manages real-time, analytics, and enterprise search data. DataStax Enterprise leverages Cassandra, Apache Hadoop, Apache Spark, and Apache Solr to shift your focus from the data infrastructure to using your data strategically, as described in the DataStax Enterprise overview.

Important: DataStax Enterprise 4.5 uses Cassandra 2.0.

New features

DataStax Enterprise 4.5 introduces Apache Spark 0.9.1 and Shark 0.9.1.1 integration for running high-performance analytical queries independently of Hadoop. Spark is a distributed, parallel, batch data processing engine based on the Resilient Distributed Dataset (RDD) concept rather than the MapReduce concept on which Hadoop is based. Because Spark can keep intermediate results in memory between processing steps, it is typically faster than Hadoop MapReduce.
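To make the RDD idea more concrete, the following is a minimal, single-process sketch in plain Python. It is not the Spark API: the ToyRDD class and its methods are hypothetical names invented for this illustration. It shows only the core idea that transformations are recorded as lineage on an immutable dataset and run when a result is requested, without the distribution, partitioning, or fault recovery that Spark provides.

```python
class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not the Spark API)."""

    def __init__(self, source, lineage=()):
        self._source = list(source)      # base data, never modified
        self._lineage = tuple(lineage)   # recorded transformations, not yet run

    def map(self, fn):
        # Return a new dataset; nothing is computed here.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, pred):
        # Return a new dataset; nothing is computed here.
        return ToyRDD(self._source, self._lineage + (("filter", pred),))

    def collect(self):
        # Replay the recorded lineage over the base data in one in-memory pass.
        data = iter(self._source)
        for kind, fn in self._lineage:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return list(data)


numbers = ToyRDD(range(1, 11))
even_squares = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = even_squares.collect()
```

Each call to map or filter returns a new ToyRDD and leaves the original untouched, so the chained steps run together in memory when collect is called. A chain of MapReduce jobs, by contrast, typically writes intermediate output to disk between jobs.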

Shark, a SQL-like, Hive-compatible language, is built on top of Spark. Because Shark is Hive-compatible, the transition is straightforward for users of Hive, the SQL-like language built on top of Hadoop.

A bring your own Hadoop (BYOH) model integrates DataStax Enterprise with the Hadoop data warehouse implementations from Cloudera and Hortonworks. Because you can use your own custom, better-tuned Hadoop installation, this model can provide better performance than the Hadoop integration in previous DataStax Enterprise versions.

DataStax Enterprise 4.5 improves integration of Apache Sqoop for importing RDBMS data and exporting Cassandra CQL data.

The Performance Service automatically collects and organizes performance diagnostic information into a set of data dictionary tables that can be queried with CQL. Use the information gathered in the diagnostic tables to examine your database metrics and improve the function of your clusters, queries, and nodes.

DataStax Enterprise 4.5 also introduces a number of DSE Hadoop, Hive, and Pig features, discussed later in this document, and the following Solr features:

For performance, you can now configure DSE Search/Solr to parallelize row reads.
DSE Search uses the faster doc values-based join system under certain circumstances.
DataStax Enterprise 4.5 and later move the DSE per-segment filter cache off-heap by using native memory, which reduces on-heap memory consumption and garbage collection overhead.
The new off-heap filter cache is enabled by default, but can be disabled at startup time.