About DataStax Enterprise
DataStax Enterprise is a big data platform built on Apache Cassandra that manages real-time, analytics, and enterprise search data. DataStax Enterprise leverages Cassandra, Apache Hadoop, Apache Spark, and Apache Solr to shift your focus from the data infrastructure to using your data strategically.
See the DataStax Upgrade Guide.
DataStax Enterprise installation methods include GUI or text mode, unattended command line or properties file, YUM and APT repository, and binary tarball.
Installer - GUI or Text mode
DataStax Enterprise production installation or upgrade on any Linux-based platform using a graphical or text interface.
Installer - unattended
Installs DataStax Enterprise using the command line or properties file.
Other install methods
Install using YUM or APT packages or binary tarball.
On cloud providers
Install on Amazon EC2, CenturyLink, GoGrid, HP cloud, or Microsoft Azure.
Installing EPEL on RHEL OS 5.x
Install Extra Packages for Enterprise Linux on RHEL OS 5.x.
Installing prior releases
Steps for installing the same version as other nodes in your cluster.
Uninstalling DataStax Enterprise
Uninstalling DataStax Enterprise and DataStax Agent.
Managing security in DataStax Enterprise including authentication, encryption, auditing, permissions, and configuration.
DataStax Enterprise includes advanced data protection for enterprise-grade databases including LDAP authentication support, internal authentication, object permissions, encryption, Kerberos authentication, and data auditing.
Authenticating with Kerberos
An overview of Kerberos in DataStax Enterprise and recommendations.
Client-to-node encryption protects data in flight from client machines to a database cluster.
Node-to-node encryption protects data that is transferred between nodes in a cluster using SSL (Secure Sockets Layer).
Generate SSL certificates for client-to-node encryption or node-to-node encryption.
Sample files for Kerberos, SSL, and Kerberos and SSL.
Transparent data encryption
Transparent data encryption (TDE) protects data at rest. TDE requires a secure local file system to be effective.
Auditing is implemented as a log4j-based integration.
Internal authentication is based on Cassandra-controlled login accounts and passwords. You can authenticate the use of Hadoop tools, Spark-to-Cassandra connections, and Shark configuration changes.
Managing object permissions
Use GRANT/REVOKE to grant or revoke permissions to access Cassandra data.
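As a rough illustration of the GRANT/REVOKE pattern, the following sketch builds CQL permission statements as strings. The helper name `permission_statement`, the keyspace `sales`, and the user `analyst` are hypothetical and used only for illustration; the permission names are the ones defined by CQL for Cassandra 2.0, which is assumed here to match your release.

```python
# Minimal sketch: build CQL GRANT/REVOKE statements as strings.
# The helper name, keyspace, and user below are hypothetical examples.

ALLOWED_PERMISSIONS = {
    "ALL", "ALTER", "AUTHORIZE", "CREATE", "DROP", "MODIFY", "SELECT",
}


def permission_statement(action, permission, resource, user):
    """Return a CQL statement that grants or revokes one permission.

    action     -- "GRANT" or "REVOKE" (case-insensitive)
    permission -- one of ALLOWED_PERMISSIONS (case-insensitive)
    resource   -- for example "ALL KEYSPACES", "KEYSPACE sales", "TABLE sales.orders"
    user       -- the login account that receives or loses the permission
    """
    action = action.upper()
    permission = permission.upper()
    if action not in ("GRANT", "REVOKE"):
        raise ValueError("action must be GRANT or REVOKE")
    if permission not in ALLOWED_PERMISSIONS:
        raise ValueError("unknown permission: " + permission)
    # GRANT ... TO user, REVOKE ... FROM user
    keyword = "TO" if action == "GRANT" else "FROM"
    perm = "ALL PERMISSIONS" if permission == "ALL" else permission
    return "{} {} ON {} {} {};".format(action, perm, resource, keyword, user)


if __name__ == "__main__":
    print(permission_statement("grant", "select", "KEYSPACE sales", "analyst"))
    print(permission_statement("revoke", "modify", "TABLE sales.orders", "analyst"))
```

The generated statements can be pasted into cqlsh by a user who holds the AUTHORIZE permission on the resource.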
Configuring keyspace replication
The system_auth and dse_security keyspaces store security authentication and authorization information.
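The sketch below shows one way to generate the CQL statement that changes the replication of a security keyspace. It assumes NetworkTopologyStrategy is the desired replica placement strategy; the helper name `alter_replication` and the data center names `dc1` and `dc2` are hypothetical, and the replication factors are placeholders rather than recommendations.

```python
# Minimal sketch: build an ALTER KEYSPACE statement for a security keyspace.
# Assumes NetworkTopologyStrategy; data center names and replication
# factors are placeholders, not recommendations.


def alter_replication(keyspace, dc_replicas):
    """Return a CQL ALTER KEYSPACE statement.

    keyspace    -- keyspace name, for example "system_auth"
    dc_replicas -- dict mapping data center name to replication factor
    """
    if not dc_replicas:
        raise ValueError("at least one data center is required")
    # Sort by data center name so the output is deterministic.
    options = ", ".join(
        "'{}': {}".format(dc, int(rf)) for dc, rf in sorted(dc_replicas.items())
    )
    return (
        'ALTER KEYSPACE "' + keyspace + '" WITH REPLICATION = '
        "{'class': 'NetworkTopologyStrategy', " + options + "};"
    )


if __name__ == "__main__":
    for name in ("system_auth", "dse_security"):
        print(alter_replication(name, {"dc1": 3, "dc2": 2}))
```

After changing the replication of a keyspace, the existing data usually needs to be repaired on the affected nodes so that the new replicas receive it.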
Configuring firewall ports
Opening the required ports to allow communication between the nodes.
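To verify that a port is reachable after adjusting the firewall, a small connectivity check such as the following sketch can help. The port list covers only the common Cassandra defaults (7000, 7001, 7199, 9042, 9160) and is an assumption about your configuration; DSE Search and analytics nodes use additional ports that are not listed here. The function name `port_is_open` is hypothetical.

```python
# Minimal sketch: check whether a TCP port on a node accepts connections.
# The port list holds common Cassandra defaults only; your cassandra.yaml
# may use different values, and other DSE components need further ports.
import socket

CASSANDRA_DEFAULT_PORTS = {
    7000: "inter-node cluster communication",
    7001: "SSL inter-node cluster communication",
    7199: "JMX monitoring",
    9042: "CQL native transport clients",
    9160: "Thrift clients",
}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_node(host):
    """Return a dict mapping each default port to True (open) or False."""
    return {port: port_is_open(host, port) for port in CASSANDRA_DEFAULT_PORTS}
```

For example, calling `check_node("10.0.0.5")` from another node (the address is a placeholder) returns the reachability of each listed port on that host.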
Using the in-memory option
DataStax Enterprise includes the in-memory option for storing data to and accessing data from memory exclusively.
DataStax Enterprise analytics includes integration with Apache Spark, Apache Shark, BYOH (bring your own Hadoop), and DSE Hadoop.
Introduction to DSE Analytics
DataStax Enterprise serves the analytics market with significant features for analyzing huge databases.
Analyzing data using Spark
Spark is the default mode when you start an analytics node in a packaged installation. Spark runs locally on each node.
Analyzing data using external Hadoop systems (BYOH)
DataStax Enterprise (DSE) works with external Hadoop systems in a bring your own Hadoop (BYOH) model. Use BYOH when you want to run DSE with a separate Hadoop cluster from a different vendor.
Getting Started with Solr
DataStax Enterprise supports Open Source Solr (OSS) tools and APIs, simplifying migration from Solr to DataStax Enterprise.
Supported and unsupported features
Supported and unsupported DSE Search and Solr features.
Defining key Solr terms
Solr terms include several names for an index of documents and configuration on a single node.
Installing Solr nodes
Installing, starting, and stopping Solr nodes.
Steps for setting up Cassandra and Solr for the tutorial.
Configure Solr Type mapping.
Creating an index for searching
Requirements and steps for creating a Solr index.
Using DSE Search/Solr
A brief description and illustration of DSE Search.
Querying Solr data
A brief description of querying Solr data.
Use a discovery process to develop a plan to ensure sufficient memory resources.
About using real-time (Cassandra), integrated Hadoop or Spark/Shark (Analytics), an external Hadoop system, or search (Solr) nodes in the same cluster.
Topics for using DSE Search.
Tuning DSE Search performance
Topics for performance tuning and solving performance degradation, high memory consumption, or other problems with DataStax Enterprise Search nodes.
The solr/NativeAllocatorStats MBean exposes native memory allocation.
Update request processor and field transformer
Use the custom update request processor (URP) to extend the Solr URP. Use the field input/output transformer API as an alternative to the input/output transformer support in Open Source Solr.
DSE vs. Open source
A comparison of DSE Search and Open Source Solr.
Run the Wikipedia demo on a single node to download Wikipedia articles, create a CQL table, store the articles, and index the articles in Solr.
Migrating data using Sqoop
Topics for migrating data using Sqoop.
Migrating data using other methods
Solutions for migrating data to DataStax Enterprise include the COPY command, the DSE Search/Solr Data Import Handler, and the Cassandra bulk loader.
Production deployment planning
Production deployment planning requires knowledge of the initial volume of data to store and an estimate of the typical application workload.
Choose a data partitioner and replica placement strategy.
Organize nodes that run different workloads into virtual data centers. Put analytic nodes in one data center, search nodes in another, and Cassandra real-time transactional nodes in another data center.
Single data center deployment per workload type
A deployment scenario with a mixed workload cluster has only one data center for each type of workload.
Multiple data center deployment per workload type
A deployment scenario with a mixed workload cluster has more than one data center for each type of node.
Single-token architecture deployment
Use single-token architecture deployment when you are not using virtual nodes (vnodes).
Tokens assign a range of data to a particular node within a data center.
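When vnodes are not used, each node needs one initial token, and evenly spaced tokens give each node an equal share of the data. The sketch below computes such tokens for a single data center, assuming the Murmur3Partitioner, whose token range runs from -2^63 to 2^63 - 1. The function name `murmur3_tokens` is hypothetical, and clusters that use a different partitioner or multiple data centers need a different calculation.

```python
# Minimal sketch: evenly spaced initial tokens for one data center.
# Assumes the Murmur3Partitioner (token range -2**63 .. 2**63 - 1).


def murmur3_tokens(node_count):
    """Return a list with one evenly spaced token per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**64 // node_count  # width of the range owned by each node
    return [step * i - 2**63 for i in range(node_count)]


if __name__ == "__main__":
    for index, token in enumerate(murmur3_tokens(4)):
        print("node {}: initial_token = {}".format(index, token))
```

For a four-node data center this yields the tokens -9223372036854775808, -4611686018427387904, 0, and 4611686018427387904, one per node.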
Expanding an AMI cluster
To expand your EC2 implementations, use OpsCenter to provision a new cluster, add a new cluster, or add nodes to a cluster.
DataStax Management Services
DataStax Management Services automatically handle administration and maintenance tasks and assist with overall database cluster management.
Performance Service topics.
The Capacity Service automatically collects data about a cluster’s operations and provides for the ability to do historical trend analysis and forecasting of future trends.
The Repair Service is designed to automatically keep data synchronized across a cluster and can be managed either visually through OpsCenter or via the command line.
DataStax Enterprise tools
Tools include dse commands, dsetool, the cfs-stress tool, pre-flight check, yaml_diff, and the Cassandra bulk loader.
The dse commands
Table of dse commands for using DataStax Enterprise.
Use the dsetool utility for Cassandra File System (CFS) and Hadoop-related tasks, such as managing the job tracker, checking the CFS, and listing node subranges of data in a keyspace.
The cfs-stress tool
Performs stress testing of the Cassandra File System (CFS) layer.
Pre-flight check and yaml_diff tools
The pre-flight check tool is available for packaged installations. This collection of tests can be run on a node to detect and fix configuration problems. The yaml_diff tool filters differences between cassandra.yaml files.
The configuration file for Kerberos authentication, purging of expired data from the Solr indexes, setting Solr inter-node communication, adjusting disk health intervals, and enabling the Performance Service.
Starting and stopping DSE
Starting and stopping DataStax Enterprise as a service or stand-alone process.
File locations: Installer-Services and Package
Locations when installing from the DataStax Installer with the Services option or package installations.
File locations: Installer-No Services and Tarball
Locations when installing from the DataStax Installer with No Services selected or tarball installations.
Troubleshooting examples are useful to discover and resolve problems with DSE. Also check the Cassandra troubleshooting documentation.
Cassandra Log4j appender
DataStax Enterprise allows you to stream your web and application log information into a database cluster via Apache log4j.
Installing glibc on Oracle Linux
To install DSE on Oracle Enterprise Linux 6.x and later, install the 32-bit versions of the glibc libraries.
DataStax Enterprise release notes cover components, changes and enhancements, issues, and resolved issues for DataStax Enterprise 4.5.x.