About DataStax Enterprise
DataStax Enterprise is a big data platform built on Apache Cassandra that manages real-time, analytics, and enterprise search data. DataStax Enterprise leverages Cassandra, Apache Hadoop, and Apache Solr to shift your focus from the data infrastructure to using your data strategically.
Upgrading
See the DataStax Upgrade Guide.
Installing
DataStax Enterprise installation methods include GUI or text mode, unattended command line or properties file, YUM and APT repository, and binary tarball.
Installing on RHEL-based systems
Install DataStax Enterprise and OpsCenter using Yum repositories on RHEL-based systems.
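For example, a minimal sketch of a Yum installation, assuming you have DataStax registration credentials and that the DSE 3.2-era repository location and package names apply (the username and password are placeholders):

    # Add the DataStax repository
    sudo tee /etc/yum.repos.d/datastax.repo <<'EOF'
    [datastax]
    name = DataStax Repo for DataStax Enterprise
    baseurl = https://<username>:<password>@rpm.datastax.com/enterprise
    enabled = 1
    gpgcheck = 0
    EOF

    # Install DataStax Enterprise and OpsCenter
    sudo yum install dse-full opscenter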
Installing on Debian-based systems
Install DataStax Enterprise and OpsCenter using APT repositories on Debian-based systems.
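A comparable sketch for APT, again assuming DataStax registration credentials and the DSE 3.2-era repository layout (the username and password are placeholders):

    # Add the DataStax repository and its key
    echo "deb https://<username>:<password>@debian.datastax.com/enterprise stable main" \
      | sudo tee /etc/apt/sources.list.d/datastax.sources.list
    curl -L https://debian.datastax.com/debian/repo_key | sudo apt-key add -

    # Install DataStax Enterprise and OpsCenter
    sudo apt-get update
    sudo apt-get install dse-full opscenter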
Installing the binary tarball
Install DataStax Enterprise on any Linux-based platform, including 32-bit platforms.
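For instance, a minimal tarball installation (the version number is illustrative):

    tar -xzf dse-3.2.0-bin.tar.gz
    cd dse-3.2.0
    bin/dse cassandra    # start a real-time (Cassandra) node as a stand-alone process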
Installing on SUSE
DataStax provides a binary tarball distribution for installing DataStax Enterprise on SUSE Linux.
On cloud providers
Install on Amazon EC2 or HP Cloud.
Installing prior releases
Steps for installing the same version as other nodes in your cluster.
Security
Managing security in DataStax Enterprise including authentication, encryption, auditing, permissions, and configuration.
Security management
DataStax Enterprise includes advanced data protection for enterprise-grade databases including internal authentication, object permissions, encryption, Kerberos authentication, and data auditing.
Authenticating with Kerberos
DataStax Enterprise authentication with Kerberos protocol uses tickets to prove identity for nodes that communicate over non-secure networks.
Client-to-node encryption
Client-to-node encryption protects data in flight from client machines to a database cluster.
Node-to-node encryption
Node-to-node encryption protects data that is transferred between nodes in a cluster using SSL (Secure Sockets Layer).
Server certificates
Generate SSL certificates for client-to-node or node-to-node encryption.
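As an illustration, keystore and truststore generation with the JDK keytool; the alias, file names, and passwords are placeholders:

    # Generate a private key and certificate for this node
    keytool -genkey -keyalg RSA -alias node0 -validity 365 \
      -keystore .keystore -storepass cassandra -keypass cassandra

    # Export the certificate, then import it into the truststore
    # used by the peer nodes or clients
    keytool -export -alias node0 -file node0.cer -keystore .keystore
    keytool -import -v -trustcacerts -alias node0 -file node0.cer \
      -keystore .truststore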
Installing cqlsh security
Install packages to use cqlsh with a Kerberized cluster.
Transparent data encryption
Transparent data encryption (TDE) protects data at rest. TDE requires a secure local file system to be effective.
Data auditing
Auditing is implemented as a log4j-based integration.
Internal authentication
Internal authentication is based on Cassandra-controlled login accounts and passwords.
Managing object permissions
Use the GRANT and REVOKE statements to manage permissions to access Cassandra data.
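For example, a sketch of a cqlsh session that creates a login account and grants, then revokes, access (the user and keyspace names are hypothetical):

    cqlsh -u cassandra -p cassandra    # log in as a superuser
    cqlsh> CREATE USER jdoe WITH PASSWORD 'SomePassword' NOSUPERUSER;
    cqlsh> GRANT SELECT ON KEYSPACE sales TO jdoe;
    cqlsh> REVOKE SELECT ON KEYSPACE sales FROM jdoe;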
Configuring keyspace replication
The system_auth and dse_security keyspaces store security authentication and authorization information.
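For example, replicating system_auth to every data center (the data center names and replica counts are illustrative), followed by a repair on each affected node:

    cqlsh> ALTER KEYSPACE system_auth
             WITH replication = {'class': 'NetworkTopologyStrategy',
                                 'DC1': 3, 'DC2': 3};

    # then, on each node:
    nodetool repair system_auth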
Configuring firewall ports
Opening the required ports to allow communication between the nodes.
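A sketch of iptables rules for a few commonly used DataStax Enterprise ports; verify the full port list against this topic for your workload mix:

    sudo iptables -A INPUT -p tcp --dport 7000 -j ACCEPT   # internode communication
    sudo iptables -A INPUT -p tcp --dport 7001 -j ACCEPT   # SSL internode communication
    sudo iptables -A INPUT -p tcp --dport 7199 -j ACCEPT   # JMX monitoring
    sudo iptables -A INPUT -p tcp --dport 9160 -j ACCEPT   # Thrift clients
    sudo iptables -A INPUT -p tcp --dport 8983 -j ACCEPT   # Solr HTTP interface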
DSE Analytics with Hadoop
DSE Hadoop topics.
Getting started
The Hadoop component in DataStax Enterprise enables analytics to be run across DataStax Enterprise's distributed, shared-nothing architecture. Instead of using the Hadoop Distributed File System (HDFS), DataStax Enterprise uses Cassandra File System (CFS) keyspaces for the underlying storage layer.
Using the job tracker node
DataStax Enterprise schedules a series of tasks on the analytics nodes for each MapReduce job that is submitted to the job tracker.
About the Cassandra File System
A Hive or Pig analytics job requires a Hadoop file system to function. For use with DSE Hadoop, DataStax Enterprise provides a replacement for the Hadoop Distributed File System (HDFS) called the Cassandra File System (CFS).
Using the cfs-archive to store huge files
The Cassandra File System (CFS) consists of two layers: cfs and cfs-archive. Using cfs-archive is recommended for long-term storage of huge files.
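For instance, copying a large file into the archive layer with the Hadoop shell that ships with DSE (the paths are hypothetical):

    dse hadoop fs -mkdir cfs-archive:///backups
    dse hadoop fs -put large_dataset.dat cfs-archive:///backups/
    dse hadoop fs -ls cfs-archive:///backups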
Using Hive
DataStax Enterprise includes a Cassandra-enabled Hive MapReduce client.
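For example, starting the bundled Hive client from an analytics node and running a query; the keyspace and table names are hypothetical:

    dse hive
    hive> SHOW DATABASES;
    hive> USE mykeyspace;
    hive> SELECT * FROM mytable LIMIT 10;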
Using the DataStax ODBC driver for Hive on Windows
The DataStax ODBC Driver for Hive provides Windows users access to the information that is stored in DSE Hadoop.
Using Mahout
DataStax Enterprise integrates Apache Mahout, a Hadoop component that offers machine learning libraries.
Using Pig
DataStax Enterprise includes a Cassandra File System (CFS) enabled Apache Pig Client to provide a high-level programming environment for MapReduce coding.
Using Sqoop
Sqoop is an Apache Software Foundation tool for transferring data between an RDBMS data source and Hadoop, or between other data sources such as NoSQL databases.
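As a sketch, importing a table from MySQL with the Sqoop client bundled with DSE; the JDBC URL, user, and table name are placeholders, and the DSE-specific options for targeting Cassandra are not shown here:

    dse sqoop import \
      --connect jdbc:mysql://127.0.0.1/mydb \
      --username myuser \
      --table mytable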
DSE Search with Solr
DSE Search topics.
Getting Started with Solr
DataStax Enterprise supports Open Source Solr (OSS) tools and APIs, simplifying migration from Solr to DataStax Enterprise.
Solr support for CQL 3
Supported and unsupported DSE Search and Solr features.
Defining key Solr terms
Definitions of key Solr terms, including the several names used for an index of documents and its configuration on a single node.
Installing Solr nodes
Installing and starting Solr nodes.
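For example, starting a node in Solr mode; tarball installations pass a flag, while package installations set a variable before starting the service:

    # Tarball installation
    bin/dse cassandra -s

    # Package installation: enable Solr, then start the service
    sudo sh -c 'echo "SOLR_ENABLED=1" >> /etc/default/dse'
    sudo service dse start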
Solr tutorial
Steps for setting up Cassandra and Solr for the tutorial.
Configuring Solr
Configure Solr type mapping.
Creating an index for searching
Requirements and steps for creating a Solr index.
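As an illustration, posting a Solr schema and configuration for a Cassandra table and then creating the core over HTTP; the keyspace.table name and file names are hypothetical:

    curl http://localhost:8983/solr/resource/mykeyspace.mytable/solrconfig.xml \
      --data-binary @solrconfig.xml -H 'Content-type:text/xml'
    curl http://localhost:8983/solr/resource/mykeyspace.mytable/schema.xml \
      --data-binary @schema.xml -H 'Content-type:text/xml'
    curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=mykeyspace.mytable"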
Using DSE Search/Solr
A brief description and illustration of DSE Search.
Querying search results
A brief description of how to query Solr data.
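For example, two sketches of the same search, one over the Solr HTTP API and one through CQL 3 with the solr_query expression (the keyspace, table, and field names are hypothetical):

    curl "http://localhost:8983/solr/mykeyspace.mytable/select?q=title:natio*&wt=json"

    cqlsh> SELECT * FROM mykeyspace.mytable
             WHERE solr_query = 'title:natio*' LIMIT 10;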
Capacity planning
Use a discovery process to develop a plan to ensure sufficient memory resources.
Mixing workloads
About using real-time (Cassandra), Hadoop, or search (Solr) nodes in the same cluster.
Common operations
Topics for using DSE Search.
Tuning DSE Search performance
Topics for performance tuning and solving performance degradation, high memory consumption, or other problems with DataStax Enterprise Search nodes.
Transforming data
Use the field input/output transformer API as an alternative to the input/output transformer support in open source Solr (OSS).
DSE vs. Open source
A comparison of DSE Search and Open Source Solr.
Deploying
Deployment topics.
Production deployment planning
Production deployment planning requires knowledge of the initial volume of data to store and an estimate of the typical application workload.
Configuring replication
Choose a data partitioner and replica placement strategy.
Single data center deployment
A deployment scenario for a mixed workload cluster that has only one data center for each type of workload.
Multiple data center deployment
A deployment scenario for a mixed workload cluster that has more than one data center for each type of node.
Expanding an AMI cluster
To expand your EC2 implementations, use OpsCenter to provision a new cluster, add a new cluster, or add nodes to a cluster.
Moving data to/from other databases
DataStax offers several solutions for migrating from other databases.
Reference
Reference topics.
Analytics tools: dse commands and dsetool
Options for starting DataStax Enterprise.
Installing glibc on Oracle Linux
To install DSE on Oracle Enterprise Linux 6.x and later, install the 32-bit versions of the glibc libraries.
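For example, assuming the standard Yum package naming for 32-bit glibc on a 64-bit Oracle Linux system:

    sudo yum install glibc.i686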
Tarball file locations
Locations when DataStax Enterprise was installed from a tarball.
Package file locations
Locations when DataStax Enterprise was installed from a package.
Configuration (dse.yaml)
The configuration file for Kerberos authentication, purging of expired data from the Solr indexes, and setting Solr inter-node communication.
Starting and stopping DSE
Starting and stopping DataStax Enterprise as a service or stand-alone process.
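For example (the service name assumes a package installation, the paths a tarball layout):

    # Package installations: run DSE as a service
    sudo service dse start
    sudo service dse stop

    # Tarball installations: run DSE as a stand-alone process
    bin/dse cassandra        # add -t for Hadoop or -s for Solr nodes
    bin/dse cassandra-stop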
Pre-flight check tool
The pre-flight check tool detects and fixes configuration problems. The yaml_diff tool checks for differences between two cassandra.yaml files.
Troubleshooting
Troubleshooting examples help you discover and resolve problems with DSE. Also check the Cassandra troubleshooting documentation.
Cassandra Log4j appender
DataStax Enterprise allows you to stream your web and application log information into a database cluster via Apache log4j.
Release notes
Release notes for DataStax Enterprise 3.2.x.