About DataStax Enterprise
DataStax Enterprise is a big data platform built on Apache Cassandra that manages real-time, analytics, and enterprise search data. DataStax Enterprise leverages Cassandra, Apache Hadoop, Apache Spark, and Apache Solr to shift your focus from the data infrastructure to using your data strategically.
Upgrading
See the DataStax Upgrade Guide.
Installing
DataStax Enterprise installation methods include GUI or text mode, unattended command line or properties file, YUM and APT repository, and binary tarball.
Installer - GUI/Text mode
DataStax Enterprise production installation or upgrade on any Linux-based platform using a graphical or text interface.
Installer - unattended
Install DataStax Enterprise using the command line or properties file.
Other install methods
Installation using YUM or APT packages or binary tarball.
On cloud providers
Installation on Amazon EC2, CenturyLink, GoGrid, HP cloud, or Microsoft Azure.
Installing EPEL on RHEL OS 5.x
Install Extra Packages for Enterprise Linux on RHEL OS 5.x.
Uninstalling DataStax Enterprise 4.6
Launch the uninstaller in the installation directory to uninstall DataStax Enterprise and DataStax Agent.
Managing security
Managing security in DataStax Enterprise including authentication, encryption, auditing, permissions, and configuration.
Security management
DataStax Enterprise includes advanced data protection for enterprise-grade databases including LDAP authentication support, internal authentication, object permissions, encryption, Kerberos authentication, and data auditing.
Authenticating with Kerberos
DataStax Enterprise authentication with Kerberos protocol uses tickets to prove identity for nodes that communicate over non-secure networks.
Authenticating a cluster with LDAP
DataStax Enterprise supports LDAP authentication for external LDAP services.
Client-to-node encryption
Client-to-node encryption protects data in flight from client machines to a database cluster. It establishes a secure channel between the client and the coordinator node.
Node-to-node encryption
Node-to-node encryption protects data that is transferred between nodes in a cluster using SSL (Secure Sockets Layer).
Spark SSL encryption
Communication between Spark clients and clusters as well as communication between Spark nodes can be encrypted using SSL.
Server certificates
All nodes require the relevant SSL certificates. Generate SSL certificates for client-to-node encryption or node-to-node encryption.
Running cqlsh
Sample files are provided to help configure cqlsh authentication for Kerberos, for SSL, or for both Kerberos and SSL.
Transparent data encryption
Transparent data encryption (TDE) protects data at rest. TDE requires a secure local file system to be effective.
Data auditing
Set options in dse.yaml for configuring and using data auditing.
Internal authentication
Internal authentication is based on Cassandra-controlled login accounts and passwords.
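For example, internal authentication accounts are managed with CQL (a minimal sketch; the user name and passwords below are placeholders):
    CREATE USER jane WITH PASSWORD 'Password1' NOSUPERUSER;
    ALTER USER jane WITH PASSWORD 'Password2';
    LIST USERS;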
Managing object permissions
Use GRANT/REVOKE to grant or revoke permissions to access Cassandra data.
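For example (the keyspace, table, and user names below are placeholders), permissions are granted and revoked with CQL statements such as:
    GRANT SELECT ON KEYSPACE demo TO jane;
    GRANT MODIFY ON demo.users TO jane;
    REVOKE SELECT ON KEYSPACE demo FROM jane;
    LIST ALL PERMISSIONS OF jane;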
Configuring keyspace replication
The system_auth and dse_security keyspaces store security authentication and authorization information.
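For example, increase the replication of these keyspaces to match your cluster topology (the data center name DC1 and the replication factor below are placeholders):
    ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};
    ALTER KEYSPACE dse_security WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};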
Configuring firewall ports
If a firewall runs on the nodes in the Cassandra or DataStax Enterprise cluster, open up ports to allow communication between the nodes.
Using the in-memory option
DataStax Enterprise includes the in-memory option for storing data to and accessing data from memory exclusively.
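For example, a table can be designated as in-memory at creation time by setting its compaction class (a minimal sketch; the keyspace, table, and columns are placeholders):
    CREATE TABLE demo.users (uid text PRIMARY KEY, fname text, lname text)
      WITH compaction = { 'class': 'MemoryOnlyStrategy' };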
DSE Analytics
DataStax Enterprise analytics includes integration with Apache Spark, Apache Shark, BYOH (bring your own Hadoop), and DSE Hadoop.
Introduction to DSE Analytics
DataStax Enterprise serves the analytics market with significant features for analyzing huge databases.
Analyzing data using Spark
Spark is the default mode when you start an analytics node in a packaged installation. Spark runs locally on each node.
Analyzing data using external Hadoop systems (BYOH)
DataStax Enterprise (DSE) works with external Hadoop systems in a bring your own Hadoop (BYOH) model. Use BYOH when you want to run DSE with a separate Hadoop cluster from a different vendor.
Analyzing data using DSE Hadoop
You can run analytics on Cassandra data using Hadoop that is integrated into DataStax Enterprise. The Hadoop component in DataStax Enterprise enables analytics to be run across the DataStax Enterprise distributed, shared-nothing architecture.
DSE Search features
DSE Search provides enterprise search using Apache Solr in DataStax Enterprise. DSE Search enhancements include dsetool commands.
Getting Started with Solr
DataStax Enterprise supports Open Source Solr (OSS) tools and APIs, simplifying migration from Solr to DataStax Enterprise.
Supported and unsupported features
Supported and unsupported Cassandra and DSE Search features.
Defining key Solr terms
Solr terminology includes several names for an index of documents and its configuration on a single node.
DSE vs. Open source
Differences between DSE Search/Solr and Open Source Solr (OSS).
Architecture
A brief overview and illustration of the DSE Search architecture.
Configuration
Configuring and operating DSE Search includes starting and stopping DSE Search, segregating workloads, indexing, querying, creating a schema, and data modeling.
Querying
A brief description and illustration of querying nodes with DSE Search.
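For example (the keyspace, table, and indexed field below are placeholders), a Solr-indexed table can be queried from CQL through the solr_query column:
    SELECT * FROM demo.articles WHERE solr_query = 'title:cassandra' LIMIT 10;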
Operations
Topics for using and running Solr nodes.
Performance tuning
Topics for performance tuning and solving performance degradation, high memory consumption, or other problems with DataStax Enterprise Search nodes.
Update request processor and field transformer
Use the custom update request processor (URP) to extend the Solr URP. Use the field input/output transformer API as an alternative to the input/output transformer support in Open Source Solr (OSS).
Tutorial: Basics
Setting up for the DSE Search tutorial includes creating a Cassandra node, importing data, and creating resources.
Tutorial: Advanced
This DSE Search tutorial builds on the basic tutorial.
Troubleshooting
Topics include troubleshooting inconsistent query results, tracing Solr HTTP requests, and using several MBeans.
Wikipedia demo
Run the Wikipedia demo on a single node to download Wikipedia articles, create a CQL table, store the articles, and index the articles in Solr.
Migrating data
Migrate data using Sqoop or other methods.
Migrating data using Sqoop
For DSE Hadoop, use Sqoop to transfer data between an RDBMS data source and Hadoop or between other data sources, such as NoSQL.
Migrating data using other methods
Solutions for migrating data to DataStax Enterprise include the COPY command, the DSE Search/Solr Data Import Handler, and the Cassandra bulk loader.
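For example (the table, columns, and file path below are placeholders), the cqlsh COPY command loads CSV data into an existing table:
    COPY demo.users (uid, fname, lname) FROM '/path/to/users.csv' WITH HEADER = true;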
Deploying
Production deployment of DataStax Enterprise includes planning, configuration, and choosing how the data is divided across the nodes in the cluster.
Production deployment planning
Resources for deployment planning and recommendations for deployment.
Configuring replication
How to set up DataStax Enterprise to store multiple copies of data on multiple nodes for reliability and fault tolerance.
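For example (the data center names and replication factors below are placeholders), a keyspace that keeps copies in two data centers uses NetworkTopologyStrategy:
    CREATE KEYSPACE demo
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 2};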
Mixing workloads
Organize nodes that run different workloads into virtual data centers. Put analytic nodes in one data center, search nodes in another, and Cassandra real-time transactional nodes in another data center.
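For example, a keyspace can replicate data into each virtual data center; the data center names below assume the defaults assigned by DseSimpleSnitch (Cassandra, Analytics, Solr), and the replication factors are illustrative:
    CREATE KEYSPACE mixed_demo
      WITH replication = {'class': 'NetworkTopologyStrategy', 'Cassandra': 3, 'Analytics': 2, 'Solr': 1};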
Single data center deployment per workload type
Steps for configuring nodes in a deployment scenario in a mixed workload cluster that has only one data center for each type of workload.
Multiple data center deployment per workload type
Steps for configuring nodes in a deployment scenario in a mixed workload cluster that has more than one data center for each type of node.
Single-token architecture deployment
Steps for deploying when not using virtual nodes (vnodes).
Calculating tokens
How to calculate tokens when using single-token architecture.
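For example, with the default Murmur3Partitioner, evenly spaced initial tokens for N nodes can be computed as token(i) = i * (2^64 / N) - 2^63 for i = 0 to N-1; with 4 nodes this yields -9223372036854775808, -4611686018427387904, 0, and 4611686018427387904. For the RandomPartitioner, the equivalent formula is token(i) = i * (2^127 / N).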
Expanding an AMI cluster
To expand your EC2 implementations, use OpsCenter to provision a new cluster, add a new cluster, or add nodes to a cluster.
DataStax Management Services
DataStax Management Services automatically handle administration and maintenance tasks and assist with overall database cluster management.
Performance Service
The Performance Service automatically collects and organizes performance diagnostic information into a set of data dictionary tables that can be queried with CQL.
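For example, the diagnostic tables can be queried like any other CQL tables (the dse_perf keyspace and the node_slow_log table named here are assumptions based on the Performance Service data dictionary; substitute the tables that apply to your diagnostics):
    SELECT * FROM dse_perf.node_slow_log LIMIT 10;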
Capacity Service
The Capacity Service automatically collects data about a cluster's operations, including Cassandra-specific and platform-specific metrics (for example, disk and network metrics), at both the node and column-family level where applicable. Use OpsCenter to manage the service and perform trend analysis.
Repair Service
The Repair Service is designed to automatically keep data synchronized across a cluster. You can manage the Repair Service with OpsCenter or from the command line.
DataStax Enterprise tools
Tools include dse commands, dsetool, the cfs-stress tool, the pre-flight check and yaml_diff tools, and the Cassandra bulk loader.
The dse commands
Table of common dse commands for using DataStax Enterprise.
The dsetool
Use the dsetool utility for Cassandra File System (CFS) and Hadoop-related tasks, such as managing the job tracker, checking the CFS, and listing node subranges of data in a keyspace.
The cfs-stress tool
Performs stress testing of the Cassandra File System (CFS) layer.
Pre-flight check and yaml_diff tools
The pre-flight check tool is available for packaged installations. This collection of tests can be run on a node to detect and fix configuration problems. The yaml_diff tool filters differences between cassandra.yaml files.
Using the Cassandra bulk loader in a secure environment
Using the sstableloader tool with Kerberos/SSL.
Common tasks
Common tasks for using DataStax Enterprise.
Configuration (dse.yaml)
The configuration file for DataStax Enterprise.
Starting and stopping DataStax Enterprise
You can start and stop DataStax Enterprise as a service or stand-alone process.
File locations: Installer-Services and Package
Locations when installing DataStax Enterprise from the DataStax Installer with Services option or from a package installation.
File locations: Installer-No Services and Tarball
Locations when installing from the DataStax All-in-One Installer with No Services selected or tarball installations.
Troubleshooting
Troubleshooting examples are useful to discover and resolve problems with DSE. Also check the Cassandra troubleshooting documentation.
Cassandra Log4j appender
DataStax Enterprise allows you to stream your web and application log information into a database cluster via Apache log4j.
Installing glibc on Oracle Linux
To install DSE on Oracle Enterprise Linux 6.x and later, install the 32-bit versions of the glibc libraries.
Release Notes
DataStax Enterprise release notes cover components, changes and enhancements, issues, and resolved issues for DataStax Enterprise 4.6.x releases.