Apache Cassandra™ 2.2 (Not supported)

About Apache Cassandra

Documentation for developers and administrators on installing, configuring, and using the features and capabilities of Apache Cassandra, the scalable, open source NoSQL database.

What's new in Cassandra 2.2

An overview of new features in Cassandra 2.2.

Understanding the architecture

Important topics for understanding Cassandra.

Architecture in brief

Essential information for understanding and using Cassandra.

Internode communications (gossip)

Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster.

Data distribution and replication

How data is distributed and factors influencing replication.

Partitioners

A partitioner determines how data is distributed across the nodes in the cluster (including replicas).

Snitches

A snitch determines which data center and rack each node belongs to.

Database internals

Topics about the Cassandra database.

Storage engine

A description of Cassandra's storage structure and engine.

How Cassandra reads and writes data

Understanding how Cassandra stores data.

Data consistency

Topics about how up-to-date and synchronized a row of data is on all replicas.

Planning a cluster deployment

Vital information about successfully deploying a Cassandra cluster.

Selecting hardware for enterprise implementations

Choosing appropriate hardware depends on selecting the right balance of the following resources: memory, CPU, disks, number of nodes, and network.

Planning an Amazon EC2 cluster

Important information for deploying a production Cassandra cluster on Amazon EC2.

Calculating partition size

Determining how much data your Cassandra partitions can hold.

Calculating usable disk capacity

Determining how much data your Cassandra nodes can hold.

Calculating user data size

Accounting for storage overhead in determining user data size.

Anti-patterns in Cassandra

Implementation or design patterns that are ineffective and/or counterproductive in Cassandra production installations. Correct patterns are suggested in most cases.

Installing

Various installation methods.

Installing the RHEL-based packages

Install using Yum repositories on RHEL, CentOS, and Oracle Linux.

Installing the Debian and Ubuntu packages

Install using APT repositories on Debian and Ubuntu.

Installing from the binary tarball

Install on all Linux-based platforms using a binary tarball.

Installing prior releases of DataStax Community

Steps for installing the same version as other nodes in your cluster.

Uninstalling DataStax Community from Linux

Steps for uninstalling Cassandra by installation type.

Installing on cloud providers

Installing Cassandra on cloud providers.

Installing the Oracle JDK

Instructions for various platforms.

Recommendations for production environments.

Install locations

Install location topics.

Configuration

Configuration topics.

cassandra.yaml configuration file

The cassandra.yaml file is the main configuration file for Cassandra.

Cassandra include file

Set environment variables (cassandra.in.sh).

Security

Topics for securing Cassandra.

Configuring gossip settings

Using the cassandra.yaml file to configure gossip.

Configuring the heap dump directory

Analyzing the heap dump file can help troubleshoot memory problems.

Configuring virtual nodes

Topics about configuring virtual nodes.

Using multiple network interfaces

Steps for configuring Cassandra for multiple network interfaces or when using different regions in cloud implementations.

Configuring logging

Cassandra provides logging functionality using Simple Logging Facade for Java (SLF4J) with a logback backend.

Commit log archive configuration

Cassandra provides commit log archiving and point-in-time recovery.

Generating tokens

If not using virtual nodes (vnodes), you must calculate tokens for your cluster.
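As a rough illustration, assuming the Murmur3Partitioner (whose token range is -2^63 to 2^63 - 1) and a single data center, evenly spaced initial tokens can be computed as i * (2^64 / N) - 2^63 for node i of N. The helper name `murmur3_tokens` is illustrative only; other partitioners, such as the RandomPartitioner, use a different token range and need a different formula.

```python
def murmur3_tokens(node_count: int) -> list[int]:
    """Evenly spaced initial tokens across the Murmur3Partitioner range.

    Assumes the range [-2**63, 2**63 - 1] and one data center.
    Hypothetical helper for illustration, not part of Cassandra.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    step = 2**64 // node_count  # integer width of each node's token slice
    return [i * step - 2**63 for i in range(node_count)]


if __name__ == "__main__":
    for node, token in enumerate(murmur3_tokens(4)):
        print(f"node {node}: initial_token: {token}")
```

Each computed value would then be assigned to the `initial_token` property in the corresponding node's cassandra.yaml.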

Hadoop support

Integrating Hadoop with Cassandra.

Initializing a cluster

Topics for deploying a cluster.

Initializing a multiple node cluster (single data center)

A deployment scenario for a Cassandra cluster with a single data center.

Initializing a multiple node cluster (multiple data centers)

A deployment scenario for a Cassandra cluster with multiple data centers.

Starting and stopping Cassandra

Topics for starting and stopping Cassandra.

Clearing the data for an AMI restart

Clearing the data for an Amazon Machine Image restart.

Operations

Cassandra operation topics.

Adding or removing nodes, data centers, or clusters

Topics for adding or removing nodes, data centers, or clusters.

Backing up and restoring data

Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory.

Repairing nodes

Node repair topics.

Monitoring Cassandra

Monitoring topics.

Tuning Java resources

Tuning the Java Virtual Machine (JVM) can improve performance or reduce high memory consumption.

Data caching

Data caching topics.

Configuring memtable throughput

Configuring memtable throughput to improve write performance.

Configuring compaction

Steps for configuring compaction. The compaction process merges keys, combines columns, evicts tombstones, consolidates SSTables, and creates a new index in the merged SSTable.

Compression

Compression maximizes the storage capacity of Cassandra nodes by reducing the volume of data on disk and disk I/O, particularly for read-dominated workloads.

Testing compaction and compression

Enabling write survey mode.

Tuning Bloom filters

Cassandra uses Bloom filters to determine whether an SSTable has data for a particular row.

Moving data to or from other databases

Solutions for migrating from other databases.

Purging gossip state on a node

Correcting a problem in the gossip state.

Cassandra tools

Topics for Cassandra tools.

The nodetool utility

A command line interface for managing a cluster.

The cassandra utility

Cassandra start-up parameters can be specified on the command line (in Tarball installations) or in the cassandra-env.sh file (Package or Tarball installations).

The cassandra-stress tool

A Java-based stress testing utility for basic benchmarking and load testing a Cassandra cluster.

SSTable utilities

Troubleshooting

Peculiar Linux kernel performance problem on NUMA systems

Problems due to zone_reclaim_mode.

Nodes appear unresponsive due to a Linux futex_wait() kernel bug

Nodes randomly freeze and become unresponsive for an unknown reason.

Reads are getting slower while writes are still fast

The cluster's I/O capacity is not enough to handle the write load it is receiving.

Nodes seem to freeze after some period of time

Some portion of the JVM is being swapped out by the operating system (OS).