Apache Cassandra™ 2.0 (Not supported)

About Apache Cassandra

Documentation for developers and administrators on installing, configuring, and using the features and capabilities of the Apache Cassandra scalable open source NoSQL database.

What's new in Cassandra

An overview of new features in Cassandra.

CQL

Cassandra Query Language (CQL) is the default and primary interface into the Cassandra DBMS.

Understanding the architecture

Important topics for understanding Cassandra.

Architecture in brief

Essential information for understanding and using Cassandra.

Internode communications (gossip)

Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster.

Data distribution and replication

How data is distributed and factors influencing replication.

Partitioners

A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
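As a rough illustration of the idea, the sketch below hashes a partition key to a token and assigns it to the first node whose token is greater than or equal to it, wrapping around the ring. The function names, the three-node ring, and the use of MD5 as the hash are assumptions made for this example; they are not Cassandra's implementation, and replica placement is left out.

```python
import bisect
import hashlib


def token_for(key):
    """Hash a partition key to an integer token (MD5 is a stand-in hash here)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big")


def owner(key_token, ring):
    """Return the node that owns key_token.

    ring is a list of (token, node_name) pairs sorted by token. A node owns
    the range that ends at its own token, so the owner is the first node
    whose token is >= key_token, wrapping to the first node past the end.
    """
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, key_token)
    return ring[index % len(ring)][1]


# Hypothetical three-node ring spread evenly over the 128-bit MD5 range.
ring = [(i * 2**128 // 3, name) for i, name in enumerate(["node-a", "node-b", "node-c"])]
print(owner(token_for("user:42"), ring))
```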

Snitches

A snitch determines which data centers and racks nodes belong to.

Client requests

Client read or write requests can be sent to any node in the cluster because all nodes in Cassandra are peers.

Planning a cluster deployment

Vital information about successfully deploying a Cassandra cluster.

Installing

Various installation methods.

Installing the RHEL-based packages

Install using Yum repositories on RHEL, CentOS, and Oracle Linux.

Installing the Debian and Ubuntu packages

Install using APT repositories on Debian and Ubuntu.

Installing from the binary tarball

Install on all Linux-based platforms using a binary tarball.

Installing prior releases of DataStax Community

Steps for installing the same version as other nodes in your cluster.

Uninstalling DataStax Community

Steps for uninstalling Cassandra by install type.

Installing on cloud providers

Installation methods for the supported cloud providers.

Installing the Oracle JDK and the JNA

Instructions for various platforms.

Recommended production settings

Recommendations for production environments.

Initializing a cluster

Topics for deploying a cluster.

Initializing a multiple node cluster (single data center)

A deployment scenario for a Cassandra cluster with a single data center.

Initializing a multiple node cluster (multiple data centers)

A deployment scenario for a Cassandra cluster with multiple data centers.

Security

Topics for securing Cassandra.

Securing Cassandra

Cassandra provides these security features to the open source community.

SSL encryption

Topics for using SSL in Cassandra.

Internal authentication

Topics for internal authentication.

Internal authorization

Topics about internal authorization.

Configuring firewall port access

Which ports to open when nodes are protected by a firewall.

Enabling JMX authentication

The default settings for Cassandra make JMX accessible only from localhost. To enable remote JMX connections, change the LOCAL_JMX setting in cassandra-env.sh.

Database internals

Topics about the Cassandra database.

Managing data

An overview of Cassandra's storage structure.

Cassandra storage basics

Understanding how Cassandra stores data.

The write path of an update

A brief description of the write path of an update.

About deletes

How Cassandra deletes data and why deleted data can reappear.

About hinted handoff writes

How hinted handoff works and how it optimizes the cluster.

About reads

How Cassandra combines results from the active memtable and potentially multiple SSTables to satisfy a read.
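The combining step can be pictured with a minimal sketch, assuming each source (the memtable or an SSTable) is modeled as a plain dictionary that maps a row key to columns, and each column holds a (value, timestamp) pair. This data model and the function name read_row are hypothetical simplifications; the sketch only shows the newest timestamp winning per column.

```python
def read_row(key, memtable, sstables):
    """Merge the columns stored for key across all sources.

    For each column, the value with the highest write timestamp wins,
    regardless of which source it came from.
    """
    merged = {}
    for source in [memtable, *sstables]:
        for column, (value, timestamp) in source.get(key, {}).items():
            if column not in merged or timestamp > merged[column][1]:
                merged[column] = (value, timestamp)
    return {column: value for column, (value, _) in merged.items()}


# Hypothetical data: the memtable holds the most recent email,
# while two SSTables hold older and newer versions of the name.
memtable = {"user1": {"email": ("new@example.com", 30)}}
sstables = [
    {"user1": {"email": ("old@example.com", 10), "name": ("Ann", 10)}},
    {"user1": {"name": ("Anna", 20)}},
]
print(read_row("user1", memtable, sstables))
```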

About transactions and concurrency control

A brief description of transactions and concurrency control.

About data consistency

How up-to-date and synchronized a row of data is on all replicas.
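One common way to reason about this is with quorum arithmetic. The sketch below assumes the usual definitions: a quorum is a majority of the replicas for a row, and a read is guaranteed to overlap the latest write when the number of replicas read plus the number written exceeds the replication factor. The function names are chosen for this example only.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > replication_factor


# With a replication factor of 3, quorum reads plus quorum writes overlap.
rf = 3
print(quorum(rf), is_strongly_consistent(quorum(rf), quorum(rf), rf))
```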

Configuration

Configuration topics.

Node and cluster configuration

The cassandra.yaml file is the main configuration file for Cassandra.

Configuring gossip settings

Using the cassandra.yaml file to configure gossip.

Configuring the heap dump directory

Analyzing the heap dump file can help troubleshoot memory problems.

Generating tokens

If you are not using virtual nodes (vnodes), you must calculate tokens for your cluster.
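A minimal sketch of the calculation, assuming the Murmur3Partitioner, whose tokens range from -2^63 to 2^63 - 1, and nodes that should be spaced evenly around the ring. The function name is hypothetical, and other partitioners use a different token range.

```python
def murmur3_tokens(node_count):
    """Return evenly spaced initial tokens for node_count nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # Divide the 2**64 token space into equal steps, starting at -2**63.
    return [(2**64 // node_count) * i - 2**63 for i in range(node_count)]


for token in murmur3_tokens(4):
    print(token)
```

Each value would then be assigned to one node as its initial token.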

Configuring virtual nodes

Topics about configuring virtual nodes.

Logging configuration

About Cassandra logging functionality using Simple Logging Facade for Java (SLF4J) with log4j.

Commit log archive configuration

Cassandra provides commit log archiving and point-in-time recovery.

Using multiple network interfaces

Steps for configuring Cassandra for multiple network interfaces or when using different regions in cloud implementations.

Hadoop support

Cassandra support for integrating Hadoop with Cassandra.

Operations

Operation topics.

Monitoring Cassandra

Monitoring topics.

Tuning Bloom filters

Cassandra uses Bloom filters to determine whether an SSTable has data for a particular row.
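To show the general technique, here is a small, generic Bloom filter sketch. It is not Cassandra's implementation: the class name, the bit-array size, the number of hashes, and the use of SHA-256 are all assumptions made for illustration. The key property is that a negative answer is definite, while a positive answer may be a false positive.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, hash_count=3):
        self.size_bits = size_bits
        self.hash_count = hash_count
        # One byte per bit keeps the sketch simple; real filters pack bits.
        self.bits = bytearray(size_bits)

    def _positions(self, key):
        # Derive hash_count positions by salting the key with a seed.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for position in self._positions(key):
            self.bits[position] = 1

    def might_contain(self, key):
        # False means the key was never added; True means "possibly present".
        return all(self.bits[position] for position in self._positions(key))


row_filter = BloomFilter()
row_filter.add("row-42")
print(row_filter.might_contain("row-42"))
```

A read could skip any SSTable whose filter returns False for the requested row key, which avoids unnecessary disk access.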

Data caching

Data caching topics.

Configuring memtable throughput

Configuring memtable throughput to improve write performance.

Configuring compaction

Steps for configuring compaction. The compaction process merges keys, combines columns, evicts tombstones, consolidates SSTables, and creates a new index in the merged SSTable.

Compression

Compression maximizes the storage capacity of Cassandra nodes by reducing the volume of data on disk and disk I/O, particularly for read-dominated workloads.

Testing compaction and compression

Enabling write survey mode.

Tuning Java resources

Consider tuning Java resources in the event of a performance degradation or high memory consumption.

Purging gossip state on a node

Correcting a problem in the gossip state.

Repairing nodes

Node repair makes data on a replica consistent with data on other nodes.

Adding or removing nodes, data centers, or clusters

Topics for adding or removing nodes, data centers, or clusters.

Backing up and restoring data

Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory.

Taking a snapshot

Steps for taking a snapshot globally or per node.

Deleting snapshot files

Steps to delete snapshot files.

Enabling incremental backups

Steps to enable incremental backups. When incremental backups are enabled, Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory.

Restoring from a snapshot

Methods for restoring from a snapshot.

Restoring a snapshot into a new cluster

Steps for restoring a snapshot by recovering the cluster into another newly created cluster.

Recovering from a single disk failure using JBOD

Recovering from a single disk failure in a disk array using JBOD.

Cassandra tools

Topics for Cassandra tools.

The nodetool utility

A command line interface for managing a Cassandra cluster.

Cassandra bulk loader (sstableloader)

Provides the ability to bulk load external data into a cluster, load existing SSTables into another cluster with a different number of nodes or replication strategy, and restore snapshots.

The sstablelevelreset utility

The sstablelevelreset utility will reset the level to 0 on a given set of SSTables.

The cassandra utility

Cassandra start-up parameters can be specified on the command line (in Tarball installations) or in the cassandra-env.sh file (Package or Tarball installations).

The cassandra-stress tool

A Java-based stress testing utility for benchmarking and load testing a Cassandra cluster.

The sstablescrub utility

An offline version of nodetool scrub. This tool attempts to remove the corrupted parts while preserving non-corrupted data.

The sstablesplit utility

Use this tool to split SSTable files into multiple SSTables of a maximum designated size.

sstablekeys

The sstablekeys utility dumps table keys.

The sstableupgrade tool

Upgrade the SSTables in the specified table or snapshot to match the currently installed version of Cassandra.

References

Reference topics.

Starting and stopping Cassandra

Topics for starting and stopping Cassandra.

Install locations

Install location topics.

Cassandra-CLI utility (deprecated)

Cassandra stores storage configuration attributes in the system keyspace.

Moving data to/from other databases

Solutions for migrating from other databases.

Troubleshooting

Troubleshooting topics.

Peculiar Linux kernel performance problem on NUMA systems

Problems due to zone_reclaim_mode.

Reads are getting slower while writes are still fast

The cluster's IO capacity is not enough to handle the write load it is receiving.

Nodes seem to freeze after some period of time

Some portion of the JVM is being swapped out by the operating system (OS).

Nodes are dying with OOM errors

Nodes are dying with OutOfMemory exceptions.

Nodetool or JMX connections failing on remote nodes

Nodetool commands can be run locally but not on other nodes in the cluster.

View of ring differs between some nodes

Indicates that the ring is in a bad state.

Java reports an error saying there are too many open files

Java may not be able to open enough file descriptors.

Insufficient user resource limits errors

Insufficient resource limits may result in a number of errors in Cassandra and OpsCenter.

Cannot initialize class org.xerial.snappy.Snappy

An error may occur when Snappy compression/decompression is enabled, even though its library is available from the classpath.

Firewall idle connection timeout causing nodes to lose communication

Steps to configure the default idle connection timeout.

Release notes

Release notes for DataStax Community.