About Apache Cassandra
This guide provides information for developers and administrators on installing, configuring, and using the features and capabilities of Cassandra.
Key features
Key improvements in Cassandra 1.2.
Key CQL features
About the release of CQL 3.
Other CQL 3 enhancements
Cassandra 1.2 introduced many enhancements in addition to the key CQL features.
Other changes
Additional changes to Cassandra 1.2.
CQL
Cassandra Query Language (CQL) is the default and primary interface into the Cassandra DBMS.
Understanding the architecture
Important topics for understanding Cassandra.
Architecture in brief
Essential information for understanding and using Cassandra.
Internode communications (gossip)
Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster.
Data distribution and replication
How data is distributed and factors influencing replication.
Partitioners
A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
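As a rough illustration of that idea, the sketch below hashes a row key to a token and looks up the node that owns it on a token ring. The node names, the three-node ring, and the use of MD5 reduced modulo 2**127 are assumptions made for illustration; this is not Cassandra's partitioner code, and replica placement is ignored.

```python
import bisect
import hashlib

# Hypothetical three-node ring: each node owns the range that ends at its token.
RING = sorted([
    (0, "node-a"),
    (2**127 // 3, "node-b"),
    (2 * (2**127 // 3), "node-c"),
])


def token_for(key: bytes) -> int:
    """Map a row key to a token in [0, 2**127) using an MD5 hash (simplified)."""
    return int.from_bytes(hashlib.md5(key).digest(), "big") % 2**127


def owner_of(key: bytes) -> str:
    """Return the first node whose token is >= the key's token, wrapping around."""
    tokens = [token for token, _ in RING]
    index = bisect.bisect_left(tokens, token_for(key))
    return RING[index % len(RING)][1]
```

Calling `owner_of(b"some-row-key")` always returns the same node for the same key, which is the property that lets any node route a request to the right place.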
Snitches
A snitch determines which data centers and racks are written to and read from.
Client requests
Client read or write requests can go to any node in the cluster because all nodes in Cassandra are peers.
Planning a cluster deployment
Vital information about successfully deploying a Cassandra cluster.
Installing DataStax Community
Various installation methods.
Installing the RHEL-based packages
Install using Yum repositories on RHEL, CentOS, and Oracle Linux.
Installing the Debian and Ubuntu packages
Install using APT repositories on Debian and Ubuntu.
Installing on all Linux or Mac OSX systems
Install on any Linux-based platform or Mac OSX, including platforms without package support, or when you do not have or do not want a root installation.
Installing or expanding a Cassandra cluster on Amazon EC2
A step-by-step guide for installing the DataStax Community AMI (Amazon Machine Image).
Installing the Oracle JRE and the JNA
Instructions for installing the Oracle JRE on various platforms.
Recommended production settings
Recommendations for production environments.
Upgrading Cassandra
Upgrading to DataStax Community 1.2.10
Best practices
Steps to perform before upgrading Cassandra.
Prerequisites
A data type change may require modification of your queries.
Debian or Ubuntu
Steps for upgrading.
RHEL or CentOS
Steps for upgrading.
Tarball
Steps for upgrading.
Completing the upgrade
Final steps for upgrading Cassandra.
Changes impacting upgrade
Changes in Cassandra you should be aware of.
Initializing a cluster
Topics for deploying a cluster.
Initializing a multiple node cluster (single data center)
A deployment scenario for a Cassandra cluster with a single data center.
Initializing a multiple node cluster (multiple data centers)
A deployment scenario for a Cassandra cluster with multiple data centers.
Security
Topics for securing Cassandra.
Securing Cassandra
Cassandra provides these security features to the open source community.
SSL encryption
Topics for using SSL in Cassandra.
Internal authentication
Topics for internal authentication.
Internal authorization
Topics about internal authorization.
Configuring firewall port access
Which ports to open when nodes are protected by a firewall.
Database internals
Topics about the Cassandra database.
Managing data
An overview of Cassandra's storage structure.
About writes
Understanding how Cassandra writes and reads data, the hinted handoff feature, and areas of conformance and non-conformance to the ACID (atomic, consistent, isolated, durable) database properties.
About inserts and updates
A brief description and illustration of insert and update operations.
About deletes
How Cassandra deletes data and why deleted data can reappear.
About hinted handoff writes
How hinted handoff works and how it optimizes the cluster.
About reads
How Cassandra combines results from the active memtable and potentially multiple SSTables to satisfy a read.
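A minimal sketch of that merge, assuming each source is a plain dictionary of the form `{row_key: {column: (value, timestamp)}}` and that the newest timestamp wins per column. The data layout and the function name `merge_read` are hypothetical; tombstones, Bloom filters, and caches are left out.

```python
def merge_read(memtable, sstables, row_key):
    """Return {column: value}, keeping the newest timestamp for each column."""
    newest = {}
    for source in [memtable, *sstables]:
        for column, (value, timestamp) in source.get(row_key, {}).items():
            # Keep this cell only if it is newer than what has been seen so far.
            if column not in newest or timestamp > newest[column][1]:
                newest[column] = (value, timestamp)
    return {column: value for column, (value, _) in newest.items()}
```

For example, if the memtable holds a newer `name` column and an older SSTable holds both `name` and `age`, the result takes `name` from the memtable and `age` from the SSTable.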
About transactions and concurrency control
A brief description of transactions and concurrency control.
Configuring data consistency
How up-to-date and synchronized a row of data is on all replicas.
About schema changes
Large numbers of schema changes can simultaneously take place in a cluster without any schema disagreement among nodes.
Configuration
Configuration topics.
Node and cluster configuration
The cassandra.yaml file is the main configuration file for Cassandra.
Configuring the heap dump directory
Analyzing the heap dump file can help troubleshoot memory problems.
Generating tokens
If not using virtual nodes (vnodes), you still need to calculate tokens for your cluster.
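The calculation can be sketched as follows, assuming one evenly spaced token per node and the token ranges of the two common partitioners (-2**63 to 2**63 - 1 for Murmur3Partitioner, 0 to 2**127 - 1 for RandomPartitioner). The function names are hypothetical helpers, not part of any Cassandra tool.

```python
def murmur3_tokens(node_count):
    """Evenly spaced tokens over the Murmur3Partitioner range."""
    return [(2**64 // node_count) * i - 2**63 for i in range(node_count)]


def random_partitioner_tokens(node_count):
    """Evenly spaced tokens over the RandomPartitioner range."""
    return [(2**127 // node_count) * i for i in range(node_count)]
```

Each value in the returned list would be assigned to one node as its initial token.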
Configuring virtual nodes
Topics about configuring virtual nodes.
Logging configuration
About Cassandra logging functionality using Simple Logging Facade for Java (SLF4J) with log4j.
Commit log archive configuration
Cassandra provides commitlog archiving and point-in-time recovery.
Operations
Operation topics.
Monitoring Cassandra
Monitoring topics.
Tuning Bloom filters
Cassandra uses Bloom filters to determine whether an SSTable has data for a particular row.
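To show why such a check is cheap, here is a minimal Bloom filter sketch. The bit-array size, the number of hashes, and the SHA-256-based hashing are assumptions chosen for brevity and do not reflect Cassandra's implementation. A negative answer is definite; a positive answer only means the row might be present.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, hash_count=3):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key: bytes):
        # Derive hash_count bit positions by salting the key with a seed byte.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: bytes) -> None:
        for position in self._positions(key):
            self.bits |= 1 << position

    def might_contain(self, key: bytes) -> bool:
        # False means the key was never added; True may be a false positive.
        return all((self.bits >> position) & 1 for position in self._positions(key))
```

A larger bit array lowers the false-positive rate at the cost of memory, which is the trade-off that tuning adjusts.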
Data caching
Data caching topics.
Configuring memtable throughput
Configuring memtable throughput to improve write performance.
Configuring compaction
Steps for configuring compaction.
Compression
Compression topics.
Testing compaction and compression
Enabling write survey mode.
Tuning Java resources
Consider tuning Java resources in the event of a performance degradation or high memory consumption.
Repairing nodes
Node repair makes data on a replica consistent with data on other nodes.
Adding or removing a node or data center
Topics for adding or removing nodes, data centers, or clusters.
Backing up and restoring data
Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory.
Taking a snapshot
Steps for taking a snapshot globally or per node.
Deleting snapshot files
Steps to delete snapshot files.
Enabling incremental backups
Steps to enable incremental backups. When incremental backups are enabled, Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory.
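The hard-link step can be illustrated with a short sketch. The helper name `link_into_backups` and the `backups` directory placed next to the file are assumptions for the example; Cassandra performs this step itself when incremental backups are enabled.

```python
import os
import tempfile


def link_into_backups(sstable_path: str) -> str:
    """Hard-link one SSTable file into a sibling 'backups' directory."""
    backups_dir = os.path.join(os.path.dirname(sstable_path), "backups")
    os.makedirs(backups_dir, exist_ok=True)
    target = os.path.join(backups_dir, os.path.basename(sstable_path))
    os.link(sstable_path, target)  # new directory entry, same data on disk
    return target


if __name__ == "__main__":
    # Demonstrate on a throwaway file rather than a real data directory.
    with tempfile.TemporaryDirectory() as data_dir:
        sstable = os.path.join(data_dir, "demo-Data.db")
        with open(sstable, "wb") as handle:
            handle.write(b"example")
        print(os.path.samefile(sstable, link_into_backups(sstable)))
```

Because a hard link adds a directory entry rather than copying bytes, the backup costs almost no extra disk space until the original file is removed.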
Restoring from a snapshot
Methods for restoring from a snapshot.
Recovering from a single disk failure using JBOD
Recovering from a single disk failure in a disk array using JBOD.
The nodetool utility
A command-line interface for managing a Cassandra cluster.
Cassandra bulk loader
The Cassandra bulk loader, also called the sstableloader tool, provides the ability to bulk load external data into a cluster, load existing SSTables into another cluster with a different number of nodes or replication strategy, and restore snapshots.
The cassandra utility
Cassandra start-up parameters can be run from the command line (in Tarball installations) or specified in the cassandra-env.sh file (Package or Tarball installations).
The cassandra-stress tool
A Java-based stress testing utility for benchmarking and load testing a Cassandra cluster.
The cassandra-shuffle utility
Shift a single-token-per-node architecture to virtual nodes (vnodes) without downtime. Avoid using this utility.
The sstablescrub utility
An offline version of nodetool scrub. This tool attempts to remove the corrupted parts while preserving non-corrupted data.
The sstable2json / json2sstable utilities
Topics for using sstable2json and json2sstable.
The sstableupgrade tool
Upgrade the SSTables in the specified table (or snapshot) to match the current version of Cassandra.
Using CLI
CLI legacy topics.
References
Reference topics.
Starting and stopping Cassandra
Topics for starting and stopping Cassandra.
Install locations
Install location topics.
CLI keyspace and table storage configuration
Cassandra stores storage configuration attributes in the system keyspace.
Moving data to/from other databases
Cassandra offers several solutions for migrating from other databases.
Troubleshooting
Troubleshooting topics.
Reads are getting slower while writes are still fast
The cluster's IO capacity is not enough to handle the write load it is receiving.
Nodes seem to freeze after some period of time
Some portion of the JVM is being swapped out by the operating system (OS).
Nodes are dying with OOM errors
Nodes are dying with OutOfMemory exceptions.
Nodetool or JMX connections failing on remote nodes
Nodetool commands can be run locally but not on other nodes in the cluster.
View of ring differs between some nodes
Indicates that the ring is in a bad state.
Java reports an error saying there are too many open files
Java may not be allowed to open enough file descriptors.
Cannot initialize class org.xerial.snappy.Snappy
An error may occur when Snappy compression/decompression is enabled although its library is available from the classpath.
Firewall idle connection timeout causing nodes to lose communication during low traffic times
Steps to configure the default idle connection timeout.
Release notes
Fixes and new features in Cassandra.