Documentation for developers and administrators on installing, configuring, and using the features and capabilities of the Apache Cassandra scalable, open source NoSQL database.
An overview of new features in Cassandra.
Cassandra Query Language (CQL) is the default and primary interface into the Cassandra DBMS.
Understanding the architecture
Important topics for understanding Cassandra.
Architecture in brief
Essential information for understanding and using Cassandra.
Internode communications (gossip)
Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster.
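As a rough illustration of how gossip reconciles state, the sketch below merges two nodes' views of the cluster. It assumes each endpoint's heartbeat is summarized as a (generation, version) pair and that the higher pair wins. This is a simplified model of the idea, not Cassandra's actual gossip implementation, and the endpoint addresses are hypothetical.

```python
from typing import Dict, Tuple

# endpoint -> (generation, version); generation changes when a node restarts,
# version increases as that node publishes new state
State = Dict[str, Tuple[int, int]]


def merge_gossip(local: State, remote: State) -> State:
    """Keep the newest known heartbeat for every endpoint.

    Tuples compare generation first, then version, so a restarted node
    (higher generation) supersedes any version from its previous run.
    """
    merged = dict(local)
    for endpoint, heartbeat in remote.items():
        if endpoint not in merged or heartbeat > merged[endpoint]:
            merged[endpoint] = heartbeat
    return merged
```

Repeating this exchange with a few random peers per round lets new information spread through the cluster without any central coordinator.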
Data distribution and replication
How data is distributed and factors influencing replication.
A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
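A minimal sketch of token-ring placement, assuming an MD5-based token loosely modeled on RandomPartitioner and a SimpleStrategy-like walk around the ring. The default partitioner in Cassandra 2.x is Murmur3Partitioner, which uses a different hash; MD5 is swapped in here only because it is in the Python standard library. The mapping into 0..2**127 - 1 is a simplification, and the node names and tokens are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple

Ring = List[Tuple[int, str]]  # sorted list of (token, node)


def token_for(key: str) -> int:
    """Hash a row key to a token in [0, 2**127) (simplified MD5 mapping)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2**127)


def replicas_for(key: str, ring: Ring, rf: int) -> List[str]:
    """Return up to rf distinct nodes, starting at the first node whose
    token is >= the key's token and walking clockwise (wrapping around)."""
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(key)) % len(ring)
    distinct_nodes = len({node for _, node in ring})
    replicas: List[str] = []
    while len(replicas) < min(rf, distinct_nodes):
        node = ring[index % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        index += 1
    return replicas
```

For example, with four evenly spaced tokens `[(i * 2**125, "node%d" % i) for i in range(4)]`, `replicas_for("user:42", ring, 3)` returns three distinct nodes, the first of which owns the key's token range.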
A snitch determines which datacenters and racks nodes belong to.
Client read or write requests can be sent to any node in the cluster because all nodes in Cassandra are peers.
Planning a deployment
Vital information about successfully deploying a Cassandra cluster.
Various installation methods.
Installing the RHEL-based packages
Install using Yum repositories on RHEL, CentOS, and Oracle Linux.
Installing the Debian and Ubuntu packages
Install using APT repositories on Debian and Ubuntu.
Installing from the binary tarball
Install on all Linux-based platforms using a binary tarball.
Installing prior releases of DataStax Community
Steps for installing the same version as other nodes in your cluster.
Installing Python 2.7 on older RHEL-based package installations
Steps for installing Python 2.7 on older distributions such as CentOS 6.5.
Uninstalling DataStax Community
Steps for uninstalling Cassandra by install type.
Installing on cloud providers
Installation methods for the supported cloud providers.
Installing the Oracle JDK
Instructions for various platforms.
Recommended production settings for Linux and Windows
Recommendations for Linux and Windows production environments.
Initializing a cluster
Topics for deploying a cluster.
Initializing a multiple node cluster (single datacenter)
A deployment scenario for a Cassandra cluster with a single datacenter.
Initializing a multiple node cluster (multiple datacenters)
A deployment scenario for a Cassandra cluster with multiple datacenters.
Topics for securing Cassandra.
Cassandra provides various security features to the open source community.
Topics for using SSL in Cassandra.
Topics for internal authentication.
Topics about internal authorization.
Configuring firewall port access
Which ports to open when nodes are protected by a firewall.
Enabling JMX authentication
The default settings for Cassandra make JMX accessible only from localhost. To enable remote JMX connections, change the LOCAL_JMX setting in cassandra-env.sh.
Topics about the Cassandra database.
A description of Cassandra's storage structure and engine.
Separate table directories
Cassandra provides fine-grained control of table storage on disk.
Cassandra storage basics
Understanding how Cassandra stores data.
The write path of an update
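The write path can be modeled as: append to the commit log for durability, write to the in-memory memtable, and flush the memtable to an immutable, key-sorted SSTable when it grows large. The sketch below is a toy model under those assumptions. The row-count flush threshold and the wholesale clearing of the commit log are simplifications (Cassandra flushes based on memory thresholds and recycles commit log segments only after the data they cover has been flushed).

```python
from typing import Any, Dict, List, Optional, Tuple


class ToyNode:
    """Toy model of a single node's write path (hypothetical, not Cassandra's API)."""

    def __init__(self, flush_threshold: int) -> None:
        self.commit_log: List[Tuple[str, Any]] = []
        self.memtable: Dict[str, Any] = {}
        self.sstables: List[Tuple[Tuple[str, Any], ...]] = []
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: Any) -> None:
        self.commit_log.append((key, value))  # 1. append to the commit log
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush when the memtable is "full"

    def flush(self) -> None:
        # SSTables are immutable and sorted by key
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs commit log replay

    def read(self, key: str) -> Optional[Any]:
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest SSTable first
            for k, v in table:
                if k == key:
                    return v
        return None
```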
A brief description of the write path of an update.
How Cassandra deletes data and why deleted data can reappear.
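The reason deleted data can reappear can be shown with a last-write-wins merge. In the sketch below, replica A holds a tombstone while replica B missed the delete and still holds the old value. Once A purges its tombstone after gc_grace_seconds (default 864000, that is, 10 days), the stale value on B wins again. Using one timestamp in seconds for both the write and the deletion time is a simplifying assumption; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

GC_GRACE_SECONDS = 864000  # default gc_grace_seconds (10 days)

Cell = Optional[Tuple[int, Optional[str]]]  # (timestamp, value); value None = tombstone


def visible_value(replica_cells: List[Cell]) -> Optional[str]:
    """Last write wins across replicas; a winning tombstone means 'deleted'."""
    cells = [c for c in replica_cells if c is not None]
    if not cells:
        return None
    return max(cells, key=lambda c: c[0])[1]


def purge(cell: Cell, now: int) -> Cell:
    """Drop a tombstone once gc_grace_seconds has elapsed since the delete."""
    if cell is not None and cell[1] is None and now - cell[0] > GC_GRACE_SECONDS:
        return None
    return cell


replica_a: Cell = (100, None)   # received the delete
replica_b: Cell = (50, "old")   # missed the delete
```

`visible_value([replica_a, replica_b])` returns `None` while the tombstone exists, but after `purge(replica_a, 100 + GC_GRACE_SECONDS + 1)` the merge returns `"old"`. Running repair within gc_grace_seconds propagates the tombstone to B before it can be purged.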
About hinted handoff writes
How hinted handoff works and how it optimizes the cluster.
How reads work and factors affecting them.
About transactions and concurrency control
A brief description of transactions and concurrency control.
Topics about how up-to-date and synchronized a row of data is on all replicas.
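A quick way to reason about consistency levels is the quorum arithmetic: QUORUM requires floor(sum of replication factors / 2) + 1 replicas across all datacenters, LOCAL_QUORUM applies the same formula to the local datacenter's replication factor, and reads are guaranteed to see the latest write when read replicas + write replicas > replication factor. The helper names below are hypothetical.

```python
from typing import List


def quorum(replication_factors: List[int]) -> int:
    """QUORUM across all datacenters: floor(sum(RF) / 2) + 1."""
    return sum(replication_factors) // 2 + 1


def local_quorum(local_rf: int) -> int:
    """LOCAL_QUORUM within one datacenter."""
    return local_rf // 2 + 1


def is_strongly_consistent(read_replicas: int, write_replicas: int, rf: int) -> bool:
    """Read and write replica sets must overlap in at least one node."""
    return read_replicas + write_replicas > rf
```

For example, with RF 3 in each of two datacenters, `quorum([3, 3])` is 4 and `local_quorum(3)` is 2; reading and writing at QUORUM with a single datacenter of RF 3 (2 + 2 > 3) is strongly consistent.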
cassandra.yaml configuration file
The cassandra.yaml file is the main configuration file for Cassandra.
Configuring gossip settings
Using the cassandra.yaml file to configure gossip.
Configuring the heap dump directory
Analyzing the heap dump file can help troubleshoot memory problems.
Configuring the buffered read size
Configuring the buffered read size can reduce problems with wasted disk I/O and unnecessary garbage collection.
Configuring virtual nodes
Topics about configuring virtual nodes.
Using multiple network interfaces
Steps for configuring Cassandra for multiple network interfaces or when using different regions in cloud implementations.
About Cassandra logging, which uses the Simple Logging Facade for Java (SLF4J) with a logback backend.
Commit log archive configuration
Cassandra provides commit log archiving and point-in-time recovery.
If not using virtual nodes (vnodes), you must calculate tokens for your cluster.
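Evenly spaced initial tokens for a cluster of N nodes can be computed directly from the partitioner's token range: Murmur3Partitioner covers -2**63 to 2**63 - 1, and RandomPartitioner covers 0 to 2**127 - 1. A minimal sketch (the function names are hypothetical):

```python
from typing import List


def murmur3_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial_token values for Murmur3Partitioner."""
    return [((2**64 // node_count) * i) - 2**63 for i in range(node_count)]


def random_partitioner_tokens(node_count: int) -> List[int]:
    """Evenly spaced initial_token values for RandomPartitioner."""
    return [(2**127 // node_count) * i for i in range(node_count)]
```

For a four-node Murmur3 cluster this yields -9223372036854775808, -4611686018427387904, 0, and 4611686018427387904; each value goes into the initial_token setting of one node's cassandra.yaml.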
Cassandra support for integrating with Hadoop.
Tuning Bloom filters
Cassandra uses Bloom filters to determine whether an SSTable has data for a particular row.
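To illustrate the trade-off tuned by the bloom_filter_fp_chance table property, the sketch below implements a basic Bloom filter and the standard false-positive estimate p ≈ (1 - e^(-kn/m))^k for m bits, k hash functions, and n keys. Deriving the k positions from SHA-256 with double hashing is an assumption made for a self-contained example; Cassandra uses its own Murmur3-based hashing.

```python
import hashlib
import math
from typing import List


class BloomFilter:
    """A filter that may return false positives but never false negatives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> List[int]:
        # Double hashing: position_i = h1 + i * h2 (mod m)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step odd
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely not in this SSTable", so the read can skip it
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


def false_positive_rate(m: int, k: int, n: int) -> float:
    return (1 - math.exp(-k * n / m)) ** k
```

Lowering the target false-positive chance means more bits per key, so fewer unnecessary SSTable reads at the cost of more memory.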
Data caching topics.
Configuring memtable throughput
Configuring memtable throughput to improve write performance.
Steps for configuring compaction. The compaction process merges keys, combines columns, evicts tombstones, consolidates SSTables, and creates a new index in the merged SSTable.
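The merge step of compaction can be sketched as follows: cells for the same row and column are reconciled by keeping the one with the newer timestamp, and tombstones older than a gc_before cutoff are evicted. This is a simplified model under stated assumptions: each SSTable is a hypothetical dict of row key to columns, a tombstone's timestamp stands in for its deletion time, and on a timestamp tie the sketch keeps the cell seen first (Cassandra has its own tie-breaking rules).

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, Optional[str]]      # (timestamp, value); value None marks a tombstone
SSTable = Dict[str, Dict[str, Cell]]  # row key -> column name -> cell


def compact(sstables: List[SSTable], gc_before: int) -> SSTable:
    merged: SSTable = {}
    for table in sstables:
        for key, columns in table.items():
            row = merged.setdefault(key, {})
            for column, cell in columns.items():
                # Last write wins: keep the cell with the newer timestamp
                if column not in row or cell[0] > row[column][0]:
                    row[column] = cell
    result: SSTable = {}
    for key in sorted(merged):  # output is sorted by row key
        live = {column: cell for column, cell in merged[key].items()
                if not (cell[1] is None and cell[0] < gc_before)}
        if live:  # rows with nothing left are dropped entirely
            result[key] = live
    return result
```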
Compression maximizes the storage capacity of Cassandra nodes by reducing the volume of data on disk and disk I/O, particularly for read-dominated workloads.
Tuning Java resources
Tuning the Java Virtual Machine (JVM) can improve performance or reduce high memory consumption.
Purging gossip state on a node
Correcting a problem in the gossip state.
Node repair topics.
Adding or removing nodes, datacenters, or clusters
Topics for adding or removing nodes, datacenters, or clusters.
Backing up and restoring data
Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory.
A brief description of how Cassandra backs up data.
Taking a snapshot
Steps for taking a global snapshot or a per-node snapshot.
Deleting snapshot files
Steps to delete snapshot files.
Enabling incremental backups
Steps to enable incremental backups. When incremental backups are enabled, Cassandra hard-links each SSTable flushed from a memtable into a backups directory under the keyspace data directory.
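A hard link gives the backup a second directory entry for the same on-disk file, so it costs no extra space until the original SSTable is deleted by compaction. The sketch below imitates that step on a POSIX filesystem. The function name, directory layout, and SSTable file name are hypothetical; Cassandra performs this internally when incremental_backups is enabled in cassandra.yaml.

```python
import os


def backup_sstable(sstable_path: str, data_dir: str) -> str:
    """Hard-link a flushed SSTable file into <data_dir>/backups (hypothetical layout)."""
    backups_dir = os.path.join(data_dir, "backups")
    os.makedirs(backups_dir, exist_ok=True)
    dest = os.path.join(backups_dir, os.path.basename(sstable_path))
    os.link(sstable_path, dest)  # same inode: no data is copied
    return dest
```

Because backups are not cleared automatically, an external process must remove old hard links after they have been copied elsewhere.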
Restoring from a snapshot
Methods for restoring from a snapshot.
Restoring a snapshot into a new cluster
Steps for restoring a snapshot by recovering the cluster into another newly created cluster.
Recovering using JBOD
Recovering from a single disk failure in a disk array using JBOD.
Topics for Cassandra tools.
The nodetool utility
A command line interface for managing a cluster.
The cassandra utility
Cassandra start-up parameters can be passed on the command line (tarball installations) or specified in the cassandra-env.sh file (package or tarball installations).
The cassandra-stress tool
A Java-based stress testing utility for basic benchmarking and load testing a Cassandra cluster.
Tools for using, upgrading, and changing Cassandra SSTables.
Starting and stopping Cassandra
Topics for starting and stopping Cassandra.
Install location topics.
Cassandra include file
Set environment variables (cassandra.in.sh).
Cassandra-CLI utility (deprecated)
Deprecated configuration attributes. These will be removed in Cassandra 3.0.
Moving data to/from other databases
Solutions for migrating from other databases.
Release notes for DataStax Community.