About Apache Cassandra
Documentation for developers and administrators on installing, configuring, and using the features and capabilities of Apache Cassandra, a scalable open source NoSQL database.
What's new in Cassandra 2.2
An overview of new features in Cassandra.
Understanding the architecture
Important topics for understanding Cassandra.
Architecture in brief
Essential information for understanding and using Cassandra.
Internode communications (gossip)
Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster.
Data distribution and replication
How data is distributed and factors influencing replication.
Partitioners
A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
Snitches
A snitch determines which data centers and racks nodes belong to.
Database internals
Topics about the Cassandra database.
Storage engine
A description of Cassandra's storage structure and engine.
How Cassandra reads and writes data
Understanding how Cassandra stores data.
Data consistency
Topics about how up-to-date and synchronized a row of data is on all replicas.
Planning a cluster deployment
Vital information about successfully deploying a Cassandra cluster.
Selecting hardware for enterprise implementations
Choosing appropriate hardware depends on selecting the right balance of the following resources: memory, CPU, disks, number of nodes, and network.
Planning an Amazon EC2 cluster
Important information for deploying a production Cassandra cluster on Amazon EC2.
Calculating partition size
Determining how much data your Cassandra partitions can hold.
Calculating usable disk capacity
Determining how much data your Cassandra nodes can hold.
Calculating user data size
Accounting for storage overhead in determining user data size.
Anti-patterns in Cassandra
Implementation or design patterns that are ineffective and/or counterproductive in Cassandra production installations. Correct patterns are suggested in most cases.
Installing
Various installation methods.
Installing DataStax Community on Windows systems
About installing on Windows systems.
Installing prior releases of DataStax Community
Steps for installing the same version as other nodes in your cluster.
Uninstalling DataStax Community from Windows
Steps for uninstalling Cassandra.
Installing on cloud providers
Installing Cassandra on cloud providers.
Recommended production settings for Windows
Recommendations for production environments.
Windows installation directories
Configuration files directory locations.
Configuration
Configuration topics.
cassandra.yaml configuration file
The cassandra.yaml file is the main configuration file for Cassandra.
Cassandra include file
Set environment variables (cassandra.in.bat).
Security
Topics for securing Cassandra.
Configuring gossip settings
Using the cassandra.yaml file to configure gossip.
Configuring the heap dump directory
Analyzing the heap dump file can help troubleshoot memory problems.
Configuring virtual nodes
Topics about configuring virtual nodes.
Using multiple network interfaces
Steps for configuring Cassandra for multiple network interfaces or when using different regions in cloud implementations.
Configuring logging
Cassandra provides logging through Simple Logging Facade for Java (SLF4J) with a logback backend.
Commit log archive configuration
Cassandra provides commit log archiving and point-in-time recovery.
Generating tokens
If not using virtual nodes (vnodes), you must calculate tokens for your cluster.
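The token calculation can be sketched as follows. This is a minimal illustration, assuming the cluster uses Murmur3Partitioner, whose token range runs from -2^63 to 2^63 - 1, and that the nodes should be spaced evenly around the ring. The function name murmur3_initial_tokens is hypothetical and not part of any Cassandra tool.

```python
def murmur3_initial_tokens(node_count):
    """Return evenly spaced tokens over the assumed Murmur3Partitioner
    range [-2**63, 2**63 - 1], one token per node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    ring_size = 2 ** 64  # total number of tokens in the assumed range
    step = ring_size // node_count
    # Start at the lowest token and advance by one step per node.
    return [step * i - 2 ** 63 for i in range(node_count)]


if __name__ == "__main__":
    # Print one token per node for a hypothetical 4-node data center.
    for node_index, token in enumerate(murmur3_initial_tokens(4)):
        print(node_index, token)
```

Each resulting value would be assigned to one node through the initial_token property in cassandra.yaml. A cluster that uses a different partitioner has a different token range and needs a different calculation.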
Hadoop support
Support for integrating Hadoop with Cassandra.
Initializing a cluster
Topics for deploying a cluster.
Initializing a multiple node cluster (single data center)
A deployment scenario for a Cassandra cluster with a single data center.
Initializing a multiple node cluster (multiple data centers)
A deployment scenario for a Cassandra cluster with multiple data centers.
Starting or stopping Cassandra
Steps for starting or stopping the Cassandra Windows service.
Clearing the data from Windows
Remove all data from a Windows installation.
Clearing the data for an AMI restart
Clearing the data for an Amazon Machine Image restart.
Operations
Cassandra operation topics.
Adding or removing nodes, data centers, or clusters
Topics for adding or removing nodes, data centers, or clusters.
Backing up and restoring data
Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory.
Repairing nodes
Node repair topics.
Monitoring Cassandra
Monitoring topics.
Tuning Java resources
Tuning the Java Virtual Machine (JVM) can improve performance or reduce high memory consumption.
Data caching
Data caching topics.
Configuring memtable throughput
Configuring memtable throughput to improve write performance.
Configuring compaction
Steps for configuring compaction. The compaction process merges keys, combines columns, evicts tombstones, consolidates SSTables, and creates a new index in the merged SSTable.
Compression
Compression maximizes the storage capacity of Cassandra nodes by reducing the volume of data on disk and disk I/O, particularly for read-dominated workloads.
Testing compaction and compression
Enabling write survey mode.
Tuning Bloom filters
Cassandra uses Bloom filters to determine whether an SSTable has data for a particular row.
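The idea behind a Bloom filter can be sketched as follows. This is a toy illustration of the general technique, not Cassandra's implementation: the class name, the bit-array size, the number of hash functions, and the use of SHA-256 as a stand-in hash are all assumptions made for the example. A lookup that returns False means the key is definitely absent, so the SSTable can be skipped; True means the key may be present and the SSTable has to be checked.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, hash_count=3):
        self.size_bits = size_bits
        self.hash_count = hash_count
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive hash_count bit positions by salting the key with a seed.
        # SHA-256 is a stand-in chosen for the example.
        for seed in range(self.hash_count):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        # Set every bit position derived from the key.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # True only if every derived bit is set.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


if __name__ == "__main__":
    sstable_filter = BloomFilter()
    sstable_filter.add("user:42")
    print(sstable_filter.might_contain("user:42"))
```

A larger bit array lowers the false-positive rate at the cost of memory, which is the trade-off that Bloom filter tuning adjusts.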
Moving data to or from other databases
Solutions for migrating from other databases.
Purging gossip state on a node
Correcting a problem in the gossip state.
Cassandra tools
Topics for Cassandra tools.
The nodetool utility
A command line interface for managing a cluster.
The cassandra utility
Cassandra start-up parameters can be passed on the command line or specified in the cassandra-env.ps1 file.
The cassandra-stress tool
A Java-based stress testing utility for basic benchmarking and load testing a Cassandra cluster.
SSTable utilities
Reads are getting slower while writes are still fast
The cluster's IO capacity is not enough to handle the write load it is receiving.
Nodes seem to freeze after some period of time
Some portion of the JVM is being swapped out by the operating system (OS).
Nodes are dying with OOM errors
Nodes are dying with OutOfMemory exceptions.
Nodetool or JMX connections failing on remote nodes
Nodetool commands can be run locally but not on other nodes in the cluster.
Handling schema disagreements
Check for and resolve schema disagreements.
View of ring differs between some nodes
Indicates that the ring is in a bad state.
Lost communication due to firewall timeouts
Steps to configure the default idle connection timeout.
Release notes
Release notes for DataStax Community.