About Apache Cassandra: Documentation for developers and administrators on installing, configuring, and using the features and capabilities of Apache Cassandra, the scalable, open source NoSQL database.
What's new in Cassandra 2.2: An overview of new features in Cassandra 2.2.
Understanding the architecture: Important topics for understanding Cassandra.
Architecture in brief: Essential information for understanding and using Cassandra.
Internode communications (gossip): Cassandra uses a protocol called gossip to discover location and state information about the other nodes participating in a Cassandra cluster.
Data distribution and replication: How data is distributed and factors influencing replication.
Partitioners: A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
Snitches: A snitch determines which data centers and racks nodes belong to.
Database internals: Topics about the Cassandra database.
Storage engine: A description of Cassandra's storage structure and engine.
How Cassandra reads and writes data: Understanding how Cassandra stores data.
Data consistency: Topics about how up-to-date and synchronized a row of data is on all replicas.
Planning a cluster deployment: Vital information about successfully deploying a Cassandra cluster.
Selecting hardware for enterprise implementations: Choosing appropriate hardware depends on selecting the right balance of the following resources: memory, CPU, disks, number of nodes, and network.
Planning an Amazon EC2 cluster: Important information for deploying a production Cassandra cluster on Amazon EC2.
Calculating partition size: Determining how much data your Cassandra partitions can hold.
Calculating usable disk capacity: Determining how much data your Cassandra nodes can hold.
Calculating user data size: Accounting for storage overhead in determining user data size.
Anti-patterns in Cassandra: Implementation or design patterns that are ineffective or counterproductive in Cassandra production installations. Correct patterns are suggested in most cases.
Installing: Various installation methods.
Installing the RHEL-based packages: Install using Yum repositories on RHEL, CentOS, and Oracle Linux.
Installing the Debian and Ubuntu packages: Install using APT repositories on Debian and Ubuntu.
Installing from the binary tarball: Install on all Linux-based platforms using a binary tarball.
Installing prior releases of DataStax Community: Steps for installing the same version as other nodes in your cluster.
Uninstalling DataStax Community from Linux: Steps for uninstalling Cassandra by installation type.
Installing on cloud providers: Installing Cassandra on cloud providers.
Installing the Oracle JDK: Instructions for various platforms.
Recommended production settings for Linux: Recommendations for production environments.
Install locations: Install location topics.
cassandra.yaml configuration file: The cassandra.yaml file is the main configuration file for Cassandra.
Cassandra include file: Set environment variables (cassandra.in.sh).
Security: Topics for securing Cassandra.
Configuring gossip settings: Using the cassandra.yaml file to configure gossip.
Configuring the heap dump directory: Analyzing the heap dump file can help troubleshoot memory problems.
Configuring virtual nodes: Topics about configuring virtual nodes.
Using multiple network interfaces: Steps for configuring Cassandra for multiple network interfaces or when using different regions in cloud implementations.
Configuring logging: Cassandra logging uses Simple Logging Facade for Java (SLF4J) with a logback backend.
Commit log archive configuration: Cassandra provides commit log archiving and point-in-time recovery.
Generating tokens: If you are not using virtual nodes (vnodes), you must calculate tokens for your cluster.
Hadoop support: Integrating Hadoop with Cassandra.
Initializing a cluster: Topics for deploying a cluster.
Initializing a multiple node cluster (single data center): A deployment scenario for a Cassandra cluster with a single data center.
Initializing a multiple node cluster (multiple data centers): A deployment scenario for a Cassandra cluster with multiple data centers.
Starting and stopping Cassandra: Topics for starting and stopping Cassandra.
Clearing the data for an AMI restart: Clearing the data for an Amazon Machine Image restart.
Operations: Cassandra operation topics.
Adding or removing nodes, data centers, or clusters: Topics for adding or removing nodes, data centers, or clusters.
Backing up and restoring data: Cassandra backs up data by taking a snapshot of all on-disk data files (SSTable files) stored in the data directory.
Repairing nodes: Node repair topics.
Tuning Java resources: Tuning the Java Virtual Machine (JVM) can improve performance or reduce high memory consumption.
Data caching: Data caching topics.
Configuring memtable throughput: Configuring memtable throughput to improve write performance.
Configuring compaction: Steps for configuring compaction. The compaction process merges keys, combines columns, evicts tombstones, consolidates SSTables, and creates a new index in the merged SSTable.
Compression: Compression maximizes the storage capacity of Cassandra nodes by reducing the volume of data on disk and disk I/O, particularly for read-dominated workloads.
Tuning Bloom filters: Cassandra uses Bloom filters to determine whether an SSTable might have data for a particular row.
Moving data to or from other databases: Solutions for migrating from other databases.
Purging gossip state on a node: Correcting a problem in the gossip state.
Cassandra tools: Topics for Cassandra tools.
The nodetool utility: A command line interface for managing a cluster.
The cassandra utility: Cassandra start-up parameters can be specified on the command line (in tarball installations) or in the cassandra-env.sh file (package or tarball installations).
The cassandra-stress tool: A Java-based stress testing utility for basic benchmarking and load testing a Cassandra cluster.
Nodes appear unresponsive due to a Linux futex_wait() kernel bug: Nodes randomly freeze and become unresponsive for an unknown reason.
Reads are getting slower while writes are still fast: The cluster's I/O capacity is not enough to handle the write load it is receiving.
Nodes seem to freeze after some period of time: Some portion of the JVM is being swapped out by the operating system (OS).
Nodes are dying with OOM errors: Nodes are dying with OutOfMemory exceptions.
Nodetool or JMX connections failing on remote nodes: Nodetool commands can be run locally but not on other nodes in the cluster.
Handling schema disagreements: Check for and resolve schema disagreements.
View of ring differs between some nodes: Indicates that the ring is in a bad state.
Insufficient user resource limits errors: Insufficient resource limits may result in a number of errors in Cassandra.
Cannot initialize class org.xerial.snappy.Snappy: An error may occur when Snappy compression/decompression is enabled although its library is available from the classpath.
Lost communication due to firewall timeouts: Steps to configure the default idle connection timeout.
Release notes: Release notes for DataStax Community.