The following basic concepts are essential for understanding Apache Cassandra:
- A cluster is a group of distributed nodes where you store your data. A cluster can consist of a single node, a single datacenter, or multiple datacenters.
- A datacenter is a group of related nodes configured together within a cluster for
replication purposes. It is not necessarily a physical datacenter. In DataStax
Enterprise, each datacenter usually contains only one node type. The node types are:
  - Transactional - Sometimes referred to as a Cassandra node. (All DataStax Enterprise nodes are Cassandra nodes.)
  - Analytical - Integration with Apache Spark. BYOH (bring your own Hadoop) and DSE Hadoop are deprecated starting with DataStax Enterprise 5.0.
  - DSE Search - Integration with Apache Solr. Sometimes referred to as a Solr node.
- Replication is the process of storing copies of data on multiple nodes to ensure reliability and fault tolerance. The number of copies is set by the replication factor.
- A partitioner distributes data evenly across the nodes in the cluster for load balancing.
- A snitch maps the IP addresses of nodes to physical and virtual locations, such as racks and datacenters. Snitches inform Cassandra about the network topology so that requests are routed efficiently, and they allow Cassandra to distribute replicas by grouping machines into datacenters and racks.
- A sorted string table (SSTable) is an immutable data file to which Cassandra writes memtables periodically. SSTables are stored on disk sequentially and maintained for each Cassandra table.
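
To make the partitioner and replication factor concrete, here is a minimal Python sketch of a toy token ring. It is not Cassandra's implementation: the `ToyRing` class, the `token` helper, and the node names are hypothetical, MD5 truncated to 32 bits stands in for a real partitioner hash, and each node gets a single token. Replicas are chosen by walking the ring clockwise from the owning node, which ignores the rack and datacenter information a snitch would supply.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on a toy 2**32 ring (stand-in for a partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class ToyRing:
    """Hypothetical single-datacenter ring with one token per node."""

    def __init__(self, nodes, replication_factor):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        # Assumption: each node's token is derived from its name.
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key):
        """Return the nodes that would hold copies of the given partition key."""
        # The owner is the first node whose token is >= the key's token,
        # wrapping around to the start of the ring if none is found.
        start = bisect.bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        # Further copies go to the next nodes clockwise on the ring.
        return [
            self.ring[(start + i) % len(self.ring)][1]
            for i in range(self.replication_factor)
        ]


if __name__ == "__main__":
    ring = ToyRing(["node1", "node2", "node3", "node4"], replication_factor=3)
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", ring.replicas(key))
```

With a replication factor of 3 on four nodes, every key maps to three distinct nodes, so the data stays available if one of them fails. Because placement depends only on the hash of the key, any node can compute where a key lives without consulting a central directory.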