Data distribution and replication

How data is distributed and the factors that influence replication.

In Cassandra, data distribution and replication go together, because Cassandra is a peer-to-peer system that makes copies of the data and distributes those copies among a group of nodes. Data is organized by table and identified by a primary key. The partition key, which is the first component of the primary key, determines which node stores the data. Copies of rows are called replicas. The copy created when data is first written is also referred to as a replica.
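The mapping from key to node can be illustrated with a minimal sketch. It assumes a hypothetical three-node ring with hand-picked tokens, and it uses MD5 only as a stand-in hash function. The names token_for, owner_of, and node-a through node-c are illustrative and are not part of Cassandra.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a 64-bit token (MD5 is a stand-in hash here)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


# Hypothetical ring: each node owns the token range that ends at its token.
RING = sorted([
    (2**64 // 3, "node-a"),
    (2 * 2**64 // 3, "node-b"),
    (2**64 - 1, "node-c"),
])
TOKENS = [token for token, _ in RING]


def owner_of(partition_key: str) -> str:
    """Return the node whose token range contains the key's token."""
    index = bisect.bisect_left(TOKENS, token_for(partition_key))
    return RING[index % len(RING)][1]


if __name__ == "__main__":
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", owner_of(key))
```

Because the token depends only on the key, the same key always resolves to the same node, while different keys tend to spread across the ring.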

When you create a cluster, you must specify the following:

Virtual nodes: assign data ownership to physical machines.
Partitioner: partitions the data across the cluster.
Replication strategy: determines the replicas for each row of data.
Snitch: defines the topology information that the replication strategy uses to place replicas.
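How these settings interact can be sketched in a few lines. The sketch rests on several assumptions: virtual node tokens are derived from a hash of a hypothetical node name rather than generated randomly, MD5 stands in for the partitioner's hash, and replica placement simply walks the ring clockwise without consulting a snitch, so topology is ignored. The functions build_ring and replicas_for are illustrative names, not Cassandra APIs.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    """Stand-in partitioner: hash a string to a 64-bit token."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes, vnodes_per_node):
    """Give each physical node several tokens (virtual nodes) on the ring."""
    ring = []
    for node in nodes:
        for i in range(vnodes_per_node):
            ring.append((token_for(f"{node}#{i}"), node))
    ring.sort()
    return ring


def replicas_for(ring, partition_key, replication_factor):
    """Walk clockwise from the key's token, collecting distinct physical nodes."""
    tokens = [token for token, _ in ring]
    start = bisect.bisect_left(tokens, token_for(partition_key))
    replicas = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


if __name__ == "__main__":
    ring = build_ring(["node-a", "node-b", "node-c", "node-d"], vnodes_per_node=8)
    print(replicas_for(ring, "user:42", replication_factor=3))
```

Skipping nodes that are already in the replica list matters once virtual nodes are in use, because adjacent tokens on the ring can belong to the same physical machine. A topology-aware strategy would additionally use snitch information, such as rack and datacenter, when choosing which nodes to accept.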
