Data distribution and replication

How data is distributed and factors influencing replication.

In Cassandra, data distribution and replication go together. Data is organized by table and identified by a primary key; the partition key portion of the primary key determines which node stores the data. Replicas are copies of rows. The copy created when data is first written is also referred to as a replica.

Factors influencing replication include:

  • Virtual nodes: assign data ownership to physical machines.
  • Partitioner: partitions the data across the cluster.
  • Replication strategy: determines the replicas for each row of data.
  • Snitch: defines the topology information that the replication strategy uses to place replicas.
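
To make the interaction of these factors concrete, the following is a minimal sketch of token-ring placement, not Cassandra's actual implementation. It assumes a truncated MD5 hash as a stand-in partitioner, a simple "walk the ring clockwise" replication strategy, and no snitch (all nodes are treated as being in one rack and datacenter). The function names (token_for, build_ring, replicas_for) and the node names are hypothetical and exist only for illustration.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Partitioner stand-in: map a key to a 64-bit token.

    Assumption: truncated MD5 is used only for illustration; it is not
    the hash function Cassandra uses by default.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes, vnodes_per_node=8):
    """Virtual nodes: give each physical node several tokens on the ring."""
    ring = []
    for node in nodes:
        for i in range(vnodes_per_node):
            ring.append((token_for(f"{node}#{i}"), node))
    ring.sort()  # order by token so the ring can be searched
    return ring


def replicas_for(key, ring, replication_factor=3):
    """Replication strategy stand-in: walk clockwise, keep distinct nodes."""
    tokens = [token for token, _ in ring]
    # The first vnode whose token is >= the key's token owns the key.
    start = bisect.bisect_left(tokens, token_for(key))
    replicas = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]  # wrap around the ring
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


# Hypothetical four-node cluster.
ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
print(replicas_for("user:42", ring))
```

In this sketch the same key always maps to the same list of nodes, and the first node in the list is the one whose token range contains the key. A real snitch would supply rack and datacenter information so that the replication strategy could avoid placing all replicas in the same failure domain; that step is deliberately left out here.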