Data distribution and replication

How data is distributed and factors influencing replication.

In DataStax Enterprise, data distribution and replication go together. Data is organized by table, and each row is identified by a primary key. The partition key, which is the first component of the primary key, determines which node stores the row. Replicas are copies of rows that are stored on multiple nodes for reliability and fault tolerance. The first copy written is also a replica: all replicas are equally important, and there is no primary or master replica.
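The placement rule described above can be sketched as a token ring: the partition key is hashed to a token, and the node that owns the token range containing that value stores the row. This is a minimal, hypothetical illustration, not DataStax Enterprise code. It assumes one token per node, hand-picked tokens, and a stand-in hash (the first 8 bytes of MD5) instead of the Murmur3 hash used by the default partitioner. The names `token_for`, `owner`, and the node names are illustrative only.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a partition key to a signed 64-bit token.

    Stand-in only: uses the first 8 bytes of MD5, not the Murmur3 hash
    that the default partitioner uses.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner(partition_key: str, ring: dict[int, str]) -> str:
    """Return the node that owns the key's token.

    Each node owns the range (previous token, its own token], so the owner
    is the first node whose token is >= the key's token, wrapping around
    to the smallest token when the key's token is past the last one.
    """
    tokens = sorted(ring)
    t = token_for(partition_key)
    i = bisect.bisect_left(tokens, t)
    return ring[tokens[i % len(tokens)]]


# Hypothetical three-node ring, one hand-picked token per node.
ring = {
    -6_000_000_000_000_000_000: "node1",
    0: "node2",
    6_000_000_000_000_000_000: "node3",
}

print(owner("user:42", ring))
```

Because the token depends only on the partition key, every row with the same partition key lands on the same node, regardless of the rest of the primary key.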

Features affecting replication include:

  • Virtual nodes assign data ownership to physical machines.
  • Partitioners distribute data across the cluster by computing a token from each row's partition key.
  • Replication strategy determines the replicas for each row of data.
  • Snitches define the topology information that the replication strategy uses to place replicas.
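The way the first three features combine can be sketched as follows. This is a minimal, hypothetical model, not the DataStax Enterprise implementation: each physical node receives several tokens (virtual nodes), a stand-in hash (the first 8 bytes of MD5, not Murmur3) plays the role of the partitioner, and replicas are chosen by walking the ring clockwise from the row's token and collecting distinct physical nodes until the replication factor is reached, which is similar in spirit to SimpleStrategy. The sketch does not model a snitch; a topology-aware strategy such as NetworkTopologyStrategy would also use rack and datacenter information from the snitch. The helper names (`build_ring`, `replicas`) and the way vnode tokens are generated are assumptions for illustration.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    # Stand-in hash (first 8 bytes of MD5, signed); not Murmur3.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def build_ring(nodes: list[str], vnodes_per_node: int) -> list[tuple[int, str]]:
    """Give each physical node several tokens (virtual nodes).

    Hypothetical: tokens come from hashing "node#i"; real clusters assign
    vnode tokens with their own allocation scheme.
    """
    ring = [(token_for(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes_per_node)]
    ring.sort()
    return ring


def replicas(key: str, ring: list[tuple[int, str]], rf: int) -> list[str]:
    """Walk clockwise from the key's token and collect rf distinct nodes.

    Vnodes belonging to an already-chosen physical node are skipped, so
    two replicas never land on the same machine.
    """
    distinct = len({node for _, node in ring})
    if rf > distinct:
        raise ValueError("replication factor exceeds number of nodes")
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, token_for(key))
    chosen: list[str] = []
    for step in range(len(ring)):
        _, node = ring[(start + step) % len(ring)]
        if node not in chosen:
            chosen.append(node)
            if len(chosen) == rf:
                break
    return chosen


ring = build_ring(["node1", "node2", "node3", "node4"], vnodes_per_node=8)
print(replicas("user:42", ring, rf=3))
```

Skipping duplicate physical nodes matters once vnodes are in play: consecutive tokens on the ring often belong to the same machine, and counting them separately would place several replicas on one node and defeat fault tolerance.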