Data distribution and replication
In DataStax Enterprise, data distribution and replication go together. Data is organized by table and identified by a primary key, and the primary key determines the node on which the data is stored. Replicas are copies of rows, which are stored on multiple nodes to ensure reliability and fault tolerance. The copy created when data is first written is also referred to as a replica. All replicas are equally important; there is no primary or master replica.
How data is distributed across the nodes in a cluster is an important concept to understand.
Features affecting replication include:
- Replication strategy determines the replicas for each row of data.
- Virtual nodes assign data ownership to physical machines.
- Partitioners distribute the data across the cluster.
- Snitches define the topology information that the replication strategy uses to place replicas.
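
To make the interaction between these features more concrete, the following is a minimal Python sketch of a token ring. It is an illustration only, not the DataStax Enterprise implementation. The names token_for, build_ring, and replicas_for are hypothetical, MD5 is used as a stand-in hash function, and the placement rule (walk the ring clockwise and collect distinct nodes) is a simplified, topology-unaware strategy. Snitches and rack or datacenter awareness are deliberately left out.

```python
import hashlib
from bisect import bisect_left


def token_for(partition_key: str) -> int:
    """Toy partitioner: hash the key to a 64-bit token (MD5 is an assumed stand-in)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes, vnodes_per_node=4):
    """Give each physical node several tokens (virtual nodes) and sort them into a ring."""
    ring = []
    for node in nodes:
        for i in range(vnodes_per_node):
            # Hypothetical scheme: derive each vnode token from the node name.
            ring.append((token_for(f"{node}-vnode-{i}"), node))
    ring.sort()
    return ring


def replicas_for(partition_key, ring, replication_factor):
    """Simplified replication strategy: walk clockwise from the key's token
    and collect distinct physical nodes until the replication factor is met."""
    tokens = [token for token, _ in ring]
    # First vnode whose token is >= the key's token, wrapping around the ring.
    start = bisect_left(tokens, token_for(partition_key)) % len(ring)
    replicas = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


if __name__ == "__main__":
    ring = build_ring(["node1", "node2", "node3", "node4"])
    print(replicas_for("user:42", ring, replication_factor=3))
```

In this sketch, every node returned by replicas_for holds an equivalent copy of the row, which mirrors the point above that there is no primary or master replica. A real cluster would also consult the snitch so that replicas land on different racks or datacenters.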