Data distribution and replication
In DataStax Enterprise (DSE), data distribution and replication go together. DSE organizes data by table and identifies each record by its primary key, which also determines the node on which the data is stored. Replicas are copies of rows stored on multiple nodes to ensure reliability and fault tolerance. The copy that is written first is also called a replica. All replicas are equally important; there is no primary replica.
This section explains how data is distributed across the nodes in a cluster.
Features affecting replication include:
- Replication strategy determines the replicas for each row of data.
- Virtual nodes assign data ownership to physical machines.
- Partitioners distribute the data across the cluster.
- Snitches define the topology information that the replication strategy uses to place replicas.
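
The way these pieces fit together can be illustrated with a small toy model. The Python sketch below is not DSE code and makes several simplifying assumptions: it uses MD5 as a stand-in hash rather than a real DSE partitioner, it derives virtual node tokens from made-up labels, and it ignores snitch topology entirely, so replicas are simply the next distinct nodes found when walking the token ring. All function and node names (`token_for`, `build_ring`, `replicas_for`, `node-a`, and so on) are hypothetical.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Hash a key to a token. MD5 is a stand-in here, not DSE's partitioner."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def build_ring(nodes, vnodes_per_node=4):
    """Give each node several tokens (virtual nodes); return sorted (token, node) pairs."""
    ring = []
    for node in nodes:
        for i in range(vnodes_per_node):
            # Hypothetical labelling scheme, used only to spread tokens around the ring.
            ring.append((token_for(f"{node}-vnode-{i}"), node))
    ring.sort()
    return ring


def replicas_for(ring, partition_key, replication_factor):
    """Walk the ring from the key's token, collecting distinct nodes as replicas."""
    tokens = [token for token, _ in ring]
    # First virtual node whose token is >= the key's token (wraps around via modulo).
    start = bisect.bisect_left(tokens, token_for(partition_key))
    replicas = []
    for offset in range(len(ring)):
        node = ring[(start + offset) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


ring = build_ring(["node-a", "node-b", "node-c", "node-d"])
print(replicas_for(ring, "user:42", replication_factor=3))
```

In this sketch the hash function plays the role of the partitioner, the multiple tokens per node play the role of virtual nodes, and the ring walk plays the role of a topology-unaware replication strategy. A topology-aware strategy would additionally consult snitch information, such as rack and datacenter, before accepting a node as a replica.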