Data distribution and replication
In DataStax Enterprise (DSE), data is organized by table and identified by a primary key; the partition key portion of the primary key determines which node stores the data. Replicas are copies of rows, stored on multiple nodes to ensure reliability and fault tolerance. The first copy of a row that is written is also referred to as a replica. All replicas are equally important; there is no primary or master replica.
Features affecting replication include:
- Virtual nodes assign data ownership to physical machines.
- Partitioners distribute the data across the cluster.
- Replication strategy determines the replicas for each row of data.
- Snitches define the topology information that the replication strategy uses to place replicas.
- Data distribution overview
  In DataStax Enterprise (DSE), the total amount of data managed by the cluster is represented as a ring with nodes.
- Consistent hashing
  Consistent hashing allows distribution of data across a cluster to minimize reorganization when nodes are added or removed.
- Virtual nodes
  Virtual nodes (vnodes) distribute data across nodes at a finer granularity than can be easily achieved using a single-token architecture.
- Data replication
  DataStax Enterprise stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed.
- Partitioners
  A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
- Snitches
  A snitch determines which datacenters and racks nodes belong to.
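
The ideas summarized above (a token ring, consistent hashing, vnodes, and replica placement) can be illustrated with a small Python sketch. It is a simplified model, not DSE's implementation: the `token` function uses MD5 as a stand-in for a real partitioner, the `Ring` class and its `replicas` method are hypothetical names, and replica placement simply walks the ring clockwise without any rack or datacenter awareness from a snitch.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 stand-in, not a DSE partitioner)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy token ring in which each node owns several tokens (vnodes)."""

    def __init__(self, nodes, vnodes_per_node=8):
        # Each node is assigned several ring positions, so ownership is
        # spread over many small ranges instead of one large range.
        self._entries = sorted(
            (token(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._tokens = [t for t, _ in self._entries]
        self._node_count = len(set(nodes))

    def replicas(self, partition_key, replication_factor=3):
        """Return the distinct nodes that would hold copies of the key."""
        rf = min(replication_factor, self._node_count)
        # The first ring position at or after the key's token owns the key;
        # the modulo below wraps around past the end of the ring.
        start = bisect.bisect_left(self._tokens, token(partition_key))
        found = []
        for step in range(len(self._entries)):
            _, node = self._entries[(start + step) % len(self._entries)]
            if node not in found:
                found.append(node)
                if len(found) == rf:
                    break
        return found


if __name__ == "__main__":
    ring = Ring(["node1", "node2", "node3", "node4"])
    print(ring.replicas("user:42", replication_factor=3))
```

In this model, adding a node only inserts new positions on the ring, so a key either keeps its current owner or moves to the new node. That is the sense in which consistent hashing minimizes reorganization when cluster membership changes.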