Data distribution and replication
In DataStax Enterprise (DSE), data is organized by table, and each row is identified by a primary key. The partition key portion of the primary key determines which node stores the data. Replicas are copies of rows stored on multiple nodes to ensure reliability and fault tolerance. The copy created when data is first written is itself a replica. All replicas are equally important; there is no primary or master replica.
Features affecting replication include:
- Data distribution overview
In DataStax Enterprise (DSE), the total amount of data managed by the cluster is represented as a ring whose token ranges are divided among the nodes.
- Consistent hashing
Consistent hashing distributes data across a cluster in a way that minimizes reorganization when nodes are added or removed.
- Virtual nodes
Virtual nodes (vnodes) distribute data across nodes at a finer granularity than can be easily achieved using a single-token architecture.
- Data replication
DataStax Enterprise stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed.
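The following Python sketch illustrates consistent hashing with virtual nodes and a simple clockwise replica walk. It is a toy model, not DSE code: `Ring`, `token_for`, and `vnodes_per_node` are hypothetical names, the hash is MD5-based rather than the Murmur3 hash used by DSE's default partitioner, and the replica walk resembles SimpleStrategy with no rack or datacenter awareness. The example adds a fourth node and counts how many keys change owners.

```python
import bisect
import hashlib


def token_for(value: str) -> int:
    # Hypothetical 64-bit hash built from MD5; DSE's default partitioner uses Murmur3.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy consistent-hashing ring where each node owns several vnode tokens."""

    def __init__(self, vnodes_per_node: int = 8) -> None:
        self.vnodes_per_node = vnodes_per_node
        self.tokens: list[int] = []      # sorted token positions on the ring
        self.owner: dict[int, str] = {}  # token -> physical node that owns it

    def add_node(self, node: str) -> None:
        # Each physical node receives several pseudo-random tokens (its vnodes).
        for i in range(self.vnodes_per_node):
            token = token_for(f"{node}/vnode-{i}")
            bisect.insort(self.tokens, token)
            self.owner[token] = node

    def primary(self, key: str) -> str:
        # A key belongs to the first token at or after its own token, wrapping around.
        idx = bisect.bisect_left(self.tokens, token_for(key)) % len(self.tokens)
        return self.owner[self.tokens[idx]]

    def replicas(self, key: str, rf: int) -> list[str]:
        # SimpleStrategy-style walk: continue clockwise from the key's token,
        # skipping vnodes whose physical node already holds a replica.
        wanted = min(rf, len(set(self.owner.values())))
        chosen: list[str] = []
        start = bisect.bisect_left(self.tokens, token_for(key))
        for step in range(len(self.tokens)):
            if len(chosen) >= wanted:
                break
            node = self.owner[self.tokens[(start + step) % len(self.tokens)]]
            if node not in chosen:
                chosen.append(node)
        return chosen


ring = Ring()
for name in ["node1", "node2", "node3"]:
    ring.add_node(name)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.primary(k) for k in keys}
ring.add_node("node4")
moved = [k for k in keys if ring.primary(k) != before[k]]

print(f"{len(moved)} of {len(keys)} keys moved")
print("replicas of user:42:", ring.replicas("user:42", rf=3))
```

Because a key's owner is the first token at or after the key's own token, adding `node4` can only reassign keys to `node4`; keys that stay with the other nodes never move between them. Spreading each node across several vnode tokens also means the keys `node4` takes over come from all existing nodes rather than from a single neighbor.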
A partitioner determines how data is distributed across the nodes in the cluster (including replicas).
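As a rough illustration of what a partitioner does, the sketch below hashes a partition key to a signed 64-bit token. The function name `partition_token` and its key serialization are hypothetical. The token range matches that of DSE's default Murmur3Partitioner (-2**63 to 2**63 - 1), but the hash here is MD5 for simplicity, so the tokens will not match real DSE tokens.

```python
import hashlib


def partition_token(*partition_key_parts: object) -> int:
    """Toy partitioner: map a partition key to a signed 64-bit token.

    Uses MD5 for illustration only; DSE's Murmur3Partitioner uses MurmurHash3.
    """
    # Hypothetical serialization: join the parts so a compound key hashes as one value.
    raw = "\x00".join(str(p) for p in partition_key_parts).encode("utf-8")
    return int.from_bytes(hashlib.md5(raw).digest()[:8], "big", signed=True)


# Only the partition key (sensor_id) is hashed. Rows that differ only in a
# clustering column such as the timestamp land in the same partition.
rows = [
    ("sensor-7", "2024-01-01T00:00"),
    ("sensor-7", "2024-01-01T00:05"),
    ("sensor-9", "2024-01-01T00:00"),
]
for sensor_id, ts in rows:
    print(sensor_id, ts, partition_token(sensor_id))
```

The replication strategy then places the partition on the node that owns that token and on further nodes chosen from the ring.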
A snitch determines which datacenter and rack each node belongs to.
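A snitch only reports topology; a replication strategy such as NetworkTopologyStrategy uses that information to spread replicas across racks. The sketch below is a simplified model of that interaction, not DSE's actual algorithm: `TOPOLOGY` stands in for the snitch's output, `RING` lists hypothetical nodes in token order, and `place_replicas` is a hypothetical helper that prefers one replica per rack before doubling up.

```python
# Hypothetical snitch output: node -> (datacenter, rack).
TOPOLOGY: dict[str, tuple[str, str]] = {
    "n1": ("dc1", "rack1"),
    "n2": ("dc1", "rack1"),
    "n3": ("dc1", "rack2"),
    "n4": ("dc2", "rack1"),
    "n5": ("dc2", "rack2"),
    "n6": ("dc1", "rack2"),
}
# Hypothetical ring, listed in token order (tokens omitted for brevity).
RING = ["n1", "n2", "n4", "n3", "n5", "n6"]


def place_replicas(ring: list[str], topology: dict[str, tuple[str, str]],
                   start: int, rf_per_dc: dict[str, int]) -> list[str]:
    # Walk the ring clockwise from the node that owns the partition's token.
    ordered = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    replicas: list[str] = []
    for dc, rf in rf_per_dc.items():
        in_dc = [n for n in ordered if topology[n][0] == dc]
        chosen: list[str] = []
        seen_racks: set[str] = set()
        # First pass: at most one replica per rack in this datacenter.
        for node in in_dc:
            if len(chosen) == rf:
                break
            rack = topology[node][1]
            if rack not in seen_racks:
                chosen.append(node)
                seen_racks.add(rack)
        # Second pass: if RF exceeds the number of racks, fill in ring order.
        for node in in_dc:
            if len(chosen) == rf:
                break
            if node not in chosen:
                chosen.append(node)
        replicas.extend(chosen)
    return replicas


print(place_replicas(RING, TOPOLOGY, 0, {"dc1": 2, "dc2": 1}))
```

A naive clockwise walk in `dc1` would pick `n1` and `n2`, which share `rack1`; using the snitch's rack information, the sketch skips `n2` and picks `n3` instead, so losing one rack does not remove both `dc1` replicas.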