Data distribution and replication

In DataStax Enterprise (DSE), data is organized by table and identified by a primary key. The partition key portion of the primary key determines which node stores the data. Replicas are copies of rows stored on multiple nodes to ensure reliability and fault tolerance. The first copy written also counts as a replica. All replicas are equally important; there is no primary or master replica.

Features affecting replication include:

  • Virtual nodes assign data ownership to physical machines.

  • Partitioners distribute the data across the cluster.

  • Replication strategy determines the replicas for each row of data.

  • Snitches define the topology information that the replication strategy uses to place replicas.


Data distribution overview

In DataStax Enterprise (DSE), the full range of data managed by the cluster is represented as a ring, and each node is responsible for one or more ranges of that ring.

Consistent hashing

Consistent hashing distributes data across a cluster in a way that minimizes reorganization when nodes are added or removed.
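
To make the idea concrete, here is a minimal sketch of a consistent-hash ring in Python. It is an illustration under stated assumptions, not how DSE is implemented: the `Ring` class, the `token` helper, and the node names are hypothetical, each node holds a single token, and MD5 stands in for the partitioner's hash function. The sketch shows that when a node joins, the only keys that change owner are the ones the new node takes over.

```python
import bisect
import hashlib


def token(label: str) -> int:
    """Map a string to a ring position (MD5 is a stand-in hash, an assumption)."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")


class Ring:
    """Toy consistent-hash ring with one token per node (hypothetical helper)."""

    def __init__(self, nodes):
        self._entries = sorted((token(n), n) for n in nodes)

    def add(self, node: str) -> None:
        # Insert the new node's token while keeping the ring sorted.
        bisect.insort(self._entries, (token(node), node))

    def owner(self, key: str) -> str:
        # A key belongs to the first node whose token is >= the key's token,
        # wrapping around to the start of the ring if necessary.
        tokens = [t for t, _ in self._entries]
        i = bisect.bisect_left(tokens, token(key)) % len(self._entries)
        return self._entries[i][1]


keys = [f"user-{i}" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add("node-d")
after = {k: ring.owner(k) for k in keys}

# Only keys now owned by the new node have moved; everything else stays put.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved, all to node-d")
```

With a naive scheme such as `hash(key) % node_count`, changing the node count would instead reassign most keys.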

Virtual nodes

Virtual nodes (vnodes) distribute data across nodes at a finer granularity than can be easily achieved using a single-token architecture.
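
The following Python sketch illustrates the granularity point by comparing one token per node with several tokens per node. It rests on assumptions: `build_ring` and `owned_share` are hypothetical helpers, tokens are derived by hashing a label with MD5 rather than by the token allocation DSE actually uses, and the node names are invented. It prints the fraction of the ring each physical node owns in both layouts; with more tokens per node, the shares typically come out closer to even.

```python
import hashlib

RING_SIZE = 2 ** 128  # MD5 yields 128-bit values, so the toy ring has this many positions


def token(label: str) -> int:
    """Map a string to a ring position (MD5 is a stand-in hash, an assumption)."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")


def build_ring(nodes, num_tokens):
    """Give every physical node num_tokens positions on the ring."""
    return sorted(
        (token(f"{node}#{i}"), node) for node in nodes for i in range(num_tokens)
    )


def owned_share(ring):
    """Sum, per node, the sizes of the token ranges that end at its tokens."""
    owned = {}
    for i, (tok, node) in enumerate(ring):
        prev_tok = ring[i - 1][0]  # index -1 wraps around to the last token
        owned[node] = owned.get(node, 0) + (tok - prev_tok) % RING_SIZE
    return owned


nodes = ["node-a", "node-b", "node-c"]
single = owned_share(build_ring(nodes, 1))  # single-token layout
vnodes = owned_share(build_ring(nodes, 8))  # eight vnodes per machine

for node in nodes:
    print(node, f"{single[node] / RING_SIZE:.2%}", f"{vnodes[node] / RING_SIZE:.2%}")
```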

Data replication

DataStax Enterprise stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed.
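
As a rough illustration of replica placement, the sketch below walks a small ring clockwise from a key's token and collects distinct nodes until the replication factor is reached. This is a simplified, topology-unaware scheme assumed for the example; the `replicas_for` function, the four-node ring, and its tokens are hypothetical, and a real replication strategy may also take datacenters and racks into account.

```python
import bisect

# Hypothetical ring: (token, node) pairs sorted by token, positions 0..99.
RING = [(0, "node-a"), (25, "node-b"), (50, "node-c"), (75, "node-d")]


def replicas_for(key_token, ring, replication_factor):
    """Return the nodes holding replicas for a key, walking the ring clockwise."""
    tokens = [t for t, _ in ring]
    # The first replica lives on the first node whose token is >= the key's token.
    start = bisect.bisect_left(tokens, key_token)
    replicas = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == replication_factor:
            break
    return replicas


print(replicas_for(30, RING, 3))  # ['node-c', 'node-d', 'node-a']
print(replicas_for(80, RING, 3))  # ['node-a', 'node-b', 'node-c']
```

No node in the returned list is special: any of them can serve the row, which matches the absence of a primary or master replica.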

Partitioners

A partitioner determines how data is distributed across the nodes in the cluster (including replicas).

Snitches

A snitch determines which datacenters and racks nodes belong to.
