Consistent hashing

Consistent hashing distributes data across a cluster so that adding or removing nodes requires as little reorganization as possible. Data is partitioned by the partition key: the database hashes each partition key, and the resulting hash value determines which node stores the data. For an explanation of partition keys and primary keys, see the Data modeling example.
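
The following Python sketch illustrates why consistent hashing limits reorganization. It places nodes on a ring of signed 64-bit hash values, using the range ends from the four-node example in this section as node tokens. It then adds a hypothetical fifth node and counts how many keys change owner, and compares that count with naive hash-modulo-node-count placement. Assumptions: SHA-256 stands in for Murmur3 only so the sketch needs nothing beyond the standard library, and the names `token`, `ring_owner`, `modulo_owner`, and `node5` are hypothetical, not part of any DataStax API.

```python
import bisect
import hashlib


def token(key: str) -> int:
    # Hypothetical stand-in hash: the first 8 bytes of SHA-256, read as a
    # signed 64-bit integer. The database uses Murmur3, not SHA-256.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def ring_owner(ring: dict[int, str], h: int) -> str:
    # Each node owns every hash up to and including its own token; hashes
    # above the largest token wrap around to the node with the smallest token.
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, h)
    return ring[tokens[i % len(tokens)]]


def modulo_owner(nodes: list[str], h: int) -> str:
    # Naive placement for comparison: hash modulo the number of nodes.
    return nodes[h % len(nodes)]


# Node tokens: the end of each node's range in the four-node example.
ring = {
    -4611686018427387904: "node1",
    -1: "node2",
    4611686018427387903: "node3",
    9223372036854775807: "node4",
}
# Hypothetical fifth node whose token splits node2's range roughly in half.
bigger_ring = {**ring, -2305843009213693952: "node5"}

keys = [f"user{i}" for i in range(10_000)]
hashes = {k: token(k) for k in keys}

ring_before = {k: ring_owner(ring, hashes[k]) for k in keys}
ring_after = {k: ring_owner(bigger_ring, hashes[k]) for k in keys}
ring_moved = [k for k in keys if ring_before[k] != ring_after[k]]

nodes4 = ["node1", "node2", "node3", "node4"]
nodes5 = nodes4 + ["node5"]
mod_moved = [k for k in keys
             if modulo_owner(nodes4, hashes[k]) != modulo_owner(nodes5, hashes[k])]

print(f"ring:   {len(ring_moved)} of {len(keys)} keys moved")
print(f"modulo: {len(mod_moved)} of {len(keys)} keys moved")
```

On the ring, only keys whose hash falls in the slice handed to the new node move (here about one eighth of the hash space), and every one of them moves from node2 to node5. With modulo placement, a key stays put only when its hash leaves the same remainder modulo 4 and modulo 5, so roughly four out of five keys move.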

For example, if you have the following data:

name    age  car     gender
jim     36   camaro  M
carol   37   bmw     F
johnny  12           M
suzy    10           F

The database assigns a hash value to each partition key (in this example, the name column) by using the Murmur3 hash function:

Partition key  Murmur3 hash value
jim            -2245462676723223822
carol          7723358927203680754
johnny         -6723372854036780875
suzy           1168604627387940318

Each node in the cluster is responsible for a range of hash values.

Figure: Hash value range

DataStax Enterprise places data on a node according to the hash value of the partition key and the range of hash values that the node is responsible for. For example, in a four-node cluster, the data in this example is distributed as follows:

Node  Start range           End range             Partition key  Hash value
1     -9223372036854775808  -4611686018427387904  johnny         -6723372854036780875
2     -4611686018427387903  -1                    jim            -2245462676723223822
3     0                     4611686018427387903   suzy           1168604627387940318
4     4611686018427387904   9223372036854775807   carol          7723358927203680754
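
As a sketch of the placement rule above, the following Python code copies the ranges and hash values from the example tables, checks that the four ranges cover the signed 64-bit hash space without gaps or overlaps, and finds each key's node by binary search on the inclusive range ends. The hash values are taken from the example rather than recomputed, and `RANGES`, `HASHES`, `check_ranges`, and `node_for` are hypothetical names used only for illustration.

```python
import bisect

# (node, start, end) for each range in the four-node example; ends are inclusive.
RANGES = [
    ("1", -9223372036854775808, -4611686018427387904),
    ("2", -4611686018427387903, -1),
    ("3", 0, 4611686018427387903),
    ("4", 4611686018427387904, 9223372036854775807),
]

# Murmur3 hash values copied from the example; this sketch does not recompute them.
HASHES = {
    "jim": -2245462676723223822,
    "carol": 7723358927203680754,
    "johnny": -6723372854036780875,
    "suzy": 1168604627387940318,
}


def check_ranges(ranges: list[tuple[str, int, int]]) -> None:
    """Raise ValueError unless the ranges tile the signed 64-bit hash space."""
    if ranges[0][1] != -(2**63) or ranges[-1][2] != 2**63 - 1:
        raise ValueError("ranges do not span the full signed 64-bit space")
    for (_, _, prev_end), (node, start, _) in zip(ranges, ranges[1:]):
        if start != prev_end + 1:
            raise ValueError(f"gap or overlap before node {node}")


def node_for(hash_value: int, ranges: list[tuple[str, int, int]]) -> str:
    """Return the node whose inclusive range contains hash_value."""
    ends = [end for _, _, end in ranges]
    i = bisect.bisect_left(ends, hash_value)  # first range whose end >= hash_value
    if i == len(ranges) or hash_value < ranges[i][1]:
        raise ValueError(f"{hash_value} is outside every range")
    return ranges[i][0]


check_ranges(RANGES)
for key, h in HASHES.items():
    print(f"{key:<7} {h:>21} -> node {node_for(h, RANGES)}")
```

Running it should reproduce the table: johnny on node 1, jim on node 2, suzy on node 3, and carol on node 4.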


© 2024 DataStax

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.