Consistent hashing

Consistent hashing distributes data across a cluster so that adding or removing a node requires as little data reorganization as possible. Data is partitioned based on the partition key. For an explanation of partition keys and primary keys, see the Data modeling example.
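A minimal sketch of why reorganization stays small: on a hash ring, each node owns the hash values from just after the previous node's token up to its own token, so a new node only takes over part of one neighbor's range. Assumptions: the names `HashRing`, `token`, and `demo` are hypothetical; the hash is MD5 truncated to a signed 64-bit integer, a stand-in for Murmur3, so its values match nothing in this section; the four node tokens are borrowed from the range ends of the four-node example in this section, and the fifth node's token (2^61) is arbitrary. For contrast, the sketch also counts how many keys move under plain modulo placement (hash mod node count).

```python
import bisect
import hashlib

MAX_TOKEN = 2**63 - 1


def token(key: str) -> int:
    """Map a key to a signed 64-bit value.

    Stand-in hash (first 8 bytes of MD5), NOT the Murmur3 hash the database uses.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


class HashRing:
    """Hypothetical minimal ring: each node owns (previous token, its own token]."""

    def __init__(self) -> None:
        self._tokens: list[int] = []       # node tokens, kept sorted
        self._owners: dict[int, str] = {}  # node token -> node name

    def add_node(self, name: str, node_token: int) -> None:
        bisect.insort(self._tokens, node_token)
        self._owners[node_token] = name

    def node_for(self, key: str) -> str:
        h = token(key)
        # First node token >= h owns the key
        i = bisect.bisect_left(self._tokens, h)
        if i == len(self._tokens):  # beyond the last token: wrap around the ring
            i = 0
        return self._owners[self._tokens[i]]


def demo(num_keys: int = 10_000) -> tuple[int, int, bool]:
    keys = [f"user{i}" for i in range(num_keys)]
    ring = HashRing()
    for name, t in [("node1", -4611686018427387904), ("node2", -1),
                    ("node3", 4611686018427387903), ("node4", MAX_TOKEN)]:
        ring.add_node(name, t)
    before = {k: ring.node_for(k) for k in keys}

    ring.add_node("node5", 2**61)  # hypothetical new node inside node3's range
    after = {k: ring.node_for(k) for k in keys}

    ring_moved = [k for k in keys if before[k] != after[k]]
    # On the ring, every moved key should go from node3 to the new node5
    only_to_new = all(before[k] == "node3" and after[k] == "node5" for k in ring_moved)
    # Modulo placement: a key moves whenever its node index changes from 4 to 5 nodes
    mod_moved = sum(1 for k in keys if token(k) % 4 != token(k) % 5)
    return len(ring_moved), mod_moved, only_to_new


moved, mod_moved, ok = demo()
print(f"ring: {moved} keys moved; modulo: {mod_moved} keys moved; only to node5: {ok}")
```

With a roughly uniform hash, the ring should move only the keys in the slice the new node takes over (about an eighth here), while modulo placement moves most keys, because a key stays put only when its index is the same modulo 4 and modulo 5.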

For example, consider the following data:


name    age  car     gender
jim     36   camaro  M
carol   37   bmw     F
johnny  12           M
suzy    10           F

In this example, the name column is the partition key. The database computes a hash value for each partition key:


Partition key   Murmur3 hash value
jim             -2245462676723223822
carol            7723358927203680754
johnny          -6723372854036780875
suzy             1168604627387940318
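The hash values above come from Murmur3. A from-scratch Python sketch of MurmurHash3_x64_128, reduced to a signed 64-bit value, illustrates the computation. It rests on assumptions this section does not state: that the partitioner hashes the UTF-8 bytes of a single-column text partition key with seed 0, keeps the first 64-bit half of the 128-bit result as a signed integer, and maps the minimum value to the maximum. The implementation used by Cassandra-based databases is believed to treat tail bytes of 0x80 and above differently, so results may diverge for non-ASCII keys. The function names are hypothetical, and the printed values have not been checked against the table; if they differ, trust the table.

```python
MASK64 = (1 << 64) - 1
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def fmix64(k: int) -> int:
    """MurmurHash3 64-bit finalization mix."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def murmur3_x64_128_h1(data: bytes, seed: int = 0) -> int:
    """Return the first 64-bit half (h1) of MurmurHash3_x64_128, unsigned."""
    length = len(data)
    h1 = h2 = seed & MASK64
    nblocks = length // 16
    for i in range(nblocks):  # body: 16-byte blocks, two little-endian words each
        k1 = int.from_bytes(data[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(data[16 * i + 8:16 * i + 16], "little")
        k1 = (rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64
        h1 ^= k1
        h1 = (rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        k2 = (rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64
        h2 ^= k2
        h2 = (rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64
    tail = data[16 * nblocks:]  # 0..15 leftover bytes (unsigned here)
    k1 = int.from_bytes(tail[:8], "little")
    k2 = int.from_bytes(tail[8:], "little")
    if len(tail) > 8:
        h2 ^= (rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64
    if len(tail) > 0:
        h1 ^= (rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64
    h1 ^= length
    h2 ^= length
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = fmix64(h1)
    h2 = fmix64(h2)
    return (h1 + h2) & MASK64


def murmur3_token(partition_key: str) -> int:
    """Hypothetical helper: signed 64-bit token for a text partition key."""
    h1 = murmur3_x64_128_h1(partition_key.encode("utf-8"))
    signed = h1 - (1 << 64) if h1 >= (1 << 63) else h1
    # Assumption: the minimum value is normalized to the maximum
    return 2**63 - 1 if signed == -(2**63) else signed


for name in ["jim", "carol", "johnny", "suzy"]:
    print(name, murmur3_token(name))
```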

Each node in the cluster is responsible for a range of hash values and stores the data whose partition key hashes into that range.

Figure 1. Hash values in a four-node cluster

DataStax Enterprise places each row on the node whose range contains the hash value of the row's partition key. For example, in a four-node cluster, the data in this example is distributed as follows:


Node  Start range           End range             Partition key  Hash value
1     -9223372036854775808  -4611686018427387904  johnny         -6723372854036780875
2     -4611686018427387903  -1                    jim            -2245462676723223822
3     0                     4611686018427387903   suzy           1168604627387940318
4     4611686018427387904   9223372036854775807   carol          7723358927203680754
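The placement in the table can be checked with a small lookup. This Python sketch hard-codes the range ends and hash values from the table (the names `RANGE_ENDS`, `HASHES`, and `node_for` are hypothetical) and uses binary search over the sorted range ends: a hash value belongs to the first node whose range end is greater than or equal to it.

```python
import bisect

# Range ends from the four-node table; node 1 starts at -9223372036854775808,
# and each later node starts one past the previous node's end.
RANGE_ENDS = [
    -4611686018427387904,  # node 1
    -1,                    # node 2
    4611686018427387903,   # node 3
    9223372036854775807,   # node 4
]

# Murmur3 hash values of the partition keys, copied from the table
HASHES = {
    "jim": -2245462676723223822,
    "carol": 7723358927203680754,
    "johnny": -6723372854036780875,
    "suzy": 1168604627387940318,
}


def node_for(hash_value: int) -> int:
    """Return the 1-based node number whose range contains hash_value."""
    if not -(2**63) <= hash_value <= 2**63 - 1:
        raise ValueError("hash value outside the signed 64-bit range")
    # bisect_left finds the first range end >= hash_value (ends are inclusive)
    return bisect.bisect_left(RANGE_ENDS, hash_value) + 1


for key, h in HASHES.items():
    print(f"{key:<7} {h:>21} -> node {node_for(h)}")
```

Because the ranges are contiguous, only the end of each range needs to be stored; the lookup assigns johnny to node 1, jim to node 2, suzy to node 3, and carol to node 4, matching the table.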