Data distribution overview
In DataStax Enterprise (DSE), the total amount of data managed by the cluster is represented as a ring. The ring is divided into ranges equal to the number of nodes, with each node being responsible for one or more ranges of the data. Before a node can join the ring, it must be assigned a token. The token value determines the node’s position in the ring and its range of data. Table data is partitioned across the nodes based on the row key.
To determine the node where the first replica of a row will live, DSE locates the node with a token value greater than that of the row key. Each node is responsible for the region of the ring between the node itself and its predecessor, excluding the predecessor’s token value. With the nodes sorted in token order, the last node is considered the predecessor of the first node.
For example, consider a simple four-node cluster where all of the row keys managed by the cluster were numbers in the range of 0 to 100. Each node is assigned a token that represents a point in this range. In this simple example, the token values are 0, 25, 50, and 75. The first node with token 0 is responsible for the wrapping range (76-0). The node with the lowest token also accepts row keys less than the lowest token and more than the highest token.
Token number |
DSE provides several types of partitioners, which should be assigned to your cluster.
An initial_token
value in cassandra.yaml is assigned to each node so that each node is responsible for roughly an equal amount of data.