Data distribution to nodes
In Hyper-Converged Database (HCD), the total amount of data managed by the cluster is represented as a ring. The ring is divided into ranges equal to the number of nodes, and each node is responsible for one or more data ranges. Before a node can join the ring, it must be assigned a token. The token value determines the node’s position in the ring and its range of data. Table data is partitioned across the nodes based on the row key.
To determine the node where the first replica of a row will live, HCD locates the node with a token value greater than that of the row key. Each node is responsible for the region of the ring between the node itself and its predecessor, excluding the predecessor’s token value. With the nodes sorted in token order, the last node is considered the predecessor of the first node.
For example, consider a simple four-node cluster where all of the row keys managed by the cluster are numbers in the range of 0 to 100. Each node is assigned a token that represents a point in this range. In this simple example, the token values are 0, 25, 50, and 75. The first node with token 0 is responsible for the wrapping range (76-0). The node with the lowest token also accepts row keys less than the lowest token and more than the highest token.
HCD provides several types of partitioners, which should be assigned to your cluster.
An initial_token
value is assigned to each node so that each node is responsible for roughly an equal amount of data.