Rebalancing a cluster overview

Cluster rebalancing ensures that each node in a DataStax Enterprise cluster that does not use virtual nodes (vnodes) manages an equal amount of data. Currently, OpsCenter supports rebalancing only on clusters that use the RandomPartitioner or the Murmur3Partitioner; ordered partitioners are not supported. A rebalance is usually required only after the cluster topology changes, for example when nodes are added or removed or when the replica placement strategy is changed. You can configure an alert to notify you when a cluster requires rebalancing. If you use role-based security, set the permission to rebalance a cluster in the Cluster Topology section of the Role dialog.

A cluster is considered balanced when each node is responsible for an equal range of data. OpsCenter determines cluster balance by evaluating the partitioner tokens assigned to each node and checking that the data ranges the nodes are responsible for are evenly distributed. Even in a balanced cluster, some nodes can store more data than others, because balance accounts only for the number of rows each node manages, not the size of those rows.
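The token-based balance check can be illustrated with a short Python sketch. It assumes the standard token rings of the two supported partitioners (2**127 tokens starting at 0 for RandomPartitioner, 2**64 tokens starting at -2**63 for Murmur3Partitioner) and the usual convention that a node owns the range from the previous token (exclusive) up to its own token (inclusive). The function names and the tolerance value are hypothetical; this is not OpsCenter's implementation. Because it measures only each node's share of the token ring, it cannot detect nodes whose rows are larger than others'.

```python
from fractions import Fraction

# Assumed ring sizes and minimum tokens for the two supported partitioners.
RING_SIZE = {"RandomPartitioner": 2**127, "Murmur3Partitioner": 2**64}
MIN_TOKEN = {"RandomPartitioner": 0, "Murmur3Partitioner": -2**63}


def balanced_tokens(partitioner: str, n: int) -> list[int]:
    """Evenly spaced tokens for an n-node ring (hypothetical helper)."""
    size = RING_SIZE[partitioner]
    return [MIN_TOKEN[partitioner] + i * size // n for i in range(n)]


def ownership(partitioner: str, tokens: list[int]) -> list[Fraction]:
    """Fraction of the ring owned by each token, in sorted-token order.

    A node owns (previous token, its token]; the first token wraps
    around to the last one, which the modulo handles.
    """
    size = RING_SIZE[partitioner]
    ts = sorted(tokens)
    if len(ts) == 1:
        return [Fraction(1)]  # a single node owns the whole ring
    return [Fraction((ts[i] - ts[i - 1]) % size, size) for i in range(len(ts))]


def is_balanced(partitioner: str, tokens: list[int],
                tolerance: Fraction = Fraction(1, 1000)) -> bool:
    """True if the largest and smallest ring shares differ by at most tolerance."""
    shares = ownership(partitioner, tokens)
    return max(shares) - min(shares) <= tolerance
```

For example, the tokens `[0, 2**125, 2**126]` on a RandomPartitioner ring give shares of 1/2, 1/4, and 1/4, so the first node owns twice the range of the others and `is_balanced` returns `False`.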

For clusters of about 100 nodes or fewer, OpsCenter determines the optimal rebalancing path by calculating both the number of moves required and the amount of data those moves would stream. For clusters of more than about 100 nodes, it calculates the path based only on the number of moves, to speed up the rebalancing process.
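To make this trade-off concrete, here is a hypothetical planner in Python. It assumes a simplified model: the target tokens are the evenly spaced tokens starting at the partitioner's minimum token, nodes keep their ring order, and the amount of streamed data is approximated by the ring distance each moving node travels. It tries every rotation of the targets, ranks plans by number of moves and, only for clusters at or below an assumed 100-node cutoff, breaks ties by estimated streaming. How OpsCenter actually weighs these factors is not described here, so treat the names, the cutoff, and the cost model as illustrative only.

```python
# Assumed ring sizes and minimum tokens for the two supported partitioners.
RING_SIZE = {"RandomPartitioner": 2**127, "Murmur3Partitioner": 2**64}
MIN_TOKEN = {"RandomPartitioner": 0, "Murmur3Partitioner": -2**63}
SMALL_CLUSTER_LIMIT = 100  # assumed cutoff for "about 100 nodes"


def ring_distance(a: int, b: int, size: int) -> int:
    """Shortest distance between two tokens on a ring of the given size."""
    d = (b - a) % size
    return min(d, size - d)


def plan_rebalance(partitioner: str, current: dict[str, int]) -> dict[str, int]:
    """Return {node: new_token} for every node that must move (hypothetical planner)."""
    size = RING_SIZE[partitioner]
    n = len(current)
    targets = [MIN_TOKEN[partitioner] + i * size // n for i in range(n)]
    nodes = sorted(current, key=current.get)  # nodes in ring order
    small = n <= SMALL_CLUSTER_LIMIT

    best_key, best_plan = None, {}
    for r in range(n):  # try each rotation of the target tokens
        plan = {}
        streamed = 0
        for i, node in enumerate(nodes):
            target = targets[(i + r) % n]
            if current[node] != target:
                plan[node] = target
                if small:  # only small clusters pay for the streaming estimate
                    streamed += ring_distance(current[node], target, size)
        key = (len(plan), streamed)  # streamed stays 0 for large clusters
        if best_key is None or key < best_key:
            best_key, best_plan = key, plan
    return best_plan
```

For example, in a four-node Murmur3Partitioner cluster where three nodes already sit on evenly spaced tokens and the fourth is slightly off, this planner returns a single move for the fourth node rather than shifting every node.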

When rebalancing a cluster, OpsCenter performs the following actions:
  • Calculates appropriate token ranges for each node and identifies nodes that need to move.
  • Verifies that there is enough free space to perform the rebalancing.
  • Moves one node at a time to lessen the impact on cluster workloads. A move changes the node's partitioner token assignment, and therefore the range of data the node is responsible for. During a move, the node streams data from other nodes.
  • Runs cleanup on each node after its move completes. A cleanup operation removes rows that the node is no longer responsible for.
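The sequence of actions listed above can be sketched as an ordered list of operations. This Python example is hypothetical: `NodeInfo`, `build_operations`, and especially the free-space rule (each moving node must have at least as much free disk as its current load, since streamed data arrives before cleanup removes the old rows) are assumptions for illustration, not OpsCenter's actual checks.

```python
from dataclasses import dataclass


@dataclass
class NodeInfo:
    load_bytes: int  # data currently stored on the node
    free_bytes: int  # free disk space on the node


def build_operations(plan: dict[str, int], nodes: dict[str, NodeInfo]) -> list[tuple]:
    """Turn {node: new_token} into sequential move/cleanup steps.

    Assumed free-space rule: a moving node needs free space >= its load.
    """
    short = [n for n in plan if nodes[n].free_bytes < nodes[n].load_bytes]
    if short:
        raise RuntimeError(f"not enough free space on: {', '.join(sorted(short))}")

    ops: list[tuple] = []
    for node in sorted(plan):  # one node at a time; this ordering is arbitrary
        ops.append(("move", node, plan[node]))  # change the node's token
        ops.append(("cleanup", node))           # drop rows it no longer owns
    return ops
```

On a Cassandra node, these steps correspond to the `nodetool move <new_token>` and `nodetool cleanup` commands. Running them strictly in sequence, waiting for each to finish before starting the next, is what keeps only one node moving at a time.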