The cassandra-shuffle utility

Shift a single-token-per-node architecture to virtual nodes (vnodes) without downtime. Avoid using this utility on a production cluster.

The cassandra-shuffle utility splits up all the contiguous partition ranges (formerly token ranges) for each node and then randomly distributes them into virtual nodes throughout the cluster. Shuffling is a two-phase operation. The utility first schedules the range transfers and then begins transferring the scheduled ranges. You can shuffle on a per-data center basis and mix virtual node-enabled and non-virtual node data centers.
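
The two phases can be pictured with a toy model. The sketch below is illustrative only and is not the utility's code: the node names, the integer tokens, and the functions schedule_transfers and run_transfers are hypothetical, and the model assumes that every node keeps the same number of tokens after the shuffle.

```python
import random


def schedule_transfers(ownership, rng):
    """Phase 1: pick a random new owner for every token; nothing moves yet."""
    nodes = sorted(ownership)
    current = {t: node for node, owned in ownership.items() for t in owned}
    tokens = sorted(current)
    rng.shuffle(tokens)
    # Assumption for this toy model: tokens divide evenly among the nodes.
    share = len(tokens) // len(nodes)
    schedule = []
    for i, node in enumerate(nodes):
        for token in tokens[i * share:(i + 1) * share]:
            if current[token] != node:
                schedule.append((token, current[token], node))
    return schedule


def run_transfers(ownership, schedule):
    """Phase 2: apply the scheduled transfers and return the new ownership."""
    result = {node: set(owned) for node, owned in ownership.items()}
    for token, source, target in schedule:
        result[source].remove(token)
        result[target].add(token)
    return {node: sorted(owned) for node, owned in result.items()}


# Three nodes, each owning four consecutive tokens (contiguous ranges).
before = {"node1": [0, 1, 2, 3], "node2": [4, 5, 6, 7], "node3": [8, 9, 10, 11]}
schedule = schedule_transfers(before, random.Random(42))
after = run_transfers(before, schedule)
print(len(schedule), "transfers scheduled")
print(after)
```

Scheduling leaves the ownership map untouched; only the second phase moves tokens, which mirrors the split between scheduling the range transfers and carrying them out.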

Warning: Using the shuffle utility on a running production system is not advised. It may take hours or even days to complete!

Bootstrapping a new data center is a much safer way to enable vnodes. Use the procedure described in Enabling virtual nodes on an existing production cluster instead.

Procedure

In a terminal window:

  1. In the cassandra.yaml file, set the num_tokens parameter.

    A good starting point for this parameter is 256.

  2. Restart the node.

    The node sleeps for RING_DELAY to make sure its view of the ring is accurate, and then splits its current range into the specified number of tokens. Although the range is now divided among many tokens, it remains contiguous: it is equivalent to what it was before, but with more tokens.

  3. To distribute the tokens, initialize the shuffle operation:
    cassandra-shuffle create
  4. Start the transfers:
    cassandra-shuffle enable
  5. To see what transfers remain at any point:
    cassandra-shuffle ls
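
The split that happens when the node restarts in step 2 can be sketched as follows. This is a simplified illustration, not Cassandra's actual arithmetic: the function split_range and the small integer range are hypothetical, and real token values depend on the partitioner in use.

```python
def split_range(start, end, num_tokens):
    """Return num_tokens tokens dividing the range (start, end] into contiguous sub-ranges."""
    width = end - start
    # Token i closes the sub-range that begins right after token i - 1.
    return [start + width * i // num_tokens for i in range(1, num_tokens + 1)]


# A node that owned the single range (0, 1024] now holds 256 tokens.
tokens = split_range(0, 1024, 256)
print(len(tokens), tokens[:3], tokens[-1])
```

Because the last token equals the end of the original range and the tokens are strictly increasing, the sub-ranges cover the same span the node owned before the split. Nothing is redistributed until the shuffle itself runs.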