Shard routing for distributed queries

Shard routing for distributed queries is selected using node health and other criteria. Collecting node health data is optional.

On DSE Search nodes, the shard selection algorithm for distributed queries uses a series of criteria to route sub-queries to the nodes most capable of handling them. The best node is determined by a chain of node comparisons. Selection occurs in the following order using these criteria:
  1. Is node active? Preference to active nodes.
  2. Is the requested core indexing, or has it failed to index? If node is not in either of these states, select the best node.
  3. Node health rank, an exponentially increasing number between 0 and 1 that describes the health node so that, all the previous criteria being equal, a node with a better score is chosen first. This node health rank value is exposed as a JMX metrics under ShardRouter.
    How do the nodes compare on uptime and dropped mutations? The node health rank is calculated by the formula below:
    node health = uptime / (1 + drop_rate)
    where:
    • drop_rate = the rate of dropped mutations per minute over a sliding window of configurable length. To configure the historic time window, set dropped_mutation_window_minutes in dse.yaml.

      A high dropped mutation rate indicates an overloaded node; for example, Cassandra insertions and updates.

    • uptime = a score between 0 and 1 that weights recent downtime more heavily than less recent downtime.
  4. Is node close to the node that is issuing the query? Node selection uses Cassandra endpoint snitch proximity. Give preference to closer nodes.
After using these criteria, node selection is random.

To check on the shard router, add shard.info=true to the Solr query.