Shard routing for distributed queries
Shard routing for distributed queries is selected using node health and other criteria. Collecting node health data is optional.
On DSE Search nodes, the shard selection algorithm for distributed queries uses a series of
criteria to route sub-queries to the nodes most capable of handling them. The best node is
determined by a chain of node comparisons. Selection occurs in the following order using these
criteria:
- Is node active? Preference to active nodes.
- Is the requested core indexing, or has it failed to index? If node is not in either of these states, select the best node.
- Node health rank, an exponentially increasing number between 0 and 1 that describes the
health node so that, all the previous criteria being equal, a node with a better score is
chosen first. This node health rank value is exposed as a JMX metrics under ShardRouter.
How do the nodes compare on uptime and dropped mutations? The node health rank is calculated by the formula below:
where:node health = uptime / (1 + drop_rate)
- drop_rate = the rate of dropped mutations per minute over a sliding window of
configurable length. To configure the historic time window, set dropped_mutation_window_minutes
in dse.yaml.
A high dropped mutation rate indicates an overloaded node; for example, Cassandra insertions and updates.
- uptime = a score between 0 and 1 that weights recent downtime more heavily than less recent downtime.
- drop_rate = the rate of dropped mutations per minute over a sliding window of
configurable length. To configure the historic time window, set dropped_mutation_window_minutes
in dse.yaml.
- Is node close to the node that is issuing the query? Node selection uses Cassandra endpoint snitch proximity. Give preference to closer nodes.
To check on the shard router, add shard.info=true
to the Solr query.