Shard routing for distributed queries
On DSE Search nodes, the shard selection algorithm for distributed queries uses a series of criteria to route sub-queries to the nodes most capable of handling them.
dse.yaml
The location of the dse.yaml file depends on the type of installation:Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
- Is node active?
Preference to active nodes.
- Is the requested core indexing, or has it failed to index?
If node is not in either of these states, select the best node.
- Node health rank, an exponentially increasing number between 0 and 1. It describes the
health node so if all the previous criteria is equal, a node with a better score is chosen
first. This node health rank value is exposed as a JMX metrics under ShardRouter. Node health rank is a comparison of uptime and dropped mutations:
where:node health = uptime / (1 + drop_rate)
- drop_rate = the rate of dropped mutations per minute over a sliding window of
configurable length.
To configure the historic time window, set dropped_mutation_window_minutes in dse.yaml.
A high-dropped mutation rate indicates an overloaded node. For example, database insertions, and updates.
- uptime = a score between 0 and 1 that weights recent downtime more heavily than less recent downtime.
- drop_rate = the rate of dropped mutations per minute over a sliding window of
configurable length.
- Is the node close to the node that is issuing the query?
Node selection uses endpoint snitch proximity. Give preference to closer nodes.
To check on the shard router, add shards.info=true
to the search query.
The ShardRouter MBean, not present in open source Solr, provides information about how DSE search routes queries.