Shard routing for distributed queries

On DSE Search nodes, the shard selection algorithm for distributed queries uses a series of criteria to route sub-queries to the nodes most capable of handling them. The best node is determined by a chain of node comparisons. Selection occurs in the following order using these criteria:

Is node active?

Preference to active nodes.
Is the requested core indexing, or has it failed to index?

If node is not in either of these states, select the best node.

Node health rank, an exponentially increasing number between 0 and 1. It describes the health node so if all the previous criteria is equal, a node with a better score is chosen first. This node health rank value is exposed as a JMX metrics under ShardRouter.

Node health rank is a comparison of uptime and dropped mutations:

node health = uptime / (1 + drop_rate)

where:

drop_rate = the rate of dropped mutations per minute over a sliding window of configurable length.

To configure the historic time window, set dropped_mutation_window_minutes in dse.yaml.

Where is the dse.yaml file?

The location of the dse.yaml file depends on the type of installation:

Installation Type Location

Installation Type	Location
Package installations + Installer-Services installations	`/etc/dse/dse.yaml`
Tarball installations + Installer-No Services installations	`<installation_location>/resources/dse/conf/dse.yaml`

Package installations + Installer-Services installations

/etc/dse/dse.yaml

Tarball installations + Installer-No Services installations

<installation_location>/resources/dse/conf/dse.yaml

A high-dropped mutation rate indicates an overloaded node. For example, database insertions, and updates.

uptime = a score between 0 and 1 that weights recent downtime more heavily than less recent downtime.

Is the node close to the node that is issuing the query?

Node selection uses endpoint snitch proximity. Give preference to closer nodes.

After using these criteria, node selection is random.

To check on the shard router, add shards.info=true to the search query.

Shard routing for distributed queries

Was this helpful?

Give Feedback