Shard routing for distributed queries

On DSE Search nodes, the shard selection algorithm for distributed queries uses a series of criteria to route sub-queries to the nodes most capable of handling them. The best node is determined by a chain of node comparisons. Selection occurs in the following order using these criteria:

  1. Is the node active?

    Active nodes are preferred.

  2. Is the requested core indexing, or has it failed to index?

    Nodes where the requested core is in neither of these states are preferred.

  3. Node health rank, an exponentially increasing number between 0 and 1 that describes the health of the node. If all the previous criteria are equal, the node with the better score is chosen first. The node health rank value is exposed as a JMX metric under ShardRouter.

    Node health rank is computed from uptime and dropped mutations:

    node health = uptime / (1 + drop_rate)

    where:

    • drop_rate = the rate of dropped mutations per minute over a sliding window of configurable length.

      To configure the historic time window, set dropped_mutation_window_minutes in dse.yaml.

      The location of the dse.yaml file depends on the type of installation:

      Installation type                                              Location
      Package installations and Installer-Services installations     /etc/dse/dse.yaml
      Tarball installations and Installer-No Services installations  <installation_location>/resources/dse/conf/dse.yaml

      A high dropped-mutation rate indicates an overloaded node. Mutations are write operations, for example, database insertions and updates.

    • uptime = a score between 0 and 1 that weights recent downtime more heavily than less recent downtime.

  4. Is the node close to the node that is issuing the query?

    Node selection uses endpoint snitch proximity. Closer nodes are preferred.
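The node health formula in criterion 3 can be illustrated with a minimal Python sketch. The function name and the input validation are assumptions made for illustration. How DSE derives the uptime score is not shown here, so uptime is simply taken as an input between 0 and 1.

```python
def node_health(uptime: float, drop_rate: float) -> float:
    """Illustrative only: node health = uptime / (1 + drop_rate).

    uptime    -- score between 0 and 1 (how it is derived is not modeled here)
    drop_rate -- dropped mutations per minute over the configured window
    """
    if not 0.0 <= uptime <= 1.0:
        raise ValueError("uptime must be between 0 and 1")
    if drop_rate < 0:
        raise ValueError("drop_rate must be non-negative")
    return uptime / (1.0 + drop_rate)


# Perfect uptime and no dropped mutations gives the best possible rank.
healthy = node_health(uptime=1.0, drop_rate=0.0)      # 1.0
# The same uptime with 3 dropped mutations per minute ranks lower.
overloaded = node_health(uptime=1.0, drop_rate=3.0)   # 0.25
```

Because drop_rate appears in the denominator, any dropped mutations pull the rank below the uptime score, and a node that drops no mutations has a rank equal to its uptime score.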

If nodes are still tied after these criteria are applied, node selection among them is random.
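As a rough illustration of the comparison chain described above, the following Python sketch orders candidate nodes by the four criteria and breaks remaining ties at random. The Node class, its field names, and the integer distance value are hypothetical. This is not the DSE implementation, only a model of the ordering under those assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # All fields are hypothetical stand-ins for the documented criteria.
    name: str
    active: bool       # criterion 1
    core_ready: bool   # criterion 2: False while indexing or after a failed index
    health: float      # criterion 3: node health rank, higher is better
    distance: int      # criterion 4: snitch proximity, lower is closer


def sort_key(node):
    # Tuples compare element by element, so earlier criteria dominate.
    # Booleans are negated so that preferred (True) values sort first.
    return (not node.active, not node.core_ready, -node.health, node.distance)


def select_node(candidates, rng=random):
    # candidates must be a non-empty list of Node objects.
    best = min(sort_key(n) for n in candidates)
    tied = [n for n in candidates if sort_key(n) == best]
    return rng.choice(tied)  # random selection among fully tied nodes


nodes = [
    Node("a", active=False, core_ready=True, health=1.0, distance=0),
    Node("b", active=True, core_ready=True, health=0.9, distance=2),
    Node("c", active=True, core_ready=True, health=0.9, distance=1),
]
print(select_node(nodes).name)  # c
```

Node a has the best health and proximity but loses on the first criterion because it is inactive. Nodes b and c tie on the first three criteria, so proximity decides in favor of c.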

To check on the shard router for a given query, add the shards.info=true parameter to the search query.
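For example, a query URL carrying the parameter could be assembled as in the sketch below. The host, port, keyspace, and table names are placeholders assumed for illustration. Substitute the values for your own cluster and search core.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port, and core name are assumptions.
base_url = "http://localhost:8983/solr/my_keyspace.my_table/select"

# shards.info=true comes from the text above; q is an ordinary match-all query.
params = {"q": "*:*", "shards.info": "true"}

url = f"{base_url}?{urlencode(params)}"
print(url)
```

The URL can then be requested with any HTTP client.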


© 2024 DataStax | Privacy policy | Terms of use
