Shuffling shards to balance the load

DSE Search uses a shuffling technique to balance the load, and also attempts to minimize the number of shards that are queried as well as the amount of data that is transferred from non-local nodes.

To balance the load in a distributed environment, choose from several strategies for shuffling the shards. The shard shuffling strategy specifies how one node is selected over others for reading the search data. The value of the shard.shuffling.strategy parameter must be one of the following values:

Possible values of shard.shuffling.strategy:

  • host

    Shards are selected based on the host that received the query.

  • query

    Shards are selected based on the query string.

  • host_query

    Shards are selected by host x query.

  • random

    Different random set of shards are selected with each request (default).

  • SEED

    Selects the same shard from one query to another.

Methods for selecting shard shuffling strategy

  • Append shard.shuffling.strategy = strategy to the HTTP API query. For example:

    http://localhost:8983/solr/wiki.solr/select?q=title:natio*&shard.shuffling.strategy=host

    Issuing this query determines the shard shuffling strategy for this query only.

  • Create a dse-search.properties file and POST it to Solr. For example:

    1. Create the dse-search.properties file with the following contents:

      shard.shuffling.strategy=query
    2. Post the command to DSE Search. For example:

      curl -v --data-binary @dse-search.properties
          http://localhost:8983/solr/resource/wiki.solr/dse-search.properties

      Posting the command determines the shard shuffling strategy for all queries to the specified Solr core. The strategy is propagated to all nodes and saved in the search index metadata.

  • Set the following parameters to use the SEED strategy:

    1. Pass the shard.shuffling.strategy=SEED as a request parameter.

    2. Specify a request parameter, such as an IP address or any string, using the shard.shuffling.seed parameter. When you reuse the same seed value between queries on a stable cluster, the same shard strategy is in effect.

      Every time you pass the same string, the same list of shards is queried, regardless of the target node you actually query; if you change the string, a different list of shards are queried.

    3. Verify that the strategy was maintained by passing the shards.info=true request parameter. For example:

      curl "http://localhost:8983/solr/demo.solr/select/?q=text:search&shards.info=true&shard.shuffling.strategy=SEED&shard.shuffling.seed=192.168.0.1&rows=0"

Shuffling does not always result in the node selection you might expect. For example, using a replication factor of 3 with six nodes, the best and only solution is a two-shard solution where half of the data is read from the originator node and half from another node. A three-shard solution would be inefficient.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com