Parallelizing large row reads

For performance, you can configure DSE Search to parallelize the retrieval of a large number of rows. Configure the queryResponseWriter in the search index configuration as follows:

<queryResponseWriter name="javabin" class="solr.BinaryResponseWriter">
  <str name="resolverFactory">com.datastax.bdp.search.solr.response.ParallelRowResolver$Factory</str>
</queryResponseWriter>
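After changing the queryResponseWriter configuration, reload the search core so the new resolver factory takes effect. A minimal sketch, assuming a hypothetical demo.health_data search index and an edited solrconfig.xml in the current directory (the exact dsetool options available depend on your DSE version):

dsetool reload_core demo.health_data solrconfig=solrconfig.xml reindex=false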

By default, the parallel row resolver uses up to one thread per CPU to execute parallel reads. Each thread sequentially reads a batch of rows equal to the total requested rows divided by the number of CPUs:

Rows per batch = Total requested rows / Number of CPUs
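For example, if a query requests 8,000 rows on a node with 8 CPUs, the resolver uses 8 threads and each thread reads one batch of 1,000 rows.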

You can change the batch size per request by specifying the cassandra.readBatchSize HTTP request parameter. Smaller batches increase parallelism; larger batches reduce it.
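For illustration, the parameter can be appended to the Solr HTTP query string. A minimal sketch, assuming a hypothetical node host1 and the demo.health_data search index (the host, index name, and query are placeholders):

curl "http://host1:8983/solr/demo.health_data/select?q=*:*&rows=8000&cassandra.readBatchSize=500"

Here cassandra.readBatchSize=500 overrides the default batch size (total requested rows divided by the number of CPUs), producing smaller batches and therefore more parallel reads.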
