Managing concurrency in asynchronous query execution

The DataStax drivers support sending multiple concurrent requests on a single connection to improve overall query performance. This is also known as request pipelining. These requests are processed by the server concurrently and responses are sent back to the client driver without strict ordering, allowing improved overall performance when a single operation is slow but the rest of the operations can be processed without any delay. For example, a query that involves consolidating data from multiple partitions will be much slower than a query that only retrieves data from a single partition.

Apache Cassandra™ or DSE deployments should be planned and provisioned to support the maximum number of parallel requests required for the desired latency of an application. For a given deployment, introducing more load to the system above a minimum threshold increases overall latency.

On the client side, the driver limits the amount of in-flights requests (or simultaneous requests that have not completed yet), to between 1024 and 2048 per connection by default, depending on the driver language. Above that limit, the driver immediately throws an exception indicating that the connections to the cluster are busy. You may reach this limit as a result of handling incoming load to your application. If your application is hitting the limit of in-flight requests, then add additional capacity to your DSE cluster.

Increasing the limit of in-flight requests can be set using the driver configuration settings, but it is not recommended. A server node usually takes less than a millisecond to fulfil a request. Exceeding the driver side limit would mean trying to support millions of requests per second with a few server nodes.

Limiting simultaneous requests in your application code

When submitting several requests in parallel, the requests are queued at one of three levels: on the driver side, on the network stack, or on the server side. Excessive queueing on any of these levels affects the total time it takes each operation to complete. Adjust the concurrency level, or number of simultaneous requests, to reduce the amount of queuing and get high throughput and low latency.

The optimal concurrency level depends on both the client and server hardware specifications as well as other factors like:

  • the server cluster size

  • the number of instances of the application accessing the database

  • the complexity of the queries

When implementing an application, launch a fixed number of asynchronous operations using the concurrency level as the maximum. As each operation completes, add a new one. This ensures that your application’s asynchronous operations do not exceed the concurrency level.

The following code examples show how to launch asynchronous operations in a loop and control the concurrency level.

C/C++

C#

Java

Node.js

Python

Using specialized tools to avoid problems in custom applications

Unbounded concurrency issues often arise when performing bulk operations in custom code. Avoid them by using the appropriate tool for the task. If you are importing data from other sources, then use dsbulk. If you are performing transformations from external data sources, then use Apache Spark.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2025 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com