Asynchronous query execution with Cassandra drivers

Most DataStax-compatible drivers execute queries synchronously by default. However, asynchronous query execution can be more performant when processing many queries or long-running queries.

With asynchronous query execution, Cassandra drivers can send multiple, concurrent requests on a single connection without blocking threads, allowing other operations to proceed while waiting for the query results.

Here’s what happens during asynchronous query execution:

Multiple queries are issued concurrently. Application threads that issue requests aren’t blocked while waiting for the response.
The asynchronous query execution call immediately returns a future object, which is a placeholder object that stands in for the result until the result is returned from the database.
The server concurrently processes the requests. Responses are sent back to the driver without strict ordering.
The application can use the future object to obtain query results and errors, if they occur.
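
The flow above can be illustrated without a cluster. The following is a minimal sketch that uses Python's standard `concurrent.futures` module. The `run_query` and `collect` functions are hypothetical stand-ins, not driver APIs, and the thread pool only simulates concurrent work. A real driver multiplexes requests over a connection and returns its own future type.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_query(query_id, delay):
    """Hypothetical stand-in for a database round trip."""
    if delay < 0:
        raise ValueError(f"query {query_id} failed")
    time.sleep(delay)
    return f"rows for query {query_id}"


def collect(queries):
    """Issue all queries at once, then gather results and errors from the futures."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        # submit() returns immediately with a future (the placeholder object).
        futures = {pool.submit(run_query, qid, delay): qid for qid, delay in queries}
        # Futures complete in whatever order the work finishes, not submission order.
        for future in as_completed(futures):
            qid = futures[future]
            try:
                results[qid] = future.result()
            except ValueError as exc:
                errors[qid] = str(exc)
    return results, errors


if __name__ == "__main__":
    print(collect([(1, 0.02), (2, 0.01), (3, -1)]))
```

Each future yields either a result or the error raised by its query, so the application can handle failures per request.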

If this strategy is appropriate for your application, it can optimize query processing, improve the driver’s ability to coalesce query requests, and maximize use of server-side resources.

Configure asynchronous query execution

Compared to synchronous queries, asynchronous queries result in more complex application logic, but they can be much more performant. For information about asynchronous query execution with Cassandra drivers, see your driver’s documentation.

Concurrency limits

On the client side, drivers limit the number of in-flight requests (simultaneous, incomplete requests) that can be submitted over a single connection to a node. The default value varies by driver, ranging from 1024 to 2048 requests per connection:

datastax-java-driver.advanced.connection {
  max-requests-per-connection = 1024    // Concurrency limit per connection
  pool {
    local.size = 1
    remote.size = 1
  }
}

Calculate cumulative concurrent requests

You can use the following formula to calculate the total number of concurrent requests for all connections for a driver session:

Cumulative maximum concurrent requests = (requests per connection) * (connections to local and remote nodes) * (number of nodes)
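
As a worked instance of the formula, the following sketch multiplies the three factors. The function name is hypothetical. The values 1024 and 1 come from the Java driver configuration shown earlier, and the 3-node cluster is an assumed example value.

```python
def cumulative_max_requests(requests_per_connection, connections_per_node, node_count):
    """Upper bound on in-flight requests across all connections of one session."""
    return requests_per_connection * connections_per_node * node_count


# 1024 requests per connection, 1 connection per node, 3 nodes (assumed cluster size).
print(cumulative_max_requests(1024, 1, 3))  # 3072
```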

On the server side, the binary protocol (Cassandra Native Protocol) allows no more than 32,768 concurrent requests per connection.

Upon reaching the concurrency limit, the driver immediately throws an exception, such as BusyConnectionException, that indicates the cluster connections are busy.

Don’t change the concurrent request limit in your driver’s configuration.

A server node usually takes less than a millisecond to fulfil a request. Therefore, exceeding the driver-side limit means trying to support millions of requests per second with too few server nodes.
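
The reasoning behind that claim can be checked with a rough estimate: sustained throughput is approximately the number of in-flight requests divided by the time each request takes (Little's law). The sketch below assumes a 1024-request limit and a 1 ms response time. The function name is hypothetical.

```python
def max_requests_per_second(in_flight_limit, latency_ms):
    """Little's law: throughput = concurrency / latency."""
    return in_flight_limit * 1000 / latency_ms


# One connection that stays saturated at the driver-side limit with 1 ms responses.
per_connection = max_requests_per_second(1024, 1)
print(f"{per_connection:,.0f} requests per second")  # 1,024,000 requests per second
```

Under these assumptions, a single saturated connection already corresponds to about one million requests per second, so a few nodes reach the millions.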

Similarly, if you want to lower the driver’s global throughput limit, DataStax recommends that you use a throttler.

Manage concurrency in Cassandra drivers

Use the following strategies to optimize concurrency and avoid hitting concurrency limits in your applications.

Provision your cluster appropriately

Plan and provision your Apache Cassandra®, Hyper-Converged Database (HCD), and DataStax Enterprise (DSE) deployments to support the maximum number of parallel requests required for the desired latency of an application. For a given deployment, adding load beyond a certain threshold increases overall latency.

For Astra DB databases, consider provisioned capacity units (Serverless databases) or higher workload tiers (Classic databases).

Limit simultaneous requests in your application code

When you submit several requests in parallel, the requests are queued at one of three levels: in the driver, in the network stack, or on the server. Excessive queueing at any of these levels increases the total time it takes each operation to complete.

To reduce queuing, increase throughput, and reduce latency, you can adjust the concurrency level (the maximum number of simultaneous requests).

The optimal concurrency level depends on both the client and server hardware specifications as well as other factors like:

Server cluster size
Number of instances of the application accessing the database
Complexity of the queries

To ensure that your application’s asynchronous operations do not exceed the concurrency level, you can use the following strategies:

Launch a fixed number of asynchronous operations using the concurrency level as the maximum. As each operation completes, add a new one.
Apply application-level micro-batching, as shown in the fan-out pattern examples.

The following code examples show how to launch asynchronous operations in a loop and control the concurrency level:
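
As a minimal sketch of the first strategy, the following Python uses only the standard `asyncio` module. It starts a fixed number of workers, and each worker launches its next operation only after its previous one completes. The `fake_query` and `run_bounded` functions are hypothetical stand-ins for illustration, not driver APIs. A real application would replace `fake_query` with its driver's asynchronous execution call.

```python
import asyncio


async def fake_query(item):
    """Hypothetical stand-in for an asynchronous driver call."""
    await asyncio.sleep(0.001)
    return item * 2


async def run_bounded(items, concurrency_level):
    """Keep at most `concurrency_level` operations in flight at any time."""
    pending = iter(items)  # shared by all workers; each item is taken exactly once
    results = []
    in_flight = 0
    peak = 0

    async def worker():
        nonlocal in_flight, peak
        # Each worker starts its next operation only after the previous one completes.
        for item in pending:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                results.append(await fake_query(item))
            finally:
                in_flight -= 1

    await asyncio.gather(*(worker() for _ in range(concurrency_level)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_bounded(range(100), concurrency_level=10))
    print(len(results), peak)
```

The `peak` counter is included only to show that the number of in-flight operations never exceeds the concurrency level.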

Use specialized tools for bulk operations in custom applications

Unbounded concurrency issues often arise when performing bulk operations in custom code. Avoid them by using the appropriate tool for the task. For example:

If you are importing data from other sources, consider dsbulk.
If you are performing transformations from external data sources, consider Apache Spark.

Increase the number of allowed TCP connections per node

DataStax-compatible drivers can submit multiple concurrent CQL requests over a single connection to a Cassandra node. Typically, you don’t need to increase the number of allowed connections per node.

However, if your application executes a large number of CQL statements asynchronously, you might need to increase the number of allowed TCP connections per node.

For more information, see Use a fan-out pattern to update rows in different partitions and the DataStax Support article on TCP saturation.

Use a fan-out pattern to update rows in different partitions

Apply a fan-out pattern when an application needs to update multiple rows in different partitions.

To do this, your application must asynchronously execute multiple queries, collect future objects, and then wait for all requests to complete.

Asynchronous query execution in a fan-out pattern

Example: Fan-out pattern with the Java driver

....
List<String> ids = ...
int batchSize = 500;
List<CompletableFuture<AsyncResultSet>> futures = new ArrayList<>(batchSize);
for (List<String> batch : Lists.partition(ids, batchSize)) {
   for (String id : batch) {
       CompletableFuture<AsyncResultSet> future = session.executeAsync("SELECT ...").toCompletableFuture();
       futures.add(future);
   }

   CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
   for (CompletableFuture<AsyncResultSet> future : futures) {
       AsyncResultSet resultSet = future.join();
       // process...
   }
   futures.clear();
}
....

Example: Fan-out pattern with the Python driver

....
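    import threading

    # `session` is assumed to be an existing driver session connected to the cluster.
    prepared_stmt = session.prepare('INSERT INTO mytable (key, col1) VALUES (?, ?)')
    ids = list(range(100))
    max_in_flight = 30
    sem = threading.Semaphore(max_in_flight) # limit the number of concurrent requests
    for i in ids:
        sem.acquire() # blocks while max_in_flight requests are already pending
        # release the permit whether the request succeeds or fails
        session.execute_async(prepared_stmt, [i, 'value']).add_callbacks(lambda _: sem.release(), lambda _: sem.release())

    # wait until all requests are done by reclaiming every permit
    for _ in range(max_in_flight):
        sem.acquire()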
....

Use unlogged batch statements when querying rows in the same partition

If your application writes multiple rows to the same partition, you can use an unlogged batch CQL statement to optimize performance. CQL batches accept only INSERT, UPDATE, and DELETE statements. Logged batches guarantee some level of atomicity because the batch is sent to two nodes, but they perform worse in this case.

Example: Unlogged batch statement with the Java driver

BatchStatement batch =
    BatchStatement.newInstance(
        DefaultBatchType.UNLOGGED,
        preparedStatement.bind("Company A", "Employee 1"),
        preparedStatement.bind("Company A", "Employee 2"),
        preparedStatement.bind("Company A", "Employee 3")
    );
session.execute(batch);

Example: Unlogged batch statement with the Python driver

from cassandra.cqlengine.query import BatchQuery, BatchType

# LogEntry is assumed to be an existing cqlengine model whose partition key is k.
with BatchQuery(batch_type=BatchType.Unlogged) as b:
    LogEntry.batch(b).create(k=1, v=1)
    LogEntry.batch(b).create(k=1, v=2)