Speculative query execution
Speculative queries are used to preemptively start a second execution of a query against another node, before the first node has replied or returned an error. Sometimes a node may be slow to respond. Queries sent to that node will experience increased latency.
To mitigate this situation, use speculative execution to preemptively start a second execution of the query against another node, before the first node has replied or returned an error. If the second node replies faster, that response is returned to the client and the first execution is cancelled. If the first execution is cancelled, the driver ignores the response, but the request still interacts with the server.
The first node might reply just after the second execution call was started. In this case, the second preemptive execution is cancelled. For applications that use speculative execution, the data from whichever node replies faster is returned to the client.
The goal of speculative execution is to improve the overall latency of an application. However, too many speculative executions increase the load on the cluster. If speculative executions are used to avoid sending queries to unhealthy nodes, a healthy node should rarely reach resource limits. Drivers provide a configurable delay threshold at which speculative executions will be sent. In order to determine an appropriate threshold for your application, benchmark the healthy platform state (all nodes are up and under a normal load), monitoring the response latencies. Based on these results, use the latency at a high percentile (p99.9) as the speculative executions threshold.
Alternatively, when low latency is the highest priority and the cluster can handle the increased throughput, set the threshold to 0, which effectively always performs speculative executions.
Most drivers surface metrics for speculative executions that can be used to observe the speculative executions frequency.
If a query is not idempotent, the driver will never schedule speculative executions for the query, because there is no way to guarantee that only one coordinator will apply the mutation.
Turning on speculative executions doesn’t change the driver’s retry behavior. Each parallel execution triggers retries independently. The only impact is that all executions of the same query always share the same query plan, so each node will be used by no more than one execution.
One effect of speculative executions is that many requests get cancelled, which can lead to a phenomenon called stream ID exhaustion. Each TCP connection can handle multiple simultaneous requests, identified by a unique number called a stream ID. See Connection pooling. When a request gets cancelled, the stream ID for that request cannot be used immediately because the response for that request may be returned later from the server once it fulfills the request. If this happens often, the number of available stream IDs diminishes over time. When the available stream IDs goes below a given threshold, the connection is closed and a new connection is created. If requests are often cancelled, connections will be recycled at a high rate.
In practice, exhausting all stream IDs on a connection should not occur. Each connection can reference 32768 stream IDs. Most driver implementations are configured by default to only send 1000 requests per connection.
Most drivers provide a metric for observing the number of inflight requests for a node. Some drivers provide a metric for observing the number of orphaned stream IDs. Monitor these metrics to ensure that stream ID exhaustion is not occurring.
If stream ID exhaustion is occurring, the typical solution is to add more capacity to the cluster, or adjust the system settings of the nodes to avoid resource limits.
Ordering issues are only a problem when using server-side timestamps. All recent versions of the DataStax drivers use client-side timestamps with exception of the C++, PHP, and Ruby drivers. Unless the driver is explicitly configured to use server-side timestamps, this section does not apply. See Query timestamps for details.
For example, the following query is run with speculative execution and server-side timestamps enabled.
INSERT INTO my_table (k, v) VALUES (1, 1);
When the first execution is slow, a second execution is triggered. Finally, the first execution completes, so the second execution is cancelled. However, cancelling an execution only means that the driver stops waiting for the server’s response. The request could still be active.
Suppose that while the second request is still active after the driver canceled the execution, the following query is run and completes successfully.
DELETE from my_table where k = 1;
The second request of the
INSERT query finally reaches its target node, which applies the
The row that was successfully deleted is back, despite the driver canceling the second request.
To avoid this scenario, use client-side timestamps.