Request throttling
Quick overview
Limit session throughput.
- advanced.throttler in the configuration; defaults to pass-through (no throttling). Also available: concurrency-based (max simultaneous requests), rate-based (max requests per time unit), or write your own.
- metrics: throttling.delay, throttling.queue-size, throttling.errors.
Throttling allows you to limit how many requests a session can execute concurrently. This is useful if you have multiple applications connecting to the same Cassandra cluster, and want to enforce some kind of SLA to ensure fair resource allocation.
The request throttler tracks the level of utilization of the session, and lets requests proceed as long as it is under a predefined threshold. When that threshold is exceeded, requests are enqueued and will be allowed to proceed when utilization goes back to normal.
From a user’s perspective, this process is mostly transparent: any time spent in the queue is included in the session.execute() or session.executeAsync() call. Similarly, the request timeout encompasses throttling: it starts ticking before the request is passed to the throttler; in other words, a request may time out while it is still in the throttler’s queue, before the driver has even tried to send it to a node.
The only visible effect is that a request may fail with a RequestThrottlingException, if the throttler has determined that it can neither allow the request to proceed now, nor enqueue it; this indicates that your session is overloaded. How you react to that is specific to your application; typically, you could display an error asking the end user to retry later.
Note that the following requests are also affected by throttling:
- preparing a statement (either directly, or indirectly when the driver reprepares on other nodes, or when a node comes back up – see how the driver prepares);
- fetching the next page of a result set (which happens in the background when you iterate the synchronous variant, ResultSet);
- fetching a query trace.
Configuration
Request throttling is parameterized in the configuration under advanced.throttler. There are various implementations, detailed in the following sections:
Pass through
datastax-java-driver {
advanced.throttler {
class = PassThroughRequestThrottler
}
}
This is a no-op implementation: requests are simply allowed to proceed all the time, never enqueued.
Note that you will still hit a limit if all your connections run out of stream ids. In that case, requests will fail with an AllNodesFailedException, with the getErrors() method returning a BusyConnectionException for each node. See the connection pooling page.
Concurrency-based
datastax-java-driver {
advanced.throttler {
class = ConcurrencyLimitingRequestThrottler
# Note: the values below are for illustration purposes only, not prescriptive
max-concurrent-requests = 10000
max-queue-size = 100000
}
}
This implementation limits the number of requests that are allowed to execute simultaneously. Additional requests get enqueued up to the configured limit. Every time an active request completes (either by succeeding, failing or timing out), the oldest enqueued request is allowed to proceed.
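The mechanism described above can be sketched in plain Java. This is a simplified illustration with hypothetical names (ConcurrencySketch is not the driver's actual ConcurrencyLimitingRequestThrottler implementation): a count of active requests, a bounded queue for the overflow, and a hand-off when a request completes.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the concurrency-limiting mechanism, not the driver's
// real implementation: requests proceed while a slot is free, park in a bounded
// queue otherwise, and fail once the queue is full.
public class ConcurrencySketch {
  private final int maxConcurrent;
  private final int maxQueueSize;
  private int active = 0;
  private final Queue<Runnable> queue = new ArrayDeque<>();

  public ConcurrencySketch(int maxConcurrent, int maxQueueSize) {
    this.maxConcurrent = maxConcurrent;
    this.maxQueueSize = maxQueueSize;
  }

  /** Returns true if the request may proceed now, false if it was enqueued. */
  public synchronized boolean register(Runnable request) {
    if (active < maxConcurrent) {
      active++;
      return true; // proceed immediately
    }
    if (queue.size() < maxQueueSize) {
      queue.add(request);
      return false; // parked in the queue
    }
    // Session overloaded: the real throttler fails the request with
    // RequestThrottlingException at this point.
    throw new IllegalStateException("too many concurrent requests");
  }

  /** Called when an active request succeeds, fails or times out. */
  public synchronized void onRequestDone() {
    Runnable next = queue.poll();
    if (next == null) {
      active--;     // free a slot
    } else {
      next.run();   // the oldest queued request proceeds; 'active' is unchanged
    }
  }

  public static void main(String[] args) {
    ConcurrencySketch t = new ConcurrencySketch(2, 1);
    System.out.println(t.register(() -> {})); // true: first slot
    System.out.println(t.register(() -> {})); // true: second slot
    System.out.println(t.register(() -> {})); // false: enqueued
    t.onRequestDone(); // completion releases the queued request
  }
}
```

Note how completion, not success, is what frees a slot: a request that fails or times out releases capacity just like one that succeeds.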
Make sure you pick a threshold that is consistent with your pooling settings; the driver should never run out of stream ids before reaching the maximum concurrency, otherwise requests will fail with BusyConnectionException instead of being throttled. The total number of stream ids is a function of the number of connected nodes and the connection.pool.*.size and connection.max-requests-per-connection configuration options. Keep in mind that aggressive speculative executions and timeout options can inflate stream id consumption, so keep a safety margin. One good way to get this right is to track the pool.available-streams metric on every node, and make sure it never reaches 0. See the connection pooling page.
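As a back-of-the-envelope check, the stream id budget is just the product of those three quantities. The numbers below are illustrative, not driver defaults:

```java
// Rough capacity check: the throttler's concurrency limit should stay well
// below the total number of stream ids, leaving headroom for speculative
// executions and slow timeouts. Illustrative numbers, not driver defaults.
public class StreamIdBudget {

  public static int totalStreamIds(int nodes, int poolSize, int maxRequestsPerConnection) {
    return nodes * poolSize * maxRequestsPerConnection;
  }

  public static void main(String[] args) {
    int nodes = 3;                       // connected nodes
    int poolSize = 2;                    // connection.pool.*.size
    int maxRequestsPerConnection = 1024; // connection.max-requests-per-connection

    int total = totalStreamIds(nodes, poolSize, maxRequestsPerConnection);
    System.out.println(total); // 6144

    // With this budget, a limit like max-concurrent-requests = 5000 keeps
    // roughly 20% headroom before BusyConnectionException becomes possible.
    System.out.println(total > 5000); // true
  }
}
```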
Rate-based
datastax-java-driver {
advanced.throttler {
class = RateLimitingRequestThrottler
# Note: the values below are for illustration purposes only, not prescriptive
max-requests-per-second = 5000
max-queue-size = 50000
drain-interval = 1 millisecond
}
}
This implementation tracks the rate at which requests start, and enqueues when it exceeds the configured threshold.
With this approach, we can’t dequeue when requests complete, because having fewer active requests does not necessarily mean that the rate is back to normal. Instead, the throttler re-checks the rate periodically and dequeues when possible; this is controlled by the drain-interval option. Picking the right interval is a matter of balance: too low might consume too many resources and only dequeue a few requests at a time, while too high will delay your requests too much. Start with a few milliseconds and use the cql-requests metric to check the impact on your latencies.
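The interplay between the rate window and the periodic drain can be sketched as follows. This is a deliberately simplified illustration with a logical clock and hypothetical names (RateSketch is not the driver's RateLimitingRequestThrottler), showing only the mechanism: over-budget requests are queued, and a drain pass releases them once the rate allows.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of the rate-based mechanism, not the driver's real
// implementation. Time is passed in explicitly (whole seconds) so the
// behavior is deterministic.
public class RateSketch {
  private final int maxPerSecond;
  private final Queue<Runnable> queue = new ArrayDeque<>();
  private long currentSecond = 0;
  private int startedThisSecond = 0;

  public RateSketch(int maxPerSecond) {
    this.maxPerSecond = maxPerSecond;
  }

  /** Returns true if the request may start now, false if it was enqueued. */
  public boolean register(long second, Runnable request) {
    rollWindow(second);
    if (startedThisSecond < maxPerSecond) {
      startedThisSecond++;
      return true;       // under the rate budget
    }
    queue.add(request);  // over budget: park it until a drain pass
    return false;
  }

  /** Simulates the periodic drain task (what drain-interval schedules). */
  public void drain(long second) {
    rollWindow(second);
    while (!queue.isEmpty() && startedThisSecond < maxPerSecond) {
      startedThisSecond++;
      queue.poll().run(); // oldest queued request proceeds
    }
  }

  private void rollWindow(long second) {
    if (second != currentSecond) {
      currentSecond = second;
      startedThisSecond = 0; // new window, budget resets
    }
  }

  public static void main(String[] args) {
    RateSketch t = new RateSketch(2);
    System.out.println(t.register(0, () -> {})); // true
    System.out.println(t.register(0, () -> {})); // true
    System.out.println(t.register(0, () -> {})); // false: over budget, enqueued
    t.drain(1); // next window: the drain pass releases the queued request
  }
}
```

This also shows why the drain interval matters: queued requests only move when a drain pass runs, so a very long interval adds that much latency to every queued request.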
Like with the concurrency-based throttler, you should make sure that your target rate is in line with the pooling options; see the recommendations in the previous section.
Monitoring
Enable the following metrics to monitor how the throttler is performing:
datastax-java-driver {
advanced.metrics.session.enabled = [
# How long requests are being throttled (exposed as a Timer).
#
# This is the time between the start of the session.execute() call, and the moment when the
# throttler allows the request to proceed.
throttling.delay,
# The size of the throttling queue (exposed as a Gauge<Integer>).
#
# This is the number of requests that the throttler is currently delaying in order to
# preserve its SLA. This metric only works with the built-in concurrency- and rate-based
# throttlers; in other cases, it will always be 0.
throttling.queue-size,
# The number of times a request was rejected with a RequestThrottlingException (exposed as a
# Counter)
throttling.errors,
]
}
If you enable throttling.delay, make sure to also check the associated extra options (metrics.session.throttling.delay.*) to correctly size the underlying histograms.
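For example, a histogram-sizing block could look like the following. The option names and values are illustrative; check the exact names and defaults in the driver's reference.conf for your version:

```
datastax-java-driver {
  advanced.metrics.session.throttling.delay {
    # Illustrative values, not defaults; verify against reference.conf
    highest-latency = 3 seconds
    significant-digits = 3
    refresh-interval = 5 minutes
  }
}
```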