DataStax driver retry policies
In a distributed system, a request to an external component, such as a CQL statement, can fail on the client side, in transit, or on the server side, or it can time out.
Retry policies determine whether a driver should retry a failed statement and how the driver handles the retry.
Retry policy triggers
By design, DataStax drivers don’t retry all statements or all failures.
Safe retries
DataStax drivers automatically retry requests that can be safely retried, meaning that repeating the request won’t cause unexpected changes to the database or application state.
Generally, DataStax drivers consider a request safe to retry if the request is idempotent and the request failed due to an error type that is eligible for retry.
If a request is safe to retry, the driver follows the behavior defined in the retry policy to reattempt the request. If a request is unsafe to retry, DataStax drivers typically ignore the retry policy, and then return an error to the application. For more information about driver retry policies and handling of specific error types, see Error types and Configure retry policies.
Idempotent statements
A request is idempotent if executing it multiple times leaves the database in the same state as executing it only once. In contrast, non-idempotent statements can make unexpected changes to the database when retried.
Idempotent requests are considered safe to retry. If you have marked a statement as idempotent, then your driver can automatically retry the request in the event of a failure.
With few exceptions, DataStax drivers can’t infer idempotency because drivers don’t parse CQL query strings. You must explicitly mark statements as idempotent in your application code. Otherwise, the driver treats the statements as non-idempotent. The Java driver’s query builder is one of the few exceptions where the driver can infer idempotency.
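As an illustration of the definition above, the following minimal sketch models a table row as a Python dictionary and applies each kind of write twice, as a retry after a lost response would. It does not call any driver; `set_name`, `append_login`, and `apply_with_retry` are hypothetical helpers chosen for this example.

```python
# Illustrative model only: a dict stands in for a table row.
# set_name, append_login, and apply_with_retry are hypothetical helpers, not driver APIs.

def set_name(row, name):
    """Behaves like UPDATE user SET name = ? WHERE id = ? (idempotent)."""
    row["name"] = name


def append_login(row, timestamp):
    """Behaves like appending a value to a list column (non-idempotent)."""
    row["logins"] = row["logins"] + [timestamp]


def apply_with_retry(operation, row, value):
    """Apply the write twice, as if the first response was lost and the request was retried."""
    operation(row, value)
    operation(row, value)


idempotent_row = {"name": "Ann", "logins": []}
apply_with_retry(set_name, idempotent_row, "Joe")

non_idempotent_row = {"name": "Ann", "logins": []}
apply_with_retry(append_login, non_idempotent_row, "2024-01-01T00:00:00Z")
```

After the simulated retry, the first row holds the same state as after a single execution, while the second row contains the appended value twice.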
DataStax recommends designing your data model to support idempotent statements for the following reasons:
-
Drivers can safely retry idempotent statements.
-
By design, drivers don’t always retry non-idempotent statements. Failed executions of non-idempotent statements can cause timeouts and network disruptions that decrease your application’s resilience.
However, some error types don’t require idempotency to trigger the retry policy. For specific conditions, see your driver’s documentation.
-
Drivers run speculative executions on idempotent queries only.
Error types
The following error types can be eligible for automatic retry:
- Client errors
-
The default retry policy can retry the request if it was never sent over the network. However, not all client errors are eligible for retry.
- Server read timeouts
-
In the event of a read timeout, the default retry policy retries the request if the number of replicas that reply is greater than or equal to the number of required responses per the consistency level. Otherwise, it throws an error.
- Server write timeouts
-
In the event of a write timeout, the default retry policy retries the request if the request is a logged batch request and fails to write to the batch log. Otherwise, it throws an error.
- Server unavailable errors
-
In the event of an unavailable exception, the default retry policy retries the request using the next host in the load balancing policy.
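The server-side rules described above can be sketched as three small decision functions. This is a simplified model, not any driver's implementation: the `Decision` values, the function names, and the `"BATCH_LOG"` label are illustrative, and the single-retry limit (`retry_count == 0`) is an assumption that this section does not state.

```python
from enum import Enum


class Decision(Enum):
    RETRY = "retry the request"
    RETRY_NEXT_HOST = "retry on the next host in the load balancing policy"
    RETHROW = "return the error to the application"


def on_read_timeout(required_responses, received_responses, retry_count=0):
    # Retry only if enough replicas replied to satisfy the consistency level.
    # Assumption: at most one retry per request.
    if retry_count == 0 and received_responses >= required_responses:
        return Decision.RETRY
    return Decision.RETHROW


def on_write_timeout(write_type, retry_count=0):
    # Retry only if a logged batch failed while writing to the batch log.
    # "BATCH_LOG" is an illustrative label for that case.
    if retry_count == 0 and write_type == "BATCH_LOG":
        return Decision.RETRY
    return Decision.RETHROW


def on_unavailable(retry_count=0):
    # Try the next host once, then give up.
    if retry_count == 0:
        return Decision.RETRY_NEXT_HOST
    return Decision.RETHROW
```

For example, `on_read_timeout(required_responses=2, received_responses=1)` returns `Decision.RETHROW` because too few replicas replied.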
Available retry policies
DataStax drivers offer default retry policies and support for extended retry policies.
- Default retry policy
-
DataStax recommends the default retry policy for most applications.
The default retry policy retries requests that are safe to retry while preserving the consistency level of the original request. For specific error handling under the default retry policy, see Error types.
- Fall-through retry policy
-
This policy is not supported by all drivers.
The fall-through retry policy neither retries nor ignores a failed request. In all cases, the fall-through retry policy returns an error.
Use this policy for applications that need to implement their own business logic to handle retrying a request. For more information about custom retry logic, see Manual retries and custom retry logic and your driver’s documentation.
- Logging retry policy
-
This policy is not supported by all drivers.
Use the logging retry policy as a parent policy for another retry policy implementation.
This policy only logs the retry decision made by its child policy. The child policy can retry a failed request, but this behavior is not dependent on the logging retry policy itself.
Typically, this policy is used to debug driver retry behavior.
- Downgrading consistency retry policy (advanced)
-
This policy is not supported by all drivers.
DataStax does not recommend this policy for most applications.
If you use this policy, limit it to specific use cases, such as some disaster recovery scenarios. Make sure you completely understand the impact on data consistency that this policy can cause before you use it.
Configure retry policies
C/C++ driver retry policies
C# driver retry policies
GoCQL driver retry policies
Java driver retry policies
While the Java driver’s DefaultRetryPolicy provides conservative retry behavior that is suitable for most applications, the Java driver also provides an extendable RetryPolicy and a ConsistencyDowngradingRetryPolicy.
However, DataStax recommends avoiding the ConsistencyDowngradingRetryPolicy for most use cases.
In general, the Java driver retries CQL statements whose failures are handled by onWriteTimeoutVerdict, onRequestAbortedVerdict, or onErrorResponseVerdict only if the statements are marked idempotent.
You can mark individual statements as idempotent, or use the global idempotence flag:
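// Mark an individual statement as idempotent (Java)
SimpleStatement statement =
    SimpleStatement.newInstance("UPDATE user SET name = 'Joe' WHERE id = 1")
        .setIdempotent(true);

# Set the global idempotence flag (driver configuration)
datastax-java-driver {
  basic.request {
    default-idempotence = true
  }
}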
onReadTimeoutVerdict and onUnavailableVerdict do not require idempotence.
For more information, including a detailed explanation of retry logic, interface callbacks, and node selection behavior, see Java driver retries.
Node.js driver retry policies
For the Node.js driver, use the retry module.
PHP driver retry policies
See PHP driver retry policies and DefaultPolicy.
Python driver retry policies
DataStax recommends using the default retry policy setting, which is typically a RetryPolicy object.
However, you can also extend RetryPolicy.
For more information, including retry policy classes, methods, and attributes, see Retrying failed operations.
DowngradingConsistencyRetryPolicy is deprecated.
Ruby driver retry policies
Manual retries and custom retry logic
You can implement manual retries or custom retry logic in your application code to handle scenarios not covered by the default retry policy.
The following sections explain some best practices related to manual retries and custom retry logic.
Use the fall-through retry policy to bypass automatic retries
If you need to implement custom retry logic for all failures, you can use the fall-through retry policy, which returns an error to the application on every failure.
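When every failure reaches the application, the retry loop lives in application code. The following is a minimal sketch of such a loop under stated assumptions: `TransientError` is a hypothetical stand-in for whichever driver errors your application decides are retryable, and `execute_with_manual_retry` is an illustrative name, not a driver API.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver error the application chooses to retry."""


def execute_with_manual_retry(operation, max_attempts=3, delay_seconds=0.0):
    """Call operation(); retry only TransientError, up to max_attempts calls in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as error:
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)
    raise last_error  # every attempt failed


# Usage: a simulated query that fails twice and then succeeds.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("simulated failure")
    return "ok"


result = execute_with_manual_retry(flaky_query)
```

Errors other than `TransientError` propagate immediately, which keeps the decision about what is retryable explicit in one place.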
Use the logging retry policy to debug retry behavior
If you need to debug your application’s retry behavior, you can use the logging retry policy to log the retry decisions made by the driver.
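The parent and child relationship can be pictured as a thin wrapper that delegates the decision and records it. The sketch below is a conceptual model only: `LoggingPolicy`, `NeverRetryPolicy`, the `on_unavailable` signature, and the `"RETHROW"` string are hypothetical and do not correspond to a specific driver's classes.

```python
import logging

logger = logging.getLogger("retry_decisions")


class NeverRetryPolicy:
    """Hypothetical child policy: always returns the error to the application."""

    def on_unavailable(self, query, retry_count):
        return "RETHROW"


class LoggingPolicy:
    """Hypothetical parent policy: delegates to a child and logs the child's decision."""

    def __init__(self, child):
        self.child = child

    def on_unavailable(self, query, retry_count):
        decision = self.child.on_unavailable(query, retry_count)
        logger.info(
            "unavailable: query=%r retry_count=%d decision=%s",
            query, retry_count, decision,
        )
        return decision  # the wrapper never changes the decision


policy = LoggingPolicy(NeverRetryPolicy())
decision = policy.on_unavailable("SELECT * FROM users", 0)
```

Because the wrapper returns the child's decision unchanged, adding or removing it does not alter retry behavior.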
Determine potential data mutation
In addition to statement execution logs and request tracing, you can use the exception types returned by the driver to determine whether a given CQL statement was executed and the likelihood of data mutation.
This approach provides more granular exception handling for applications without sufficient global exception handling.
Example: Exception handling with the Java driver
try {
    session.execute(...);
} catch (OverloadedException | UnavailableException e) {
    // data mutation did not happen
} catch (WriteTimeoutException | DriverTimeoutException e) {
    // handle write or client timeout
    // data might have been changed
}
Example: Exception handling with the Python driver
try:
    session.execute(...)
except Unavailable:
    # data mutation didn't happen
    ...
except (WriteTimeout, OperationTimedOut):
    # data might have been changed
    # (handled before the broader RequestExecutionException clause)
    ...
except RequestExecutionException as e:
    if getattr(e, 'summary', None) == 'Coordinator node overloaded':
        # data mutation didn't happen
        ...
    else:
        raise
Mitigate risks of unsafe and non-idempotent retries
Generally, avoid retrying requests that are considered unsafe or non-idempotent. If you must retry such requests, be aware of the risks, and make sure that your application includes logic to handle potential adverse side effects.
If you must retry a request that is potentially unsafe to retry, do the following:
-
Determine if there is a justification for retrying the request.
Consider how critical the request is to your application and whether there is a cost associated with a failed operation. For example, a change that writes time-sensitive, business-critical data might be worth retrying with custom logic and failsafes.
Non-idempotent retry scenarios
The following examples describe scenarios where you might want to retry a potentially unsafe (non-idempotent) request:
-
High degree of certainty: Based on the error type or your application’s architecture, you are confident that the request didn’t reach the node or get applied to the node. You have a high degree of certainty that the request wasn’t executed at all (or in any substantial way) and that retrying the request won’t cause unintended side effects.
-
Business requirements: The cost of a failed operation is high, and your application or database can tolerate or correct potential duplicates. In this scenario, it’s best to have additional safeguards, such as unique tokens or external idempotency keys, to detect and mitigate duplications or undesired changes.
-
Specific failure conditions: Certain error types are less risky to retry than others, such as certain transient errors or custom application-specific errors.
-
Mitigate the risks of retrying the request by designing your application to handle potential side effects, such as duplicate writes or unintended changes to the database.
-
Implement manual retries or custom retry logic in your application code to reissue the request.
How you detect and handle unsafe retries depends on your application’s requirements and the specific error conditions you encounter. For example, because DataStax drivers return an error if a request is unsafe to retry, you could use exception handling based on these errors to implement your own retry logic in your application code. For specific details about errors returned by unsafe retries, see your driver’s documentation.
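One of the safeguards mentioned above, an external idempotency key, can be sketched as follows. This is an in-memory model under stated assumptions: `PaymentLedger` and `credit` are hypothetical names, and a real application would persist the set of applied request IDs in the database rather than in process memory.

```python
import uuid


class PaymentLedger:
    """In-memory stand-in for a table keyed by a client-generated request ID."""

    def __init__(self):
        self.balance = 0
        self.applied_request_ids = set()

    def credit(self, request_id, amount):
        """Apply the credit once per request_id; repeated deliveries change nothing."""
        if request_id in self.applied_request_ids:
            return False  # duplicate delivery detected
        self.applied_request_ids.add(request_id)
        self.balance += amount
        return True


ledger = PaymentLedger()
request_id = uuid.uuid4()  # generated once, then reused on every retry
first = ledger.credit(request_id, 50)
second = ledger.credit(request_id, 50)  # simulated retry of the same request
```

The key point of the design is that the request ID is created before the first attempt, so a retry carries the same ID and can be recognized as a duplicate.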
Exception handling for LWTs
When executing lightweight transactions (LWTs), Cassandra-based clusters use a variant of the Paxos consensus protocol to reach consensus among replicas.
However, LWTs can take longer than regular writes, and timeout exceptions are common. Timeouts can occur even when a write operation was successfully executed on all nodes.
If you want to manually retry LWTs that failed due to a timeout, DataStax recommends the following:
-
Don’t assume the meaning of applied=false. An LWT can return a false applied flag if the operation failed or the desired value is already present. Make sure your application verifies the actual value present in the row before reacting.
-
To minimize Paxos split-votes (where replicas fail to reach consensus about consistency), introduce random sleep time between retries. As little as 1-10 milliseconds can be beneficial.
Example: Random sleep between LWT retries with the Java driver
private boolean retryLwt() throws Exception {
    int maxRetries = 5;
    int retryCount = 0;
    Exception error = null;
    while (retryCount < maxRetries) {
        try {
            ResultSet result = session.execute(
                "UPDATE users SET password = 'new' WHERE user = 'cassandra' IF password = 'old'");
            boolean applied = result.wasApplied();
            Row row = result.one();
            if (applied || (row != null && "new".equals(row.getString("password")))) {
                return true; // update was applied
            }
            return false; // update was not applied
        } catch (Exception e) {
            error = e;
            randomSleep(); // random sleep time to prevent resource contention
            ++retryCount;
        }
    }
    throw error; // retries exceeded
}
-
Be mindful of linearizability in LWTs: Lightweight transactions should be considered non-idempotent if linearizability is a concern. For more information, see Query idempotence in DataStax drivers.
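The Java example above calls a `randomSleep()` helper that this page doesn't define. A minimal Python sketch of such a helper, assuming the 1-10 millisecond range suggested above, might look like the following. The name `random_sleep` and its parameters are illustrative.

```python
import random
import time


def random_sleep(min_ms=1.0, max_ms=10.0):
    """Sleep for a random duration between min_ms and max_ms; return the delay in milliseconds."""
    delay_ms = random.uniform(min_ms, max_ms)
    time.sleep(delay_ms / 1000.0)
    return delay_ms


# Usage: pause between two retry attempts.
delay = random_sleep()
```

Randomizing the delay makes it less likely that competing clients retry at the same moment and contend again.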
Throttling versus retries
Retries aren’t appropriate for all errors. In some cases, it’s better to address the issue earlier in the process, before the request is issued.
For example, if a request fails due to rate limiting, retrying the request immediately can exacerbate the problem by resetting the rate limit timer. Even with sleep between retries, the request can still trigger rate limiting if it passes a large volume of data or queries.
For rate limiting, it could be more effective to use client-side throttling to prevent the driver from issuing too many concurrent requests.
For example, the Java driver offers request throttling through advanced.throttler.
For more information, see your driver’s documentation.
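To illustrate the idea of client-side throttling independently of any driver, the following sketch caps the number of in-flight operations with a semaphore. It is a conceptual model, not a driver feature: `ConcurrencyLimiter` and `fake_query` are hypothetical names, and the limit of 2 is arbitrary.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyLimiter:
    """Allow at most max_concurrent operations to run at the same time."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak_in_flight = 0  # highest concurrency observed, for inspection

    def run(self, operation):
        with self._slots:  # blocks while max_concurrent operations are in flight
            with self._lock:
                self._in_flight += 1
                self.peak_in_flight = max(self.peak_in_flight, self._in_flight)
            try:
                return operation()
            finally:
                with self._lock:
                    self._in_flight -= 1


def fake_query():
    """Stand-in for a request to the database."""
    time.sleep(0.01)
    return "ok"


limiter = ConcurrencyLimiter(max_concurrent=2)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: limiter.run(fake_query), range(20)))
```

Even with eight worker threads submitting work, no more than two simulated queries run at once, so excess requests wait on the client instead of being rejected and retried.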