Query idempotence in Cassandra drivers

A CQL query is idempotent if it can be applied multiple times without changing the result of the initial application.

Example: Idempotent and non-idempotent queries

For example, the following query is idempotent because no matter how many times it is executed, list_col will always end up with the value [1]:

UPDATE my_table SET list_col = [1] WHERE pk = 1

In contrast, the following query is not idempotent because, if list_col was initially empty, it will contain [1] after the first execution, then [1, 1] after the second execution, and so on.

UPDATE my_table SET list_col = [1] + list_col WHERE pk = 1

Idempotence is important for the following reasons:

Drivers can safely retry idempotent queries in the event of a failure.
Drivers don’t retry non-idempotent queries in certain situations, which can lead to timeouts and network disruptions that decrease your application’s resilience.
Drivers run speculative executions on idempotent queries only.

With few exceptions, Cassandra drivers cannot infer idempotency because drivers don’t parse CQL query strings.

You must explicitly mark statements as idempotent in your application code. Otherwise, the driver treats the statements as non-idempotent.

The Java driver’s query builder Java driver’s query builder is one of the few exceptions where the driver can infer idempotency.

Set idempotence in DataStax drivers

By default, all DataStax drivers consider queries to be non-idempotent.

You must mark queries as idempotent if you want to leverage features such as retry policies and speculative execution.

For more information, see the following driver documentation on marking queries as idempotent:

DataStax C/C++ driver: Set is idempotent
DataStax C# driver: Query idempotence
Java driver:
- Apache Cassandra Java driver version 4.18 and later: Idempotence
- DataStax Java driver version 4.17 and earlier: Idempotence
DataStax Node.js driver: QueryOptions, isIdempotent
The DataStax PHP driver doesn’t support setting idempotence.
DataStax Python driver: is_idempotent

The Python driver only provides an idempotent flag for speculative executions.
DataStax Ruby driver: idempotent?

Non-idempotent queries

To be able to retry CQL statements consistently, avoid non-idempotent queries, such as the following:

Queries that insert the result of non-deterministic functions, such as now(), uuid(), and timeuuid(). For an example, see Current time values.
LWTs that must be executed in a specific order. For more information, see LWTs where linearizability matters.
Queries that update counters or lists.
Certain operations on sets, maps, and tuples.
Deleting all rows.

Instead, use statements that bind exact values generated by the application to ensure idempotent data structures. For example, you can use maps and sets instead of lists and counters, which are non-idempotent.

Current time values

Functions that produce values based on the current time aren’t idempotent.

For example, the following query is not idempotent because now() produces a value based on the current time on the coordinator responsible for handling the request. Therefore, each invocation of this statement can produce a different value.

UPDATE my_table SET v = now() WHERE pk = 1

Counters

Counter updates aren’t idempotent because each application of the counter update changes the accumulated value of the counter:

UPDATE my_table SET counter_value = counter_value + 1 WHERE pk = 1;

Lists

Prepend, append, and delete operations on lists aren’t idempotent because the element’s position in the list can change depending on the state of the list when the query is invoked.

Prepend and append operations add elements to the list on each invocation. Delete operations remove elements from the list at a position that can change depending on the state of the list when the query is invoked.

UPDATE my_table SET list_col = [1] + list_col WHERE pk = 1

Sets, maps, and tuples

Update, insert, and delete operations on sets, maps, and tuples can be idempotent, but not always. For example:

Appending maps with + is non-idempotent.
Appending or removing set elements in non-primary key updates within batches are potentially non-idempotent.
Individual field updates in tuples are potentially non-idempotent.

DELETE without WHERE

DELETE without WHERE (removing all rows) is non-idempotent because it depends on the number of rows that exist at the time the request is made.

LWTs where linearizability matters

Treat lightweight transactions (LWTs) as non-idempotent if linearizability is a concern.

For example, if the following statement is executed twice, the IF condition will fail on the second execution. The second execution does nothing, and v still has the value 4.

UPDATE my_table SET v = 4 WHERE k = 1 IF v = 1

The problem appears when multiple clients execute the query with retries enabled. For example:

v has the value 1.
Client 1 executes the preceding query, performing a compare and set operation from 1 to 4.
Client 1’s connection drops, but the query completes successfully. v now has the value 4.
Client 2 executes a compare and set operation on v from 4 to 2.
Client 2’s transaction succeeds. v now has the value 2.
Since Client 1 lost its connection, it considers the query as failed, and transparently retries the compare and set operation on v from 1 to 4. Because v now has a value of 2, it receives a not applied response.

Linearizability is an important aspect of lightweight transactions (LWTs). Given a set of concurrent operations on a column from different clients, there must be a way to reorder them to yield a sequential history that is correct. In the above example, from the client’s point of view there were two operations.

Client 1 executed a compare and set operation on v from 1 to 4 that was not applied.
Client 2 executed a compare and set operation on v from 4 to 2 that was applied.

Overall, the column changed from 1 to 2. There is no ordering of the two operations that can explain the change, and linearizability was broken by the transparent retry at step 6.