Query idempotence

A CQL query is idempotent if it can be applied multiple times without changing the result of the initial application.

A CQL query is idempotent if it can be applied multiple times without changing the result of the initial application.

UPDATE my_table SET list_col = [1] WHERE pk = 1 

This query is idempotent because no matter how many times it is executed, list_col will always end up with the value [1].

UPDATE my_table SET list_col = [1] + list_col WHERE pk = 1 

This query is not idempotent because if list_col was initially empty, it will contain [1] after the first execution, [1, 1] after the second.

By default, all DataStax Drivers consider queries to be non-idempotent. It is the user’s responsibility to mark queries as idempotent to leverage features such as retry and speculative execution.

C/C++ C# Java Node.js (see isIdempotent) PHP (not supported) Python Ruby

Non-idempotent examples

Queries that insert the result of a non-deterministic functional call (for example now() and uuid()) are not idempotent.

The following query is not idempotent because now() produces a value based on the current time on the DSE coordinator responsible for handling the request, so successive invocations will produce different values if done at different times.

UPDATE my_table SET v = now() WHERE pk = 1

Counter updates are not idempotent. Each application of a counter update changes the accumulated value of the counter.

UPDATE my_table SET counter_value = counter_value + 1 WHERE pk = 1;

Prepend, append, or deletion operations on lists are not idempotent.

Prepend and append add elements to the list on each invocation. Delete removes values at a position which may vary depending on the state of the list when the query is invoked. Note that update, insert, and delete operations on set, map, tuples and user defined types are idempotent.

UPDATE my_table SET list_col = [1] + list_col WHERE pk = 1

Lightweight transactions should be considered non-idempotent if linearizability is a concern. For example:

UPDATE my_table SET v = 4 WHERE k = 1 IF v = 1

If this statement is executed twice, the IF condition will fail on the second execution. In this case, the second execution will do nothing and v will still have the value 4. The problem appears when multiple clients execute the query with retries enabled.

  1. v has the value 1.
  2. Client 1 executes the query above, performing a compare and set operation from 1 to 4.
  3. Client 1’s connection drops, but the query completes successfully. v now has the value 4.
  4. Client 2 executes a compare and set operation on v from 4 to 2.
  5. Client 2’s transaction succeeds. v now has the value 2.
  6. Since Client 1 lost its connection, it considers the query as failed, and transparently retries the compare and set operation on v from 1 to 4. Because v now has a value of 2, it receives a “not applied” response.

One important aspect of lightweight transactions is linearizability. Given a set of concurrent operations on a column from different clients, there must be a way to reorder them to yield a sequential history that is correct. In the above example, from the client’s point of view there were two operations.

  • Client 1 executed a compare and set operation on v from 1 to 4 that was not applied.
  • Client 2 executed a compare and set operation on v from 4 to 2 that was applied.

Overall the column changed from 1 to 2. There is no ordering of the two operations that can explain the change, and linearizability was broken by the transparent retry at step 6.