Handling inconsistencies in query results

Consider session stickiness, subrange node repair, and follow best practices for soft commit points on different replica nodes.

DSE Search implements an efficient, highly available distributed search algorithm on top of Cassandra, which tries to select the minimum number of replica nodes required to cover all token ranges, and also avoid hot spots. Consequently, due to the eventually consistent nature of Cassandra, some replica nodes might not have received or might not have indexed the latest updates yet. This situation might cause DSE Search to return inconsistent results (different numFound counts) between queries due to different replica node selections. This behavior is intrinsic to how highly available distributed systems work, as described in the ACM article, "Eventually Consistent" by Werner Vogels. Most of the time, eventual consistency is not an issue, yet DSE Search implements session stickiness to guarantee that consecutive queries will hit the same set of nodes on a healthy, stable cluster, to provide monotonic results. Session stickiness works by adding a session seed to request parameters as follows:

shard.shuffling.strategy=SEED
shard.shuffling.seed=session_id

In the event of unstable clusters with missed updates due to failures or network partitions, consistent results can be achieved by repairing nodes using the subrange repair method.

Finally, another minor source of inconsistencies is caused by different soft commit points on different replica nodes: A given item might be indexed and committed on a given node, but not yet on its replica. This situation is primarily a function of the load on each node. Implement the following best practices:

  • Evenly balance read/write load between nodes
  • Properly tune soft commit time and async indexing concurrency
  • Configure back pressure in the dse.yaml file

For information about multi-threaded asynchronous indexing that uses a back pressure mechanism, see Configuring and tuning indexing performance.

To maximize insert throughput, DSE Search buffers insert requests from Cassandra so that application insert requests can be acknowledged as quickly as possible. However, if too many requests accumulate in the buffer (a configurable setting), DSE Search pauses or blocks incoming requests until DSE Search catches up with the buffered requests. In extreme cases, that pause causes a timeout to the application.