Handling inconsistencies in query results

Troubleshooting tips to handle inconsistencies in query results.

dse.yaml

The location of the dse.yaml file depends on the type of installation:
Package installations /etc/dse/dse.yaml
Tarball installations installation_location/resources/dse/conf/dse.yaml

The DataStax Enterprise Help Center also provides troubleshooting information.

To troubleshoot inconsistencies in query results, consider session stickiness, subrange node repair, and follow best practices for soft commit points on different replica nodes.

DSE Search implements an efficient, highly available distributed search algorithm on top of the database, which tries to select the minimum number of replica nodes required to cover all token ranges, and also avoid hot spots. Consequently, due to the eventually consistent nature of the database, some replica nodes might not have received or might not have indexed the latest updates yet. This situation might cause DSE Search to return inconsistent results (different numFound counts) between queries due to different replica node selections. This behavior is intrinsic to how highly available distributed systems work, as described in the ACM article, "Eventually Consistent" by Werner Vogels. Most of the time, eventual consistency is not an issue, yet DSE Search implements session stickiness to guarantee that consecutive queries will hit the same set of nodes on a healthy, stable cluster, to provide monotonic results. Session stickiness works by adding a session seed to request parameters as follows:

shard.shuffling.strategy=SEED
shard.shuffling.seed=session_id

In the event of unstable clusters with missed updates due to failures or network partitions, consistent results can be achieved by repairing nodes using the DSE OpsCenter Repair Service.

Finally, another minor source of inconsistencies is caused by different soft commit points on different replica nodes: A given item might be indexed and committed on a given node, but not yet on its replica. This situation is primarily a function of the load on each node. Implement the following best practices:

  • Evenly balance read/write load between nodes
  • Properly tune soft commit time and async indexing concurrency
  • Configure back pressure in the dse.yaml file

For information about multi-threaded asynchronous indexing that uses a back pressure mechanism, see Configuring and tuning indexing performance.

To maximize insert throughput, DSE Search buffers insert requests from the database so that application insert requests can be acknowledged as quickly as possible. However, if too many requests accumulate in the buffer (a configurable setting), DSE Search pauses or blocks incoming requests until DSE Search catches up with the buffered requests. In extreme cases, that pause causes a timeout to the application.