Resolving query timeouts on restarted nodes

Steps to fix query timeouts when restarting nodes with large search indexes.

When restarting nodes with large indexes (hundreds of megabytes), initial queries might time out because of the time it takes to build the token range filter queries.

Procedure

To work around timeouts:

  1. Run with a replication factor greater than 1 to ensure that replicas are always available while the restarted node warms up.
  2. In dse.yaml, set enable_health_based_routing to true and set uptime_ramp_up_period_seconds to a value larger than the time it takes for the first query to return. One hour (3600 seconds) is usually enough.
  3. After restarting the node, issue several match-all queries, for example q=*:*, to warm up the filters. Issuing many queries increases the chances that all token ranges are used.
  4. Optional: If you're using the Java Driver, create an ad-hoc session that has only the node being warmed up in its white list, so that the warm-up queries are sent only to that node.
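The warm-up in steps 3 and 4 amounts to a loop that keeps sending match-all queries to the restarted node and tolerates early timeouts. The following is a minimal Python sketch, not a DSE or driver API: it assumes a hypothetical run_query callable that sends one query to the restarted node only (for example, through a session restricted to that node) and raises Python's built-in TimeoutError when the query times out. The names warm_up, run_query, and attempts are illustrative.

```python
import time
from typing import Callable, List

# Hypothetical hook (assumption): sends one query to the restarted node only
# and raises TimeoutError if the query times out. Not a DSE or driver API.
QueryRunner = Callable[[str], object]


def warm_up(run_query: QueryRunner, attempts: int = 20, query: str = "*:*") -> List[float]:
    """Issue `attempts` match-all queries and return each one's latency in seconds."""
    latencies: List[float] = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            run_query(query)
        except TimeoutError:
            # Early queries may time out while the token range filters are
            # still being built; keep issuing queries instead of giving up.
            pass
        latencies.append(time.monotonic() - start)
    return latencies
```

The first recorded latency gives a rough idea of how long the first query takes, which is the lower bound for uptime_ramp_up_period_seconds in step 2. If that first query timed out on the client, the recorded value is only the client timeout, so the real filter build time is longer.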

Results

After the uptime ramp-up period, the node starts receiving distributed queries. Because the filters are already warmed up, timeouts should not occur.
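Whether every filter is actually warm when distributed queries arrive depends on how many warm-up queries were issued. As a rough estimate, assume that each warm-up query exercises one of n token ranges chosen uniformly at random. This is a simplification, because how DSE selects ranges for a query is not described here. Under that assumption, the question becomes the coupon-collector problem. The Python sketch below tracks the probability of having hit exactly m distinct ranges after each query and returns the smallest number of queries that covers all ranges with a target probability. The function names, n_ranges, and target are hypothetical inputs you would choose yourself.

```python
from typing import List


def _next_distribution(dist: List[float], n_ranges: int) -> List[float]:
    """Apply one more uniformly random query; dist[m] = P(exactly m ranges hit)."""
    nxt = [0.0] * (n_ranges + 1)
    for m, p in enumerate(dist):
        if p == 0.0:
            continue
        # The query lands on a range that is already warm ...
        nxt[m] += p * m / n_ranges
        # ... or on a range that has not been hit yet.
        if m < n_ranges:
            nxt[m + 1] += p * (n_ranges - m) / n_ranges
    return nxt


def prob_all_ranges_hit(n_ranges: int, n_queries: int) -> float:
    """Probability that n_queries random queries hit every one of n_ranges ranges."""
    dist = [1.0] + [0.0] * n_ranges  # before any query, no range is warm
    for _ in range(n_queries):
        dist = _next_distribution(dist, n_ranges)
    return dist[n_ranges]


def queries_for_coverage(n_ranges: int, target: float = 0.99) -> int:
    """Smallest number of queries whose chance of hitting all ranges is >= target."""
    if n_ranges < 1 or not 0.0 < target < 1.0:
        raise ValueError("need n_ranges >= 1 and 0 < target < 1")
    dist = [1.0] + [0.0] * n_ranges
    queries = 0
    while dist[n_ranges] < target:
        dist = _next_distribution(dist, n_ranges)
        queries += 1
    return queries
```

For example, queries_for_coverage(2, 0.9) returns 5, because the chance of still missing one of two ranges after k queries is 2^(1-k). The step-by-step distribution is used instead of the closed-form inclusion-exclusion sum because that alternating sum loses precision in floating point when n is large.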