Resolving query timeouts on restarted nodes

Steps to fix query timeouts when restarting nodes with large search indexes.

When you restart nodes with large indexes (hundreds of megabytes), initial queries might time out because of the time it takes to build the token range filter queries.

dse.yaml

The location of the dse.yaml file depends on the type of installation:
Package installations: /etc/dse/dse.yaml
Tarball installations: installation_location/resources/dse/conf/dse.yaml

Procedure

To work around timeouts:
  1. Run with a replication factor greater than 1 to ensure that replicas are always available.
  2. In dse.yaml, set enable_health_based_routing to true and set uptime_ramp_up_period_seconds to a value larger than the time it takes for the first query to answer. One hour (3600 seconds) is usually enough.
  3. After restarting the node, issue several match-all queries, for example q=*:*, to warm up the filters.
  4. Optional: If you're using the Java Driver, create an ad-hoc session whose whitelist contains only the node to warm up.
    Issuing many queries increases the chances that all token ranges are used.
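
The warm-up in step 3 can be scripted. The following is a minimal sketch, not a definitive implementation. It assumes the node exposes the Solr HTTP interface on port 8983 and that the search core is named keyspace.table. The function names, the host, the core name, and the default attempt count and timeout are all hypothetical and need to be adapted to your cluster.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def warmup_url(host, core, port=8983):
    """Build a match-all (q=*:*) select URL for one node.

    Assumption: the Solr HTTP endpoint is reachable at
    http://host:port/solr/<core>/select on the restarted node.
    """
    params = urlencode({"q": "*:*"})
    return f"http://{host}:{port}/solr/{core}/select?{params}"


def warm_up(host, core, attempts=20, timeout=600):
    """Send repeated match-all queries to a single node.

    Returns the number of queries that completed. The first few
    queries may time out while the filters are being built, so
    failures are counted as misses rather than raised.
    """
    url = warmup_url(host, core)
    succeeded = 0
    for _ in range(attempts):
        try:
            with urlopen(url, timeout=timeout) as response:
                response.read()  # drain the response body
                succeeded += 1
        except OSError:
            # URLError and socket timeouts are both OSError subclasses.
            continue
    return succeeded
```

For example, calling warm_up("10.0.0.5", "ks.tbl") right after the restart sends 20 match-all queries to that node. Repeating the query several times follows the advice above that many queries increase the chances that all token ranges are used.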

Results

After the uptime ramp-up period, the node starts receiving distributed queries. The filters are already warmed up, so timeouts should not occur.