Search index filtering best practices

Best practices for DSE Search queries.

DataStax recommends following these best practices for running queries in DSE Search:
  • Use CQL to run search queries.

    Perform all data manipulation with CQL, except for deleting by query.

  • Use the simplest and best fit Solr types to fulfill the required type for your query. See Understanding schema field types.
  • For improved performance, use Solr filter query (fq) parameters instead of q parameters whenever possible. The results from filter queries are stored in a cache. You can reduce the average response time from seconds to milliseconds. The following example queries the cyclist first name and last name:
    '{"q":"*:*", "fq":"firstname:Alex AND lastname:FRAME"}'
    Each fq name and value string pair can be a member of an fq array. The fq name and value pairs are treated as if they are separated by AND. For example:
    '{"q":"*:*", "fq":["lastname:BELKOV", "nationality:Russia"]}'
    Adjust your queries so that the results fit into the memory cache.
  • Use profiles when creating a search index.
  • Avoid querying nodes that are indexing.

    For responding to queries, DSE Search ranks the nodes that are not performing search indexing higher than nodes that are indexing. If nodes that are indexing are the only nodes that can satisfy the query, the query does not fail but can return only partial results.

  • For optimal CQL single-pass queries, including queries where solr_query is used with a partition restriction, and queries with partition restrictions and a search predicate, ensure that the columns to SELECT are not indexed in the search index schema.
    Auto-generation indexes all columns by default. You can ensure that the field is not indexed but still returned in a single-pass query. For example, this statement indexes everything except for column c3, and informs the search index schema about column c3 for efficient and correct single-pass queries.
    WITH COLUMNS * { indexed : true }, c3 { indexed : false }; 
  • When vnodes are not configured in a cluster, distributed queries in DSE Search are most efficient when the number of nodes in the queried data center (DC) is a multiple of the replication factor (RF) in that DC.
  • Avoid using too many terms in the query, like:
    SELECT request_id, store_id
    FROM store_search.transaction_search
    WHERE solr_query='{"q":"*:*","shards.failover":true,
      store_id:store2b store_id:store2c ... store_id:store19987d"}'
    Instead, use a terms filter query.
  • When writing collections with few collection updates, DataStax recommends frozen collections over non-frozen collections to address query latency.
    For example, a simple frozen set of text elements:
    CREATE TABLE foo (
      id text, values frozen<set<text>>, PRIMARY KEY (id)
    CREATE TYPE name (
      first text, last text
    A frozen list of UDTs:
    CREATE TABLE tableWithList (
      id text, names frozen<list<frozen<name>>>, PRIMARY KEY (id)
  • JSON query limitations
    Failover and tolerance of partial results cannot coexist in the same JSON query. Queries support enabling tolerance for only one parameter.
    Note: The shards.tolerant parameter is not supported when deep paging is on.