Search index filtering best practices

Best practices for DSE Search queries.

DataStax recommends following these best practices for running queries in DSE Search:
  • Use CQL to run search queries.

    Perform all data manipulation with CQL, except for deleting by query.

  • Use the simplest Solr field type that meets the requirements of your query. See Defining index field types.
  • For improved performance, use Solr filter query (fq) parameters instead of q parameters whenever possible. The results from filter queries are stored in a cache, which can reduce the average response time from seconds to milliseconds. The following example filters on the cyclist first name and last name:
    '{"q":"*:*", "fq":"firstname:Alex AND lastname:FRAME"}'
    Each fq name and value string pair can be a member of an fq array. The pairs in the array are treated as if they are separated by AND. For example:
    '{"q":"*:*", "fq":["lastname:BELKOV", "nationality:Russia"]}'
    Adjust your queries so that the results fit into the memory cache.
  • Use profiles when creating a search index.
  • Avoid querying nodes that are indexing.

    When responding to queries, DSE Search ranks nodes that are not performing search indexing higher than nodes that are indexing. If the only nodes that can satisfy the query are indexing, the query does not fail but might return only partial results.

  • When vnodes are not used, distributed queries in DSE Search are most efficient when the number of nodes in the queried data center (DC) is a multiple of the replication factor (RF) in that DC.
  • Avoid using too many terms in a query. For example, avoid a long list of clauses like:
    SELECT request_id, store_id
    FROM store_search.transaction_search
    WHERE solr_query = '{"q":"*:*","shards.failover":true,"shards.tolerant":false,
      "fq":"store_id:store1a store_id:store2b store_id:store2c ... store_id:store19987d"}';
    Instead, use a terms filter query.
  • For collections that are rarely updated, DataStax recommends frozen collections over non-frozen collections to reduce query latency.
    For example, a simple frozen set of text elements:
    CREATE TABLE foo (
      id text, values frozen<set<text>>, PRIMARY KEY (id)
    );
    A frozen list of UDTs:
    CREATE TYPE name (
      first text, last text
    );
    CREATE TABLE tableWithList (
      id text, names frozen<list<frozen<name>>>, PRIMARY KEY (id)
    );
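
The filter query recommendations above can be sketched in client code. The following Python example is a minimal illustration, not a DataStax-provided API: the helper names (filter_query, terms_filter_query, select_statement) are hypothetical. It assumes that field values are simple tokens with no spaces, commas, or Solr special characters, and it uses the Solr terms query parser syntax ({!terms f=field}value1,value2) for the terms filter query. The table and column names are taken from the example above.

```python
import json


def filter_query(filters):
    """Build a solr_query JSON string that matches all documents (q=*:*)
    and narrows the result with an fq array. The array members are
    combined as if separated by AND."""
    fq = [f"{field}:{value}" for field, value in filters]
    return json.dumps({"q": "*:*", "fq": fq})


def terms_filter_query(field, values):
    """Build a solr_query JSON string that uses one terms filter query
    instead of one clause per value. Assumes values contain no commas,
    because the terms parser splits the list on commas by default."""
    fq = "{!terms f=" + field + "}" + ",".join(values)
    return json.dumps({"q": "*:*", "fq": fq})


def select_statement(columns, table, solr_query):
    """Embed the JSON in a CQL SELECT. Single quotes are doubled so the
    JSON stays a valid CQL string literal."""
    literal = solr_query.replace("'", "''")
    return f"SELECT {', '.join(columns)} FROM {table} WHERE solr_query = '{literal}';"


if __name__ == "__main__":
    # An fq array, equivalent to the lastname/nationality example above.
    print(filter_query([("lastname", "BELKOV"), ("nationality", "Russia")]))

    # One terms filter query in place of many store_id clauses.
    stores = ["store1a", "store2b", "store2c"]
    print(select_statement(
        ["request_id", "store_id"],
        "store_search.transaction_search",
        terms_filter_query("store_id", stores),
    ))
```

Building the JSON with json.dumps rather than string concatenation keeps the quoting correct as the value list grows. In application code, a driver's bound parameters would usually be preferable to embedding the literal in the statement text.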