Search index filtering best practices

Best practices for DSE Search queries.

DataStax recommends following these best practices for running queries in DSE Search:
  • Use CQL to run search queries.

    Perform all data manipulation with CQL, except for deleting by query.

  • Use the simplest and best fit Solr types to fulfill the required type for your query. See Understanding schema field types.
  • Use profiles when creating a search index.
  • Avoid querying nodes that are indexing.

    For responding to queries, DSE Search ranks the nodes that are not performing search indexing higher than nodes that are indexing. If nodes that are indexing are the only nodes that can satisfy the query, the query does not fail but can return only partial results.

  • For optimal CQL single-pass queries, including queries where solr_query is used with a partition restriction, and queries with partition restrictions and a search predicate, ensure that the columns to SELECT are not indexed in the search index schema.
    Auto-generation indexes all columns by default. You can ensure that the field is not indexed but still returned in a single-pass query. For example, this statement indexes everything except for column c3, and informs the search index schema about column c3 for efficient and correct single-pass queries.
    CREATE SEARCH INDEX ON test_search.abc WITH COLUMNS * { indexed : true }, c3 { indexed : false }; 
  • When vnodes are not configured in a cluster, distributed queries in DSE Search are most efficient when the number of nodes in the queried data center (DC) is a multiple of the replication factor (RF) in that DC.
  • Avoid using too many terms in the query, like:
    SELECT request_id, store_id FROM store_search.transaction_search WHERE solr_query='{"q":"*:*","shards.failover":true,"shards.tolerant":false,"fq":"store_id:store1a store_id:store2b store_id:store2c ... store_id:store19987d"}'
    Instead, use a terms filter query.
  • When writing collections with few collection updates, DataStax recommends frozen collections over non-frozen collections to address query latency.
    For example, a simple frozen set of text elements:
    CREATE TABLE foo (id text, values frozen<set<text>>, PRIMARY KEY (id))
    CREATE TYPE name (first text, last text)
    A frozen list of UDTs:
    CREATE TABLE tableWithList (id text, names frozen<list<frozen<name>>>, PRIMARY KEY (id))
  • JSON query limitations
    Failover and tolerance of partial results cannot coexist in the same JSON query. Queries support enabling tolerance for only one parameter.
    Note: The shards.tolerant parameter is not supported when deep paging is on.