Search index filter syntax

DataStax Enterprise supports production-grade implementation of CQL Solr queries on a DSE Search node. Develop CQL applications supporting full-text search without using Solr-specific APIs.

dse.yaml

The location of the dse.yaml file depends on the type of installation:

Package installations
Installer-Services installations

/etc/dse/dse.yaml

Tarball installations
Installer-No Services installations

installation_location/resources/dse/conf/dse.yaml

DataStax Enterprise supports a production-grade implementation of CQL Solr queries in DSE Search. You can develop CQL-centric applications supporting full-text search without having to work with Apache Solr™-specific APIs. Only full text search queries are supported.

Restriction:
  • CQL Solr queries are defaulted to an equivalent of LIMIT 10.
  • Pagination is off by default. In dse.yaml, the cql_solr_query_paging option specifies when to use pagination (also called cursors).
  • Apache Solr and Apache Lucene limitations for field names apply to pagination.
  • Queries with smaller result sets will see increased performance with paging off.
  • Limitations and known Apache Solr issues apply to DSE Search queries. For example: incorrect SORT results for tokenized text fields.
  • Column aliases are not supported in solr_query queries.
  • All of the fields that are queried on DSE SearchAnalytics clusters must be defined in the search index schema definition. Fields that are not defined in the search index schema columns not defined are excluded in the results returned from Spark queries.

Search index query syntax

Execute queries against indexed columns using the solr_query option of the SELECT statement WHERE clause.

Synopsis

SELECT selectors
  FROM table
  WHERE solr_query = 'search_expression'
  [ LIMIT n ] ;
There are two types of search index expressions:
Note: Use the solr_query option to filter on the search index fields. For example:
SELECT *
FROM users
WHERE solr_query = 'irc:jdoe';
The search indexes cannot be directly queried. For example, this syntax fails:
SELECT *
FROM users
WHERE irc = 'jdoe';

Writing a basic index query

The CQL query expression uses the syntax supported by the Solr q parameter. In CQL, to use a single quotation mark in a string literal, you must escape it using a single quotation mark (so you'll need to double the single quotation marks). See CQL escaping characters. For example:
SELECT *
FROM keyspace.table
WHERE solr_query = 'name: cat name: dog -name:fish';
When you name specific columns, DSE Search retrieves only the specified columns and returns the columns as part of the resulting rows. DSE Search supports projections (SELECT a, b, c...) only, not functions, for the select expression. The following example retrieves only the name column:
SELECT name
FROM keyspace.table
WHERE solr_query = 'name:cat name:dog -name:fish';
Use the LIMIT clause to specify how many rows to return. The following example retrieves only 1 row:
SELECT *
FROM keyspace.table
WHERE solr_query = 'name:cat name:dog -name:fish'
LIMIT 1;
Use the count() function in CQL Solr queries to return the number of rows that satisfy the Solr query:
SELECT count(*)
FROM table
WHERE solr_query = '...';

Using count() in combination with LIMIT or facets results in an error.

All response queries of the drivers have a custom payload where the total number of documents found is returned. This number is keyed as DSESearch.numFound.

Writing advanced solr_query expressions

DSE Search supports filtering CQL queries using more advanced Solr searches with JSON-based expressions.

dse.yaml

The location of the dse.yaml file depends on the type of installation:

Package installations
Installer-Services installations

/etc/dse/dse.yaml

Tarball installations
Installer-No Services installations

installation_location/resources/dse/conf/dse.yaml

cassandra.yaml

The location of the cassandra.yaml file depends on the type of installation:

Package installations
Installer-Services installations

/etc/dse/cassandra/cassandra.yaml

Tarball installations
Installer-No Services installations

installation_location/resources/cassandra/conf/cassandra.yaml
DSE Search supports filtering CQL queries using more advanced Solr searches with JSON-based expressions.

On this page:

See also Overriding the default TimeZone (UTC) in search queries.

JSON query syntax

The JSON query expression syntax is a JSON string. The JSON-based query expression supports local parameters in addition to the following parameters:
{
  "q": query_expression (string), 
  "fq": filter_query_expression(s) (string_or_array_of_strings, ...),  
  "facet": facet_query_expression (object) 
  "sort": sort_expression (string), 
  "start": start_index(number),
  timeAllowed: search_time_limit_ms,
  "TZ": zoneID),   //  Any valid zone ID in java TimeZone class                            
  "paging": "driver" (string),
  "distrib.singlePass": true|false (boolean),
  "shards.failover": true|false (boolean),    // Default: true
  "shards.tolerant": true|false (boolean),    // Default: false
  "commit": true|false (boolean),
  "route.partition": partition_routing_expression (array_of_strings), 
  "route.range": range_routing_expression (array_of_strings), 
  "query.name": query_name (string),
}
For example:
SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"q":"ethnicity:Asian"}';
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Mexi*", "sort":"id asc"}' LIMIT 3;
SELECT * FROM mykeyspace.mytable WHERE solr_query='{"q" : "{!edismax}quotes:yearning or kills"}';
Note: To use Apache Solr™ Extended DisMax Query Parser (eDisMax) with solr_query, you must include defaultSearchField in your schema.

Making distributed queries tolerant of shard failures

Since distributed queries contact many shards, making queries more tolerant of shard failures ensures more successful completions. Use shards.failover or shards.tolerant parameters to define query failover and tolerance of shard failures during JSON queries:
Valid configurations Description
"shards.failover": true,
"shards.tolerant": false,
This default configuration enables query failover and disables fault tolerance. Attempt to retry the failed shard requests when errors indicate that there is a reasonable chance of recovery. If any of the nodes (shards) that we scatter to fail before the query is complete, retry the shard query against a replica.
"shards.failover": false,
"shards.tolerant": true,
Disable query failover. Enable fault tolerance. Make the query succeed, even if the query only partially succeeded, and did not succeed for all nodes.
"shards.failover": false,
"shards.tolerant": false,
Disable query failover. Disable fault tolerance.
Note: Failover and tolerance of partial results cannot coexist in the same query. Queries support enabling tolerance for only one parameter.

Other fault tolerance configuration options include: netty_client_request_timeout in dse.yaml and read_request_timeout_in_ms in cassandra.yaml.

JSON queries with literal characters that are Apache Solr™/Apache Lucene® special characters

Lucene supports escaping special characters that are part of the query syntax. Special characters are: +, -, &&, ||, !, (, ), ", ~, *, ?, and :. Using JSON with solr_query requires additional syntax for literal characters that are Lucene special characters.

Syntax for a simple search string:
Simple search string mytestuser1?
Solr query name:mytestuser1\?
CQL Solr query solr_query='{"q":"name:mytestuser1\\?"}'
Syntax for a complex search string:
Complex search string (1+1):2
Solr query e:\(1\+1\)\:2
CQL Solr query solr_query='{"q":"e:\\(1\\+1\\)\\:2"}'

Field, query, and range faceting with a JSON query

Specify the facet parameters inside a facet JSON object to perform field, query, and range faceting inside Solr queries. Distributed pivot faceting is supported. The query syntax is less verbose to specify facets by:
  • Specifying each facet parameter without the facet prefix that is required by HTTP APIs.
  • Expressing multiple facet fields and queries inside a JSON array.
Faceted search example
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"field":"type"}}';
Query facet example
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":"type:0"}}';
Multiple queries example
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":["type:0","type:1"]}}';
Distributed pivot faceting example
SELECT id FROM table WHERE solr_query='{"q":"id:*","facet":{"pivot":"type,value","limit":"-1"}}'
Range facet example
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"range":"type", "f.type.range.start":-10, "f.type.range.end":10, "range.gap":1}}}';
The returned result is formatted as a single row with each column corresponding to the output of a facet (either field, query, or range). The value is represented as a JSON blob because facet results can be complex and nested. For example:
facet_fields           | facet_queries
------------------------+-------------------------
 {"type":{"0":2,"1":1}} | {"type:0":2,"type:1":1}
Range by date facet example
SELECT * FROM solr WHERE solr_query='{"q":"business_date:*","facet":{"range":"business_date", "f.business_date.range.start":"2015-01-01T00:00:00Z", "f.business_date.range.end":"2015-08-01T00:00:00Z", "f.business_date.range.gap":"+1MONTH"}}';
Warning: Solr range facets before, after, and between might return incorrect and inconsistent results on multi-node clusters. See SOLR-6187 and SOLR-6375.
Interval facet example
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"interval":"id", "interval.set":"[*,500]"}';

Tracing distributed queries

During a distributed query, every node is responsible for a set of token ranges. A shard is the node/ranges combination. The shard token range is reported:
  • In the shards.info response for HTTP queries.
  • In the system_traces.events table for HTTP queries that provide cassandra.trace=true and CQL Solr queries that enable tracing at the driver level.

Solr single-pass CQL queries

Single-pass distributed queries are supported in CQL Solr queries.

There are a number of ways to use a single-pass distributed query instead of the standard two-pass query. You can:
  • Specify the distrib.singlePass Boolean parameter in the query expression. Example:
    SELECT * FROM keyspace.table WHERE solr_query = '{"q" : "*:*", "distrib.singlePass" : true}' 
  • Use a token() or partition key restriction in the WHERE clause. Example:
    SELECT * FROM keyspace.table WHERE token(id) >= 3074457345618258604 AND solr_query = '{"q" : "*:*"}' 
  • Execute a COUNT query. Example:
    SELECT count(*) FROM keyspace.table WHERE solr_query = '{"q" : "*:*"}' 
  • Specify the primary key elements in the SELECT clause. Example:
    CREATE TABLE cycling.cyclist_name ( 
       id UUID, 
       lastname text, 
       firstname text );
    PRIMARY KEY ((id), lastname);
    CREATE SEARCH INDEX ON cycling.cyclist_name ... ;
    dsetool create_core cycling.cyclist_name ...
    INSERT into cycling.cyclist_name ... ; 
    SELECT id, lastname FROM cycling.cyclist_name 
     WHERE solr_query='{"q": "*:*","fq":"(id:a6f94417-e27b-444f-9d0c-dccb588e421f)"}';
For more on filtering Solr queries using CQL, see Search index filter syntax
Using a single-pass distributed query has an operational cost that includes potentially more disk and network overhead. With single-pass queries, each node reads all rows that satisfy the query and returns them to the coordinator node. An advanced feature, a single-pass distributed query saves one network round trip transfer during the retrieval of queried rows. A regular distributed query performs two network round trips:
  1. The first one to retrieve IDs from DSE Search that satisfy the query.
  2. The second one to retrieve only the rows that satisfy the query from the database, based on IDs from the first step.
Single-pass distributed queries are most efficient when most of the documents found are returned in the search results. Single-pass distributed queries are inefficient when most of the documents found are not returned to the coordinator node.

For example, a distributed query that only fans out to a single node from the coordinator node will likely be most efficient as a single-pass query.

Important: With single-pass queries, there is a limitation that returns only the table columns that are defined in the Solr schema.xml in the query results. This limitation also applies to map entries that do not conform to the dynamic field mapping. In the schema.xml file, the property stored=true must be included for the field to be retrievable by queries.
Consider the cycling.cyclist_name table example above. If the column firstname was not indexed, but needed to be retrievable by queries, the entry in the schema.xml for the firstname field could be:
<field indexed="false" multiValued="false"
  docValues="false" name="firstname" stored="true" type="StrField"/>
If the Solr core was created without a customized schema.xml file and generateResources=true was included, the default schema.xml file includes all fields that exist in the table. For more information, see dsetool create_core.

JSON query name option

Using the following syntax to name your queries to support metrics and monitoring for performance objects. Naming queries can be useful for tagging and JMX operations, for example.
SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"query.name":"Asian subjects", "q":"ethnicity:Asia*"}' LIMIT 50;

JSON query commit option

If you are executing custom queries after bulk document loading, and the auto soft commit is disabled or the configured value is extremely infrequent, and you want the latest data to be visible to your query, use the JSON query commit option to ensure that all pending updates are soft-committed before the query runs. By default, the commit option is set to false.

For example:
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Asia*", "commit":true}' LIMIT 50;
Warning: Do not use the JSON commit option for live operations against a production cluster. DataStax recommends using the JSON commit option only when you would otherwise be forced to issue a commit though the Solr HTTP interface. The commit option is not a replacement for the normal auto soft commit process.

Queries to dynamically enable paging

To dynamically enable pagination when cql_solr_query_paging: off in dse.yaml, use the "paging":"driver" parameter:
select id from wiki.solr where solr_query='{"q":"*", "sort":"id asc", "paging":"driver"}';

Limit queries by time

DSE Search supports limiting queries by time by using the Solr timeAllowed parameter. DSE Search differs from native Solr:
  • If the timeAllowed is exceeded, an exception is thrown.
  • If the timeAllowed is exceeded, and the additional shards.tolerant parameter is set to true, the application returns the partial results collected so far.

    When partial results are returned, the CQL custom payload contains the DSESearch.isPartialResults key.

Example with a 30 second timeout:
http://localhost:8983/solr/demo.solr/select?q=ipod&timeAllowed=30000
Example with a 30 second timeout:
http://localhost:8983/solr/demo.solr/select?q=ipod&timeAllowed=30000
Example with a 30 second timeout:
SELECT * FROM users where solr_query = '{ "q": "*:*", "timeAllowed":30000}';

Escaping characters in a solr_query

How to escape special characters in a basic or advanced solr_query expression.

Solr queries require escaping special characters that are part of the query syntax. Special characters are: +, -, &&, ||, !, (, ), ", ~, *, ?, and :. To escape these characters, use a slash (\) before the character to escape. For example, to search for a literal double quotation mark (") character, escape the " for Solr with \".

When using solr_query you can escape special characters in two forms:
CQL Solr
...WHERE solr_query='field:value'
JSON
WHERE solr_query='{ "q": "field:value"}'

JSON-encoded queries require that values must also be JSON-escaped for special characters.

For queries that contain double quotation marks, use triple slashes \\\:
  • For query syntax: One slash \ to escape the "
  • For the JSON string syntax: Two slashes \\ to escape the \

    Triple slashes \\\ escape both characters in \" to produce \\ (an escaped escape) and \" (an escaped double quote).

Escaping single quotation marks

  • Double the single quotation mark (')
    CQL
    ...WHERE solr_query='name:Walter''s'
    JSON
    ...WHERE solr_query='{ "q": "Walter''s"}'
  • Use dollar-quotes for the string constant
    CQL
    ...WHERE solr_query=$$name:Walter's$$
    JSON
    ...WHERE solr_query=$${ "q": "Walter's"}$$

Query examples for escaping double quotation marks

CQL
Double the single quotation mark (') and add the backslash (\) for Solr escaping
...WHERE solr_query='name:Walter\''s'
JSON
Escape \" to \\\" to escape both special characters for JSON
...WHERE solr_query='{ "q": "Walter\\\"s"}'

Exact and fuzzy query examples

Exact phrase query
For a row that looks like this, with an email address that includes a double quotation mark greenr"q@example.com:
INSERT INTO users(id, email) VALUES(1, 'greenr"q@example.com')"
Perform a phrase query to search for the email address that is enclosed in double quotation marks:
SELECT * FROM users where solr_query = '
{ "q": "*:*", "fq": "email:\"greenr\\\"q@example.com\""}
';
Fuzzy query
For a row that looks like this, with the same email address that includes a double quotation mark greenr"q@example.com:
select * from test.users where solr_query='{"q":"email:r\\\"q@example"}' ;
 id    | email                 | solr_query
------+-------+------------------------------
 1     | greenr"q@example.com  | null
(1 rows)
For a term query (fuzzy search) for all email addresses that include r"q@example, remove the double quotation marks but retain triple quotation marks for the escaped double quotation character that is part of the email address:
SELECT * FROM users where solr_query = '
{ "q": "*:*", "fq": "email:r\\\"q@example"}
';

Using JSON with solr_query requires additional syntax for literal characters that are Lucene special characters. See JSON queries with literal characters that are Solr special characters.