CQL queries

CQL query syntax and examples for DSE Search.

DataStax Enterprise supports production-grade implementation of CQL queries in DSE Search. You can develop CQL-centric applications supporting full-text search without having to work with Solr-specific APIs. Only full text search queries are supported. Using CQL, DSE Search supports partial document updates that enable you to modify existing information while maintaining a lower transaction cost. Before using CQL Solr queries in DSE Search, configure solrconfig.xml to handle CQL queries.

Required configuration for CQL Solr queries in DSE Search 

Using CQL solr_query syntax is supported only on nodes where search is enabled.

When you automatically generate resources, the solrconfig.xml file already contains the request handler for running CQL Solr queries in DSE Search. If you do not automatically generate resources and want to run CQL Solr queries using custom resources, the CqlSearchHandler handler is automatically injected:
<requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query" />

CQL Solr query limitations and considerations 

  • CQL Solr queries are defaulted to an equivalent LIMIT 10.
  • Pagination is off by default. In dse.yaml, the cql_solr_query_paging option specifies when to use pagination (also called cursors).
  • Solr restrictions apply to pagination.
  • Queries with smaller result sets will see increased performance with paging off.
Note: Limitations and known Apache Solr issues apply to DSE Search queries. For example, incorrect SORT results for tokenized text fields.

CQL Solr query syntax 

You can run CQL Solr queries using the SELECT statement that includes the search expression.

Synopsis 

SELECT select expression
 FROM table
 [WHERE solr_query = 'search expression'] [LIMIT n]
There are two types of search expressions: Remember on CQL that to use a single quotation mark itself in a string literal, you must escape it using a single quotation mark (so you'll need to double the single quotation marks). Dollar-quoted strings with double dollar signs ($$) support complex quoting. See CQL escaping characters.

CQL queries 

The Solr query expression uses the syntax supported by the Solr q parameter. For example:
SELECT * FROM keyspace.table WHERE solr_query='name: cat name: dog -name:fish'
When you name specific columns, DSE Search retrieves only the specified columns and returns the columns as part of the resulting rows. DSE Search supports projections (SELECT a, b, c...) only, not functions, for the select expression. The following example retrieves only the name column:
SELECT name FROM keyspace.table WHERE solr_query='name:cat name:dog -name:fish'
Use the LIMIT clause to specify how many rows to return. The following example retrieves only 1 row:
SELECT * FROM keyspace.table WHERE solr_query='name:cat name:dog -name:fish' LIMIT 1
Use the count() function in CQL Solr queries to return the number of rows that satisfy the Solr query:
SELECT count(*) FROM table WHERE solr_query = '...';

Using count() in combination with LIMIT or facets results in an error.

All response queries of the drivers have a custom payload where the total number of documents found is returned. This number is keyed as DSESearch.numFound.

Column aliases are not supported in solr_query queries.

SearchAnalytics schema considerations 

All of the fields that are queried on DSE SearchAnalytics clusters must be defined in the DSE Search configuration file (solrconfig.xml). Fields that are not defined in the search index schema columns are excluded from the results returned from Spark queries.

Queries for tuples and UDTs 

DSE Search supports indexing and querying of advanced data types, including tuples and user-defined types (UDT).
  • The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a user-defined type.
  • A user-defined type (UDT) facilitates handling multiple fields of related information in a table. UDTs are a specialization of tuples. All examples and documentation references to tuples apply to both tuples and UDTs.

    Applications that require multiple tables can be simplified to use fewer tables by using a user-defined type to represent the related fields of information instead of storing the information in a separate table.

Configuration and schema requirements apply. See UDT query examples.

Using CQL partition key restrictions with Solr queries 

CQL queries support restricting the query to a single partition key. Partition key restrictions work only when _partitionKey is explicitly indexed or the schema explicitly includes all of the components of the Cassandra partition key. In your schema, you can override _partitionKey when not using joins.

Example:
SELECT id, date, value FROM keyspace.table WHERE id = 'series1' AND solr_query='value:bar*'" 
CQL partition key restrictions work only with fully specified partition keys. For example, with this table:
CREATE TABLE vtbl (k1 text, k2 text, valuetext, PRIMARYKEY ((k1, k2)))
Avoid using a query like this:
SELECT * FROM vtbl WHERE k1 = '50' AND solr_query='value:*'
Use a filter query against the partially specified composite partition key:
SELECT * FROM valuetable WHERE solr_query='{"q":"value:*", "fq":"k1:50"}'

Using the Solr token function 

Solr CQL queries support limited use of the CQL token function. The token function enables targeted search that restricts the nodes queried to reduce latency.
Note: Using the Solr token function is for advanced users only and is supported only in specific use cases.
Example:
SELECT id, value FROM keyspace.table WHERE token(id) >= -3074457345618258601 AND token(id) <= 3074457345618258603 AND solr_query='id:*'
Example with an open range:
SELECT id, value FROM keyspace.table WHERE token(id) >= 3074457345618258604 AND solr_query='id:*'
Constraints apply to using the token function with Solr CQL queries:
  • token() cannot be used with route.range or route.partition
  • Wrapping token() ranges are not supported
  • A specified token() range must be owned by a single node; ranges cannot span multiple nodes
  • Because DSE uses the Solr single-pass queries, only the fields that are declared in the Solr schema are returned in the query results. If you have columns that do not need to be indexed, but still need to be returned by using a token-restricted query, you can declare the columns as stored non-indexed fields in your schema.xml file.

Secondary indexes (2i) in queries 

Use solr_query to query secondary indexes (2i) that are created by Solr. Use:

SELECT * from users WHERE solr_query = '{"q":"irc:jdoe"}';
The secondary indexes created by Solr cannot be used as a Cassandra 2i index in a cqlsh query. For example, this syntax fails: SELECT * FROM users WHERE irc = 'jdoe';.

JSON query expressions

JSON-based query expressions and examples for DSE Search.

DSE Search queries support JSON-based query expressions.

This page covers:

JSON query syntax 

The JSON query expression syntax is a JSON string. The JSON-based query expression supports local parameters in addition to the following parameters:
{
  "q": query_expression (string), 
  "fq": filter_query_expression(s) (string_or_array_of_strings),  
  "facet": facet_query_expression (object) 
  "sort": sort_expression (string), 
  "start": start_index(number),
  "tz": zoneID),   //  Any valid zone ID in java TimeZone class                            
  "paging": "driver" (string),
  "distrib.singlePass": true|false (boolean),
  "shards.failover": true|false (boolean),    // Default: true
  "shards.tolerant": true|false (boolean),    // Default: false
  "commit": true|false (boolean),
  "route.partition": partition_routing_expression (array_of_strings), 
  "route.range": range_routing_expression (array_of_strings), 
  "query.name": query_name (string),
}
For example:
SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"q":"ethnicity:Asian"}';
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Mexi*", "sort":"id asc"}' LIMIT 3;
SELECT * FROM mykeyspace.mysolr WHERE solr_query='{"q" : "{!edismax}quotes:yearning or kills"}';
Note: To use Solr Extended DisMax Query Parser (eDisMax) with solr_query, you must include defaultSearchField in your schema.

Making distributed queries tolerant of shard failures 

Since distributed queries contact many shards, making queries more tolerant of shard failures ensures more successful completions. Use shards.failover and shards.tolerant parameters to define query failover and tolerance of shard failures during JSON queries. Valid configurations:
Failover true, tolerant false (default)
"shards.failover": true,
"shards.tolerant": false,
Enables query failover and disables fault tolerance. Attempt to retry the failed shard requests when errors indicate that there is a reasonable chance of recovery. If any of the nodes (shards) that we scatter fail before the query is complete, retry the shard query against a replica.
Failover false, tolerant true
"shards.failover": false,
"shards.tolerant": true,
Disable query failover. Enable fault tolerance. Make the query succeed, even if the query only partially succeeded, and did not succeed for all nodes.
Failover false, tolerant false
"shards.failover": false,
"shards.tolerant": false,
Disable query failover. Disable fault tolerance.
Failover and tolerance of partial results cannot coexist in the same query. Queries support enabling tolerance for only one parameter.

Other fault tolerance configuration options include: netty_client_request_timeout in dse.yaml and read_request_timeout_in_ms in cassandra.yaml.

Literal characters that are Lucene Solr special characters 

Lucene supports escaping special characters that are part of the query syntax. Special characters are: +, -, &&, ||, !, (, ), ", ~, *, ?, and :. Using JSON with solr_query requires additional syntax for literal characters that are Lucene special characters.

Syntax for a simple search string

Simple search string mytestuser1?
Solr query name:mytestuser1\?
CQL Solr query solr_query='{"q":"name:mytestuser1\\?"}'

Syntax for a complex search string

Complex search string (1+1):2
Solr query e:\(1\+1\)\:2
CQL Solr query solr_query='{"q":"e:\\(1\\+1\\)\\:2"}'

Overriding the default TimeZone (UTC) in search queries 

Specify the TZ parameter to overwrite the default TimeZone (UTC) that is used for adding and rounding in date math. The local rules for the specified time zone, including the start and end of daylight saving time (DST) if any, determine when each arbitrary day starts. The time zone rules impact the rounding and adding of DAYs, but also cascades to rounding of HOUR, MIN, MONTH, and YEAR. For example, specifying a different time zone changes the result:
Date math Result
2016-03-10T12:34:56Z/YEAR Default TZ

2016-01-01T00:00:00Z

  TZ=America/Los_Angeles

2016-01-01T08:00:00Z

2016-03-10T08:00:00Z+1DAY Default TZ

2016-03-11T08:00:00Z

  TZ=America/Los_Angeles

2016-03-11T07:00:00Z

The value of the TZ parameter can be any zone ID that is supported by the java TimeZone class.

Field, query, and range faceting with a JSON query  

Specify the facet parameters inside a facet JSON object to perform field, query, and range faceting inside Solr queries. Distributed pivot faceting is supported. The query syntax is less verbose to specify facets by:
  • Specifying each facet parameter without the facet prefix that is required by HTTP APIs.
  • Expressing multiple facet fields and queries inside a JSON array.
Faceted search example 
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"field":"type"}}';
Query facet example 
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":"type:0"}}';
Multiple queries example 
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":["type:0","type:1"]}}';
Distributed pivot faceting example 
SELECT id FROM table WHERE solr_query='{"q":"id:*","facet":{"pivot":"type,value","limit":"-1"}}'
Range facet example 
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"range":"type", "f.type.range.start":-10, "f.type.range.end":10, "range.gap":1}}}';
The returned result is formatted as a single row with each column corresponding to the output of a facet (either field, query, or range). The value is represented as a JSON blob because facet results can be complex and nested. For example:
facet_fields           | facet_queries
------------------------+-------------------------
 {"type":{"0":2,"1":1}} | {"type:0":2,"type:1":1}
Range by date facet example 
SELECT * FROM solr WHERE solr_query='{"q":"business_date:*","facet":{"range":"business_date", "f.business_date.range.start":"2015-01-01T00:00:00Z", "f.business_date.range.end":"2015-08-01T00:00:00Z", "f.business_date.range.gap":"+1MONTH"}}';
Warning: Solr range facets before, after, and between might return incorrect and inconsistent results on multi-node clusters. See SOLR-6187 and SOLR-6375.
Interval facet example 
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"interval":"id", "interval.set":"[*,500]"}';

Tracing distributed queries 

During a distributed query, every node is responsible for a set of Cassandra token ranges. A shard is the node/ranges combination. The shard token range is reported:
  • In the shards.info response for HTTP queries.
  • In the system_traces.events table for HTTP queries that provide cassandra.trace=true and CQL Solr queries that enable tracing at the driver level.

JSON single-pass distributed query 

Single-pass distributed queries are supported in CQL queries.

To use a single pass distributed query instead of the standard two-pass query, specify the distrib.singlePass Boolean parameter in the JSON query expression:
SELECT * FROM ks.cf WHERE solr_query = '{"q" : "*:*", "distrib.singlePass" : true}' 
Using a single-pass distributed query has an operational cost that includes potentially more disk and network overhead. With single-pass queries, each node reads all rows that satisfy the query and returns them to the coordinator node. An advanced feature, a single-pass distributed query saves one network round trip transfer during the retrieval of queried rows. A regular distributed query performs two network round trips, the first one to retrieve IDs from Solr that satisfy the query and another trip to retrieve only the rows that satisfy the query from Cassandra, based on IDs from the first step. Single-pass distributed queries are most efficient when most of the documents found are returned in the search results, and they are not efficient when most of the documents found will not be returned to the coordinator node.

For example, a distributed query that only fans out to a single node from the coordinator node will likely be most efficient as a single-pass query.

Single pass distributed queries for CQL are supported when the additional distrib.singlePass boolean parameter is included in the JSON query.

With single-pass queries, there is a limitation that only document fields that are defined in the Solr schema are returned as query results. This limitation also applies to map entries that do not conform to the dynamic field mapping.

JSON query naming 

Use the following syntax to name your queries to support metrics and monitoring for performance objects. Naming queries can be useful for tagging and JMX operations. For example:
SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"query.name":"Asian subjects", "q":"ethnicity:Asia*"}' LIMIT 50;

JSON query commit option 

If you are executing custom queries after bulk document loading, and the normal auto soft commit is disabled or extremely infrequent, and you want the latest data to be visible to your query, use the JSON query commit option to ensure that all pending updates are soft-committed before the query runs. By default, the commit option is set to false.

For example:
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Asia*", "commit":true}' LIMIT 50;
Warning: Do not use the JSON commit option for live operations against a production cluster. DataStax recommends using the JSON commit option only when you would otherwise be forced to issue a commit though the Solr HTTP interface. The commit option is not a replacement for the normal auto soft commit process.

Queries for tuples and UDTs

DSE Search supports indexing and querying of advanced data types, including tuples and user-defined types (UDT).
  • The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a user-defined type.
  • A user-defined type (UDT) facilitates handling multiple fields of related information in a table. UDTs are a specialization of tuples. All examples and documentation references to tuples apply to both tuples and UDTs.

    Applications that require multiple tables can be simplified to use fewer tables by using a user-defined type to represent the related fields of information instead of storing the information in a separate table.

Configuration and schema requirements apply. See UDT query examples.

Queries to dynamically enable paging 

To dynamically enable pagination when cql_solr_query_paging: off in dse.yaml, use the "paging":"driver" parameter:
cqlsh> select id from wiki.solr where solr_query='{"q":"*", "sort":"id asc", "paging":"driver"}';