CQL queries

CQL query syntax and examples for DSE Search.

DataStax Enterprise supports production-grade implementation of CQL queries in DSE Search. You can develop CQL-centric applications supporting full-text search without having to work with Solr-specific APIs. Only full text search queries are supported. Using CQL, DSE Search supports partial document updates that enable you to modify existing information while maintaining a lower transaction cost. Before using CQL Solr queries in DSE Search, configure solrconfig.xml to handle CQL queries.

This page covers:

DSE Search queries also support JSON-based query expressions.

Required configuration for CQL Solr queries in DSE Search

Using CQL solr_query syntax is supported only on nodes where search is enabled.

When you automatically generate resources, the solrconfig.xml file already contains the request handler for running CQL Solr queries in DSE Search. If you do not automatically generate resources and want to run CQL Solr queries using custom resources, the CqlSearchHandler handler is automatically injected:

<requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query" />

CQL Solr query limitations and considerations

CQL Solr queries are defaulted to an equivalent LIMIT 10.
Pagination is off by default. In dse.yaml, the cql_solr_query_paging option specifies when to use pagination (also called cursors).
Solr restrictions apply to pagination.
Queries with smaller result sets will see increased performance with paging off.

Note: Limitations and known Apache Solr issues apply to DSE Search queries. For example, incorrect SORT results for tokenized text fields.

CQL Solr query syntax

You can run CQL Solr queries using the SELECT statement that includes the search expression.

Synopsis

SELECT select expression
 FROM table
 [WHERE solr_query = 'search expression'] [LIMIT n]

There are two types of search expressions:

Search queries with CQL
Search queries with JSON

Remember on CQL that to use a single quotation mark itself in a string literal, you must escape it using a single quotation mark (so you'll need to double the single quotation marks). Dollar-quoted strings with double dollar signs ($$) support complex quoting. See CQL escaping characters.

CQL queries

The Solr query expression uses the syntax supported by the Solr q parameter. For example:

SELECT * FROM keyspace.table WHERE solr_query='name: cat name: dog -name:fish'

When you name specific columns, DSE Search retrieves only the specified columns and returns the columns as part of the resulting rows. DSE Search supports projections (SELECT a, b, c...) only, not functions, for the select expression. The following example retrieves only the name column:

SELECT name FROM keyspace.table WHERE solr_query='name:cat name:dog -name:fish'

Use the LIMIT clause to specify how many rows to return. The following example retrieves only 1 row:

SELECT * FROM keyspace.table WHERE solr_query='name:cat name:dog -name:fish' LIMIT 1

Use the count() function in CQL Solr queries to return the number of rows that satisfy the Solr query:

SELECT count(*) FROM table WHERE solr_query = '...';

Using count() in combination with LIMIT or facets results in an error.

All response queries of the drivers have a custom payload where the total number of documents found is returned. This number is keyed as DSESearch.numFound.

Column aliases are not supported in solr_query queries.

SearchAnalytics schema considerations

All of the fields that are queried on DSE SearchAnalytics clusters must be defined in the DSE Search configuration file (solrconfig.xml). Fields that are not defined in the search index schema columns are excluded from the results returned from Spark queries.

Queries for tuples and UDTs

DSE Search supports indexing and querying of advanced data types, including tuples and user-defined types (UDT).

The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a user-defined type.
A user-defined type (UDT) facilitates handling multiple fields of related information in a table. UDTs are a specialization of tuples. All examples and documentation references to tuples apply to both tuples and UDTs.
Applications that require multiple tables can be simplified to use fewer tables by using a user-defined type to represent the related fields of information instead of storing the information in a separate table.

Configuration and schema requirements apply. See UDT query examples.

Using CQL partition key restrictions with Solr queries

CQL queries support restricting the query to a single partition key. Partition key restrictions work only when _partitionKey is explicitly indexed or the schema explicitly includes all of the components of the Cassandra partition key. In your schema, you can override _partitionKey when not using joins.

Example:

SELECT id, date, value FROM keyspace.table WHERE id = 'series1' AND solr_query='value:bar*'"

CQL partition key restrictions work only with fully specified partition keys. For example, with this table:

CREATE TABLE vtbl (k1 text, k2 text, valuetext, PRIMARYKEY ((k1, k2)))

Avoid using a query like this:

SELECT * FROM vtbl WHERE k1 = '50' AND solr_query='value:*'

Use a filter query against the partially specified composite partition key:

SELECT * FROM valuetable WHERE solr_query='{"q":"value:*", "fq":"k1:50"}'

Using the Solr token function

Solr CQL queries support limited use of the CQL token function. The token function enables targeted search that restricts the nodes queried to reduce latency.

Note: Using the Solr token function is for advanced users only and is supported only in specific use cases.

Example:

SELECT id, value FROM keyspace.table WHERE token(id) >= -3074457345618258601 AND token(id) <= 3074457345618258603 AND solr_query='id:*'

Example with an open range:

SELECT id, value FROM keyspace.table WHERE token(id) >= 3074457345618258604 AND solr_query='id:*'

Constraints apply to using the token function with Solr CQL queries:

token() cannot be used with route.range or route.partition
Wrapping token() ranges are not supported
A specified token() range must be owned by a single node; ranges cannot span multiple nodes
Because DSE uses the Solr single-pass queries, only the fields that are declared in the Solr schema are returned in the query results. If you have columns that do not need to be indexed, but still need to be returned by using a token-restricted query, you can declare the columns as stored non-indexed fields in your schema.xml file.

Secondary indexes (2i) in queries

Use solr_query to query secondary indexes (2i) that are created by Solr. Use:


SELECT * from users WHERE solr_query = '{"q":"irc:jdoe"}';

The secondary indexes created by Solr cannot be used as a Cassandra 2i index in a cqlsh query. For example, this syntax fails:

SELECT * FROM users WHERE irc =
                    'jdoe';

JSON query expressions

JSON-based query expressions and examples for DSE Search.

DSE Search queries support JSON-based query expressions.

This page covers:

JSON query syntax

The JSON query expression syntax is a JSON string. The JSON-based query expression supports local parameters in addition to the following parameters:

{
  "q": query_expression (string), 
  "fq": filter_query_expression(s) (string_or_array_of_strings),  
  "facet": facet_query_expression (object) 
  "sort": sort_expression (string), 
  "start": start_index(number),
  "tz": zoneID),   //  Any valid zone ID in java TimeZone class                            
  "paging": "driver" (string),
  "distrib.singlePass": true|false (boolean),
  "shards.failover": true|false (boolean),    // Default: true
  "shards.tolerant": true|false (boolean),    // Default: false
  "commit": true|false (boolean),
  "route.partition": partition_routing_expression (array_of_strings), 
  "route.range": range_routing_expression (array_of_strings), 
  "query.name": query_name (string),
}

For example:

SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"q":"ethnicity:Asian"}';

SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Mexi*", "sort":"id asc"}' LIMIT 3;

SELECT * FROM mykeyspace.mysolr WHERE solr_query='{"q" : "{!edismax}quotes:yearning or kills"}';

Note: To use Solr Extended DisMax Query Parser (eDisMax) with solr_query, you must include defaultSearchField in your schema.

Making distributed queries tolerant of shard failures

Since distributed queries contact many shards, making queries more tolerant of shard failures ensures more successful completions. Use shards.failover and shards.tolerant parameters to define query failover and tolerance of shard failures during JSON queries. Valid configurations:

Failover true, tolerant false (default)

"shards.failover": true,
"shards.tolerant": false,

Enables query failover and disables fault tolerance. Attempt to retry the failed shard requests when errors indicate that there is a reasonable chance of recovery. If any of the nodes (shards) that we scatter fail before the query is complete, retry the shard query against a replica.

Failover false, tolerant true

"shards.failover": false,
"shards.tolerant": true,

Disable query failover. Enable fault tolerance. Make the query succeed, even if the query only partially succeeded, and did not succeed for all nodes.

Failover false, tolerant false

"shards.failover": false,
"shards.tolerant": false,

Disable query failover. Disable fault tolerance.

Failover and tolerance of partial results cannot coexist in the same query. Queries support enabling tolerance for only one parameter.

Other fault tolerance configuration options include: netty_client_request_timeout in dse.yaml and read_request_timeout_in_ms in cassandra.yaml.

Literal characters that are Lucene Solr special characters

Lucene supports escaping special characters that are part of the query syntax. Special characters are: +, -, &&, ||, !, (, ), ", ~, *, ?,and :. Using JSON with solr_query requires additional syntax for literal characters that are Lucene special characters.

Syntax for a simple search string

Simple search string	`mytestuser1?`
Solr query	`name:mytestuser1\?`
CQL Solr query	`solr_query='{"q":"name:mytestuser1\\?"}'`

Syntax for a complex search string

Complex search string	`(1+1):2`
Solr query	`e:$1\+1$\:2`
CQL Solr query	`solr_query='{"q":"e:\$1\\+1\$\\:2"}'`

Overriding the default TimeZone (UTC) in search queries

Specify the TZ parameter to overwrite the default TimeZone (UTC) that is used for adding and rounding in date math. The local rules for the specified time zone, including the start and end of daylight saving time (DST) if any, determine when each arbitrary day starts. The time zone rules impact the rounding and adding of DAYs, but also cascades to rounding of HOUR, MIN, MONTH, and YEAR. For example, specifying a different time zone changes the result:

Date math	Result
`2016-03-10T12:34:56Z/YEAR`	Default TZ `2016-01-01T00:00:00Z`
	TZ=America/Los_Angeles `2016-01-01T08:00:00Z`
`2016-03-10T08:00:00Z+1DAY`	Default TZ `2016-03-11T08:00:00Z`
	TZ=America/Los_Angeles `2016-03-11T07:00:00Z`

The value of the TZ parameter can be any zone ID that is supported by the java TimeZone class.

Field, query, and range faceting with a JSON query

Specify the facet parameters inside a facet JSON object to perform field, query, and range faceting inside Solr queries. Distributed pivot faceting is supported. The query syntax is less verbose to specify facets by:

Specifying each facet parameter without the facet prefix that is required by HTTP APIs.
Expressing multiple facet fields and queries inside a JSON array.

Faceted search example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"field":"type"}}';

Query facet example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":"type:0"}}';

Multiple queries example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":["type:0","type:1"]}}';

Distributed pivot faceting example

SELECT id FROM table WHERE solr_query='{"q":"id:*","facet":{"pivot":"type,value","limit":"-1"}}'

Range facet example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"range":"type", "f.type.range.start":-10, "f.type.range.end":10, "range.gap":1}}}';

The returned result is formatted as a single row with each column corresponding to the output of a facet (either field, query, or range). The value is represented as a JSON blob because facet results can be complex and nested. For example:

facet_fields           | facet_queries
------------------------+-------------------------
 {"type":{"0":2,"1":1}} | {"type:0":2,"type:1":1}

Range by date facet example

SELECT * FROM solr WHERE solr_query='{"q":"business_date:*","facet":{"range":"business_date", "f.business_date.range.start":"2015-01-01T00:00:00Z", "f.business_date.range.end":"2015-08-01T00:00:00Z", "f.business_date.range.gap":"+1MONTH"}}';

Warning: Solr range facets before, after, and between might return incorrect and inconsistent results on multi-node clusters. See SOLR-6187 and SOLR-6375.

Interval facet example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"interval":"id", "interval.set":"[*,500]"}';

Tracing distributed queries

During a distributed query, every node is responsible for a set of Cassandra token ranges. A shard is the node/ranges combination. The shard token range is reported:

In the shards.info response for HTTP queries.
In the system_traces.events table for HTTP queries that provide cassandra.trace=true and CQL Solr queries that enable tracing at the driver level.

JSON single-pass distributed query

Single-pass distributed queries are supported in CQL queries.

To use a single pass distributed query instead of the standard two-pass query, specify the distrib.singlePass Boolean parameter in the JSON query expression:

SELECT * FROM ks.cf WHERE solr_query = '{"q" : "*:*", "distrib.singlePass" : true}'

Using a single-pass distributed query has an operational cost that includes potentially more disk and network overhead. With single-pass queries, each node reads all rows that satisfy the query and returns them to the coordinator node. An advanced feature, a single-pass distributed query saves one network round trip transfer during the retrieval of queried rows. A regular distributed query performs two network round trips, the first one to retrieve IDs from Solr that satisfy the query and another trip to retrieve only the rows that satisfy the query from Cassandra, based on IDs from the first step. Single-pass distributed queries are most efficient when most of the documents found are returned in the search results, and they are not efficient when most of the documents found will not be returned to the coordinator node.

For example, a distributed query that only fans out to a single node from the coordinator node will likely be most efficient as a single-pass query.

Single pass distributed queries for CQL are supported when the additional distrib.singlePass boolean parameter is included in the JSON query.

With single-pass queries, there is a limitation that only document fields that are defined in the Solr schema are returned as query results. This limitation also applies to map entries that do not conform to the dynamic field mapping.

JSON query naming

Use the following syntax to name your queries to support metrics and monitoring for performance objects. Naming queries can be useful for tagging and JMX operations. For example:

SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"query.name":"Asian subjects", "q":"ethnicity:Asia*"}' LIMIT 50;

JSON query commit option

If you are executing custom queries after bulk document loading, and the normal auto soft commit is disabled or extremely infrequent, and you want the latest data to be visible to your query, use the JSON query commit option to ensure that all pending updates are soft-committed before the query runs. By default, the commit option is set to false.

For example:

SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Asia*", "commit":true}' LIMIT 50;

Warning: Do not use the JSON commit option for live operations against a production cluster. DataStax recommends using the JSON commit option only when you would otherwise be forced to issue a commit though the Solr HTTP interface. The commit option is not a replacement for the normal auto soft commit process.

Queries for tuples and UDTs

DSE Search supports indexing and querying of advanced data types, including tuples and user-defined types (UDT).

The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a user-defined type.
A user-defined type (UDT) facilitates handling multiple fields of related information in a table. UDTs are a specialization of tuples. All examples and documentation references to tuples apply to both tuples and UDTs.
Applications that require multiple tables can be simplified to use fewer tables by using a user-defined type to represent the related fields of information instead of storing the information in a separate table.

Configuration and schema requirements apply. See UDT query examples.

Queries to dynamically enable paging

To dynamically enable pagination when cql_solr_query_paging: off in dse.yaml, use the "paging":"driver" parameter:

cqlsh> select id from wiki.solr where solr_query='{"q":"*", "sort":"id asc", "paging":"driver"}';