CQL queries
CQL query syntax and examples for DSE Search.
DataStax Enterprise supports production-grade implementation of CQL queries in DSE Search. You can develop CQL-centric applications supporting full-text search without having to work with Solr-specific APIs. Only full text search queries are supported. Using CQL, DSE Search supports partial document updates that enable you to modify existing information while maintaining a lower transaction cost. Before using CQL Solr queries in DSE Search, configure solrconfig.xml to handle CQL queries.
- Required configuration for CQL Solr queries in DSE Search
- CQL Solr query limitations and considerations
- CQL Solr query syntax
- CQL queries
- SearchAnalytics schema considerations
- Queries for tuples and UDTs
- Using CQL partition key restrictions with Solr queries
- Using the Solr token function
- Secondary indexes (2i) in queries
Required configuration for CQL Solr queries in DSE Search
Using CQL solr_query syntax is supported only on nodes where search is enabled.
<requestHandler class="com.datastax.bdp.search.solr.handler.component.CqlSearchHandler" name="solr_query" />
CQL Solr query limitations and considerations
- CQL Solr queries are defaulted to an equivalent LIMIT 10.
- Pagination is off by default. In dse.yaml, the cql_solr_query_paging option specifies when to use pagination (also called cursors).
- Solr restrictions apply to pagination.
- Queries with smaller result sets will see increased performance with paging off.
CQL Solr query syntax
You can run CQL Solr queries using the SELECT statement that includes the search expression.
SELECT select expression
FROM table
[WHERE solr_query = 'search expression'] [LIMIT n]
- Search queries with CQL
- Search queries with JSON
CQL queries
The Solr query expression uses the syntax supported by the Solr q parameter. For example:SELECT * FROM keyspace.table WHERE solr_query='name: cat name: dog -name:fish'
When
you name specific columns, DSE Search retrieves only the specified columns and returns
the columns as part of the resulting rows. DSE Search supports projections (SELECT a, b,
c...) only, not functions, for the select expression. The following example retrieves
only the name
column:SELECT name FROM keyspace.table WHERE solr_query='name:cat name:dog -name:fish'
SELECT * FROM keyspace.table WHERE solr_query='name:cat name:dog -name:fish' LIMIT 1
SELECT count(*) FROM table WHERE solr_query = '...';
Using count() in combination with LIMIT or facets results in an error.
All response queries of the drivers have a custom payload where the total number of documents found is returned. This number is keyed as DSESearch.numFound.
Column aliases are not supported in solr_query queries.
SearchAnalytics schema considerations
All of the fields that are queried on DSE SearchAnalytics clusters must be defined in the DSE Search configuration file (solrconfig.xml). Fields that are not defined in the search index schema columns are excluded from the results returned from Spark queries.
Queries for tuples and UDTs
- The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a user-defined type.
- A user-defined type (UDT) facilitates handling multiple fields of related information in a
table. UDTs are a specialization of tuples. All examples and documentation
references to tuples apply to both tuples and UDTs.
Applications that require multiple tables can be simplified to use fewer tables by using a user-defined type to represent the related fields of information instead of storing the information in a separate table.
Using CQL partition key restrictions with Solr queries
CQL queries support restricting the query to a single partition key. Partition key restrictions work only when _partitionKey is explicitly indexed or the schema explicitly includes all of the components of the Cassandra partition key. In your schema, you can override _partitionKey when not using joins.
SELECT id, date, value FROM keyspace.table WHERE id = 'series1' AND solr_query='value:bar*'"
CREATE TABLE vtbl (k1 text, k2 text, valuetext, PRIMARYKEY ((k1, k2)))
SELECT * FROM vtbl WHERE k1 = '50' AND solr_query='value:*'
SELECT * FROM valuetable WHERE solr_query='{"q":"value:*", "fq":"k1:50"}'
Using the Solr token function
SELECT id, value FROM keyspace.table WHERE token(id) >= -3074457345618258601 AND token(id) <= 3074457345618258603 AND solr_query='id:*'
SELECT id, value FROM keyspace.table WHERE token(id) >= 3074457345618258604 AND solr_query='id:*'
- token() cannot be used with route.range or route.partition
- Wrapping token() ranges are not supported
- A specified token() range must be owned by a single node; ranges cannot span multiple nodes
- Because DSE uses the Solr single-pass queries, only the fields that are declared in the Solr schema are returned in the query results. If you have columns that do not need to be indexed, but still need to be returned by using a token-restricted query, you can declare the columns as stored non-indexed fields in your schema.xml file.
Secondary indexes (2i) in queries
SELECT * from users WHERE solr_query = '{"q":"irc:jdoe"}';
The
secondary indexes created by Solr cannot be used as a Cassandra 2i index in a cqlsh
query. For example, this syntax fails: SELECT * FROM users WHERE irc =
'jdoe';
. JSON query expressions
JSON-based query expressions and examples for DSE Search.
This page covers:
JSON query syntax
The JSON query expression syntax is a JSON string. The JSON-based query expression supports local parameters in addition to the following parameters:{
"q": query_expression (string),
"fq": filter_query_expression(s) (string_or_array_of_strings),
"facet": facet_query_expression (object)
"sort": sort_expression (string),
"start": start_index(number),
"tz": zoneID), // Any valid zone ID in java TimeZone class
"paging": "driver" (string),
"distrib.singlePass": true|false (boolean),
"shards.failover": true|false (boolean), // Default: true
"shards.tolerant": true|false (boolean), // Default: false
"commit": true|false (boolean),
"route.partition": partition_routing_expression (array_of_strings),
"route.range": range_routing_expression (array_of_strings),
"query.name": query_name (string),
}
For
example:SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"q":"ethnicity:Asian"}';
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Mexi*", "sort":"id asc"}' LIMIT 3;
SELECT * FROM mykeyspace.mysolr WHERE solr_query='{"q" : "{!edismax}quotes:yearning or kills"}';
Making distributed queries tolerant of shard failures
- Failover true, tolerant false (default)
Enables query failover and disables fault tolerance. Attempt to retry the failed shard requests when errors indicate that there is a reasonable chance of recovery. If any of the nodes (shards) that we scatter fail before the query is complete, retry the shard query against a replica."shards.failover": true, "shards.tolerant": false,
- Failover false, tolerant true
Disable query failover. Enable fault tolerance. Make the query succeed, even if the query only partially succeeded, and did not succeed for all nodes."shards.failover": false, "shards.tolerant": true,
- Failover false, tolerant false
Disable query failover. Disable fault tolerance."shards.failover": false, "shards.tolerant": false,
Other fault tolerance configuration options include: netty_client_request_timeout in dse.yaml and read_request_timeout_in_ms in cassandra.yaml.
Literal characters that are Lucene Solr special characters
Lucene supports escaping special characters that are part of
the query syntax. Special characters are: +, -, &&, ||, !, (, ),
", ~, *, ?,
and :
. Using
JSON with solr_query
requires additional syntax for literal
characters that are Lucene special characters.
Syntax for a simple search string
Simple search string | mytestuser1? |
Solr query | name:mytestuser1\? |
CQL Solr query | solr_query='{"q":"name:mytestuser1\\?"}' |
Syntax for a complex search string
Complex search string | (1+1):2 |
Solr query | e:\(1\+1\)\:2 |
CQL Solr query | solr_query='{"q":"e:\\(1\\+1\\)\\:2"}' |
Overriding the default TimeZone (UTC) in search queries
Date math | Result |
---|---|
2016-03-10T12:34:56Z/YEAR |
Default TZ
|
TZ=America/Los_Angeles
|
|
2016-03-10T08:00:00Z+1DAY |
Default TZ
|
TZ=America/Los_Angeles
|
Field, query, and range faceting with a JSON query
- Specifying each facet parameter without the facet prefix that is required by HTTP APIs.
- Expressing multiple facet fields and queries inside a JSON array.
- Faceted search example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"field":"type"}}';
- Query facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":"type:0"}}';
- Multiple queries example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":["type:0","type:1"]}}';
- Distributed pivot faceting example
-
SELECT id FROM table WHERE solr_query='{"q":"id:*","facet":{"pivot":"type,value","limit":"-1"}}'
- Range facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"range":"type", "f.type.range.start":-10, "f.type.range.end":10, "range.gap":1}}}';
The returned result is formatted as a single row with each column corresponding to the output of a facet (either field, query, or range). The value is represented as a JSON blob because facet results can be complex and nested. For example:facet_fields | facet_queries ------------------------+------------------------- {"type":{"0":2,"1":1}} | {"type:0":2,"type:1":1}
- Range by date facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"business_date:*","facet":{"range":"business_date", "f.business_date.range.start":"2015-01-01T00:00:00Z", "f.business_date.range.end":"2015-08-01T00:00:00Z", "f.business_date.range.gap":"+1MONTH"}}';
Tracing distributed queries
- In the shards.info response for HTTP queries.
- In the system_traces.events table for HTTP queries that provide cassandra.trace=true and CQL Solr queries that enable tracing at the driver level.
JSON single-pass distributed query
Single-pass distributed queries are supported in CQL queries.
distrib.singlePass
Boolean parameter in the JSON
query
expression:SELECT * FROM ks.cf WHERE solr_query = '{"q" : "*:*", "distrib.singlePass" : true}'
Using
a single-pass distributed query has an operational cost that includes potentially
more disk and network overhead. With single-pass queries, each node reads all rows
that satisfy the query and returns them to the coordinator node. An advanced
feature, a single-pass distributed query saves one network round trip transfer
during the retrieval of queried rows. A regular distributed query performs two
network round trips, the first one to retrieve IDs from Solr that satisfy the query
and another trip to retrieve only the rows that satisfy the query from Cassandra,
based on IDs from the first step. Single-pass distributed queries are most efficient
when most of the documents found are returned in the search results, and they are
not efficient when most of the documents found will not be returned to the
coordinator node. For example, a distributed query that only fans out to a single node from the coordinator node will likely be most efficient as a single-pass query.
Single pass distributed queries for CQL are supported when the additional
distrib.singlePass
boolean parameter is included in the
JSON query.
With single-pass queries, there is a limitation that only document fields that are defined in the Solr schema are returned as query results. This limitation also applies to map entries that do not conform to the dynamic field mapping.
JSON query naming
SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"query.name":"Asian subjects", "q":"ethnicity:Asia*"}' LIMIT 50;
JSON query commit option
If you are executing custom queries after bulk document loading, and the normal auto soft commit is disabled or extremely infrequent, and you want the latest data to be visible to your query, use the JSON query commit option to ensure that all pending updates are soft-committed before the query runs. By default, the commit option is set to false.
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Asia*", "commit":true}' LIMIT 50;
Queries for tuples and UDTs
- The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a user-defined type.
- A user-defined type (UDT) facilitates handling multiple fields of related information in a
table. UDTs are a specialization of tuples. All examples and documentation
references to tuples apply to both tuples and UDTs.
Applications that require multiple tables can be simplified to use fewer tables by using a user-defined type to represent the related fields of information instead of storing the information in a separate table.
Queries to dynamically enable paging
"paging":"driver"
parameter:cqlsh> select id from wiki.solr where solr_query='{"q":"*", "sort":"id asc", "paging":"driver"}';