Search queries with JSON
DataStax Enterprise supports JSON-based query expressions queries.
JSON query syntax
The JSON query expression syntax is a JSON string. The JSON-based query expression supports local parameters in addition to the following parameters:{
"q": query_expression (string),
"fq": filter_query_expression(s) (string_or_array_of_strings),
"tz": zoneID, // Any valid zone ID in java TimeZone class
"facet": facet_query_expression (object)
"sort": sort_expression (string),
"start": start_index(number),
"paging": "driver" (string),
"distrib.singlePass": true|false (boolean),
"shards.failover": true|false (boolean), // Default: true
"shards.tolerant": true|false (boolean), // Default: false
"commit": true|false (boolean),
"route.partition": partition_routing_expression (array_of_strings),
"route.range": range_routing_expression (array_of_strings),
"query.name": query_name (string)
}
For
example:SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"q":"ethnicity:Asian"}';
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Mexi*", "sort":"id asc"}' LIMIT 3;
SELECT * FROM mykeyspace.mysolr WHERE solr_query='{"q" : "{!edismax}quotes:yearning or kills"}';
Making distributed queries tolerant of shard failures
Valid configurations | Description |
---|---|
|
This default configuration enables query failover and disables fault tolerance. Attempt to retry the failed shard requests when errors indicate that there is a reasonable chance of recovery. If any of the nodes (shards) that we scatter to fail before the query is complete, retry the shard query against a replica. |
|
Disable query failover. Enable fault tolerance. Make the query succeed, even if the query only partially succeeded, and did not succeed for all nodes. |
|
Disable query failover. Disable fault tolerance. |
Other fault tolerance configuration options include: netty_client_request_timeout in dse.yaml and read_request_timeout_in_ms in cassandra.yaml.
JSON queries with literal characters that are Lucene/Solr special characters
Lucene supports escaping special characters that are part of the
query syntax. Special characters are: +, -, &&, ||, !, (, ), ", ~,
*, ?, and :
. Using JSON with solr_query requires
additional syntax for literal characters that are Lucene special characters.
Simple search string | mytestuser1? |
Solr query | name:mytestuser1\? |
CQL Solr Query | solr_query='{"q":"name:mytestuser1\\?"}' |
Complex search string | (1+1):2 |
Solr query | e:\(1\+1\)\:2 |
CQL Solr Query | solr_query='{"q":"e:\\(1\\+1\\)\\:2"}' |
Escape characters in queries
Solr queries require escaping special characters that are part of the
query syntax. To escape these characters, use a slash (\) before the character to
escape. For example, to search for a literal double quotation mark (") character,
escape the "
for Solr with \"
.
\\\
:- For query syntax: One slash
\
to escape the"
- For the JSON string syntax: Two slashes
\\
to escape the\
Triple slashes
\\\
escape both characters in\"
to produce\\
(an escaped escape) and\"
(an escaped double quote).
Query type | Example |
---|---|
Exact phrase query | For a row that looks like this, with an email address that
includes a double quotation mark
greenr"q@example.com :
Perform a phrase query to search for the email address that is enclosed in double quotation marks:
|
Fuzzy query | For a row that looks like this, with the same email address
that includes a double quotation mark
greenr"q@example.com : For
a term query (fuzzy search) for all email addresses that include
r"q@example , remove the double quotation
marks but retain triple quotation marks for the escaped double
quotation character that is part of the email
address:
|
Overriding the default TimeZone (UTC) in search queries
Date math | Result |
---|---|
2016-03-10T12:34:56Z/YEAR |
Default TZ
|
TZ=America/Los_Angeles
|
|
2016-03-10T08:00:00Z+1DAY |
Default TZ
|
TZ=America/Los_Angeles
|
Field, query, and range faceting with a JSON query
- Specifying each facet parameter without the facet prefix that is required by HTTP APIs.
- Expressing multiple facet fields and queries inside a JSON array.
- Faceted search example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"field":"type"}}';
- Query facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":"type:0"}}';
- Multiple queries example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":["type:0","type:1"]}}';
- Distributed pivot faceting example
-
SELECT id FROM table WHERE solr_query='{"q":"id:*","facet":{"pivot":"type,value","limit":"-1"}}'
- Range facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"range":"type", "f.type.range.start":-10, "f.type.range.end":10, "range.gap":1}}}';
The returned result is formatted as a single row with each column corresponding to the output of a facet (either field, query, or range). The value is represented as a JSON blob because facet results can be complex and nested. For example:facet_fields | facet_queries ------------------------+------------------------- {"type":{"0":2,"1":1}} | {"type:0":2,"type:1":1}
- Range by date facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"business_date:*","facet":{"range":"business_date", "f.business_date.range.start":"2015-01-01T00:00:00Z", "f.business_date.range.end":"2015-08-01T00:00:00Z", "f.business_date.range.gap":"+1MONTH"}}';
JSON single-pass distributed query
Single-pass distributed queries are supported in CQL Solr queries.
SELECT * FROM ks.cf WHERE solr_query = '{"q" : "*:*", "distrib.singlePass" : true}'
Using
a single-pass distributed query has an operational cost that includes potentially
more disk and network overhead. With single-pass queries, each node reads all rows
that satisfy the query and returns them to the coordinator node. An advanced
feature, a single-pass distributed query saves one network round trip transfer
during the retrieval of queried rows. A regular distributed query performs two
network round trips, the first one to retrieve IDs from Solr that satisfy the query
and another trip to retrieve only the rows that satisfy the query from Cassandra,
based on IDs from the first step. Single-pass distributed queries are most efficient
when most of the documents found are returned in the search results, and they are
not efficient when most of the documents found will not be returned to the
coordinator node. For example, a distributed query that only fans out to a single node from the coordinator node will likely be most efficient as a single-pass query.
Single pass distributed queries for CQL are supported when the additional distrib.singlePass boolean parameter is included in the JSON query.
With single-pass queries, there is a limitation that only document fields that are defined in the Solr schema are returned as query results. This limitation also applies to map entries that do not conform to the dynamic field mapping.
JSON query name option
SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"query.name":"Asian subjects", "q":"ethnicity:Asia*"}' LIMIT 50;
JSON query commit option
If you are executing custom queries after bulk document loading, and the normal auto soft commit is disabled or extremely infrequent, and you want the latest data to be visible to your query, use the JSON query commit option to ensure that all pending updates are soft-committed before the query runs. By default, the commit option is set to false.
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Asia*", "commit":true}' LIMIT 50;