Search queries with JSON

DataStax Enterprise supports JSON-based query expressions.

JSON query syntax

A JSON query expression is a JSON object passed as a string in solr_query. In addition to local parameters, it supports the following parameters:

{
  "q": query_expression (string),
  "fq": filter_query_expression(s) (string_or_array_of_strings),
  "sort": sort_expression (string),
  "start": start_index (number),
  "tz": zoneID (string),  // Any valid zone ID in the Java TimeZone class
  "facet": facet_query_expression (object),
  "commit": true|false (boolean),
  "query.name": query_name (string),
  "paging": "driver" (string),
  "distrib.singlePass": true|false (boolean),
  "shards.failover": true|false (boolean),    // Default: true
  "shards.tolerant": true|false (boolean),    // Default: false
  "route.partition": partition_routing_expression (array_of_strings),
  "route.range": range_routing_expression (array_of_strings)
}

For example:

SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"q":"ethnicity:Asian"}';

SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Mexi*", "sort":"id asc"}' LIMIT 3;

SELECT * FROM mykeyspace.mysolr WHERE solr_query='{"q" : "{!edismax}quotes:yearning or kills"}';

Note: To use Solr Extended DisMax Query Parser (eDisMax) with solr_query, you must include defaultSearchField in your schema.
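The statements above embed the JSON expression directly in a CQL string literal. The following is a minimal sketch that assumes you generate these statements from Python. The hypothetical helpers below serialize the parameters with json.dumps and double any single quotation marks, as CQL requires inside single-quoted literals. The helper names are illustrative only. In application code, binding the JSON string as a query parameter through your driver avoids manual quoting altogether.

```python
import json


def solr_query_literal(params: dict) -> str:
    """Serialize search parameters as a single-quoted CQL string literal."""
    expression = json.dumps(params, separators=(",", ":"))
    # CQL escapes a single quotation mark inside a literal by doubling it.
    return "'" + expression.replace("'", "''") + "'"


def select_with_solr_query(table: str, columns: str, params: dict, limit=None) -> str:
    """Build SELECT <columns> FROM <table> WHERE solr_query=... (illustrative only)."""
    statement = f"SELECT {columns} FROM {table} WHERE solr_query={solr_query_literal(params)}"
    if limit is not None:
        statement += f" LIMIT {int(limit)}"
    return statement + ";"


if __name__ == "__main__":
    print(select_with_solr_query(
        "nhanes_ks.nhanes", "id", {"q": "ethnicity:Mexi*", "sort": "id asc"}, limit=3))
    # SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Mexi*","sort":"id asc"}' LIMIT 3;
```

Whitespace between JSON tokens is insignificant, so the compact separators produce a query equivalent to the hand-written example.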

Making distributed queries tolerant of shard failures

Because distributed queries contact many shards, making queries more tolerant of shard failures increases the number of queries that complete successfully.

Use the shards.failover and shards.tolerant parameters to define query failover and tolerance of shard failures during JSON queries:

Valid configurations	Description
`"shards.failover": true, "shards.tolerant": false`	This default configuration enables query failover and disables fault tolerance. When errors indicate a reasonable chance of recovery, failed shard requests are retried: if any node (shard) that the query is scattered to fails before the query completes, the shard query is retried against a replica.
`"shards.failover": false, "shards.tolerant": true`	Disables query failover and enables fault tolerance. The query succeeds and returns partial results even if it did not succeed on all nodes.
`"shards.failover": false, "shards.tolerant": false`	Disables both query failover and fault tolerance.

Failover and tolerance of partial results cannot coexist in the same query. Do not set both shards.failover and shards.tolerant to true.

Other fault tolerance configuration options include: netty_client_request_timeout in dse.yaml and read_request_timeout_in_ms in cassandra.yaml.
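The following is a minimal sketch of a client-side check that assumes you assemble JSON query expressions in Python. The hypothetical with_shard_settings helper accepts only the three combinations listed in the table. It rejects the combination in which both options are true.

```python
import json

# The three supported (shards.failover, shards.tolerant) combinations from the table.
VALID_SHARD_SETTINGS = {
    (True, False),   # default: failover enabled, partial results not tolerated
    (False, True),   # no failover, partial results tolerated
    (False, False),  # neither failover nor tolerance
}


def with_shard_settings(params: dict, failover: bool = True, tolerant: bool = False) -> str:
    """Return a JSON query expression that sets both shard fault-tolerance options."""
    if (failover, tolerant) not in VALID_SHARD_SETTINGS:
        raise ValueError("shards.failover and shards.tolerant cannot both be true")
    expression = dict(params)  # copy so the caller's dict is not modified
    expression["shards.failover"] = failover
    expression["shards.tolerant"] = tolerant
    return json.dumps(expression, separators=(",", ":"))


if __name__ == "__main__":
    print(with_shard_settings({"q": "*:*"}, failover=False, tolerant=True))
    # {"q":"*:*","shards.failover":false,"shards.tolerant":true}
```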

Tracing distributed queries

During a distributed query, every node is responsible for a set of Cassandra token ranges. A shard is the combination of a node and its token ranges. The shard token range is reported:

In the shards.info response for HTTP queries.
In the system_traces.events table for HTTP queries that provide cassandra.trace=true and CQL Solr queries that enable tracing at the driver level.

JSON queries with literal characters that are Lucene/Solr special characters

Lucene supports escaping special characters that are part of the query syntax. The special characters are +, -, &&, ||, !, (, ), {, }, [, ], ^, ", ~, *, ?, :, \, and /. Using JSON with solr_query requires an additional level of escaping for literal characters that are Lucene special characters, because the backslash is also the escape character in JSON strings.

Syntax for a simple search string:

Simple search string	`mytestuser1?`
Solr query	`name:mytestuser1\?`
CQL Solr Query	`solr_query='{"q":"name:mytestuser1\\?"}'`

Syntax for a complex search string:

Complex search string	`(1+1):2`
Solr query	`e:\(1\+1\)\:2`
CQL Solr Query	`solr_query='{"q":"e:\\(1\\+1\\)\\:2"}'`
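The two escaping layers in these tables can be sketched in Python. This is an illustration, not a DSE API. The hypothetical escape_lucene helper backslash-escapes the characters that Lucene's classic query parser treats as syntax. json.dumps then doubles each backslash when it encodes the JSON expression.

```python
import json

# Characters that Lucene's classic query parser treats as syntax.
# Escaping each '&' and '|' also covers the two-character operators && and ||.
LUCENE_SPECIAL = set('\\+-!():^[]"{}~*?|&/')


def escape_lucene(text: str) -> str:
    """Solr level: prefix every Lucene special character with a backslash."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def json_query(field: str, value: str) -> str:
    """JSON level: encode field:<escaped value> as a {"q": ...} expression.

    json.dumps turns each backslash into \\, which is the extra escaping
    that solr_query needs on top of the Solr-level escaping.
    """
    return json.dumps({"q": f"{field}:{escape_lucene(value)}"}, separators=(",", ":"))


if __name__ == "__main__":
    print(escape_lucene("mytestuser1?"))       # mytestuser1\?
    print(json_query("name", "mytestuser1?"))  # {"q":"name:mytestuser1\\?"}
    print(json_query("e", "(1+1):2"))          # {"q":"e:\\(1\\+1\\)\\:2"}
```

The helper also escapes * and ?. Apply it only to literal values, not to values that should keep wildcard matching.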

Escape characters in queries

Solr queries require escaping special characters that are part of the query syntax. To escape a character, put a backslash (\) before it. For example, to search for a literal double quotation mark (") character, escape the " for Solr as \".

For queries that contain double quotation marks, use three backslashes (\\\) before the double quotation mark:

For the query syntax: one backslash \ escapes the "
For the JSON string syntax: two backslashes \\ escape the \
Together, \\\" encodes \\ (an escaped backslash) followed by \" (an escaped double quotation mark), which the JSON parser decodes to \" for Solr.

Exact phrase query

For a row that looks like this, with an email address that includes a double quotation mark greenr"q@example.com:

INSERT INTO users(id, email) VALUES(1, 'greenr"q@example.com');

Perform a phrase query by enclosing the email address in double quotation marks:

SELECT * FROM users where solr_query = '
{ "q": "*:*", "fq": "email:\"greenr\\\"q@example.com\""}
';

Fuzzy query

For a row that looks like this, with the same email address that includes a double quotation mark greenr"q@example.com:

cqlsh> select * from test.users where solr_query='{"q":"email:r\\\"q@example"}' ;

 id | email                | solr_query
----+----------------------+------------
  1 | greenr"q@example.com |       null

(1 rows)

For a term query (fuzzy search) for all email addresses that include r"q@example, remove the enclosing double quotation marks. Keep the three backslashes before the double quotation mark that is part of the email address:

SELECT * FROM users where solr_query = '
{ "q": "*:*", "fq": "email:r\\\"q@example"}
';
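The following sketch shows where the three backslashes come from. It assumes Python and uses hypothetical helper names. It applies the Solr-level escaping first and lets json.dumps add the JSON-level escaping. The output reproduces the filter queries of the phrase and term query examples, with insignificant whitespace removed.

```python
import json


def escape_quote_for_solr(text: str) -> str:
    """Solr level: backslash-escape backslashes and double quotation marks."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def fq_expression(fq: str) -> str:
    """JSON level: encode a match-all query plus a filter query."""
    return json.dumps({"q": "*:*", "fq": fq}, separators=(",", ":"))


email = 'greenr"q@example.com'

# Phrase query: wrap the Solr-escaped address in double quotation marks.
phrase = fq_expression('email:"' + escape_quote_for_solr(email) + '"')

# Term query: the value is not enclosed in quotation marks.
term = fq_expression("email:" + escape_quote_for_solr('r"q@example'))

print(phrase)  # {"q":"*:*","fq":"email:\"greenr\\\"q@example.com\""}
print(term)    # {"q":"*:*","fq":"email:r\\\"q@example"}
```

The resulting JSON contains no single quotation marks, so it can go inside the single-quoted solr_query literal as-is.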

Overriding the default TimeZone (UTC) in search queries

Specify the tz parameter to override the default time zone (UTC) that is used for adding and rounding in date math. The local rules for the specified time zone, including the start and end of daylight saving time (DST), if any, determine when each day starts. The time zone rules affect rounding and adding of DAYs, and they also cascade to rounding of HOUR, MIN, MONTH, and YEAR. For example, specifying a different time zone changes the result:

Date math	Default TZ (UTC)	TZ=America/Los_Angeles
`2016-03-10T12:34:56Z/YEAR`	`2016-01-01T00:00:00Z`	`2016-01-01T08:00:00Z`
`2016-03-13T08:00:00Z+1DAY`	`2016-03-14T08:00:00Z`	`2016-03-14T07:00:00Z`

The value of the tz parameter can be any zone ID that is supported by the Java TimeZone class.
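You can reproduce the effect of the time zone on date math with Python's zoneinfo module. This requires Python 3.9 or later and available IANA time zone data. The sketch only imitates Solr's rounding and adding semantics for YEAR and DAY; it is not how DSE evaluates date math. The function names are hypothetical. Adding DAYs changes the UTC result only when the added day crosses a DST transition. In America/Los_Angeles, DST started on March 13, 2016.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def round_to_year(instant: datetime, tz: str) -> datetime:
    """Imitate Solr date math <instant>/YEAR evaluated in time zone tz."""
    local = instant.astimezone(ZoneInfo(tz))
    # Truncate to midnight on January 1 of the local year.
    start = local.replace(month=1, day=1, hour=0, minute=0, second=0, microsecond=0)
    return start.astimezone(timezone.utc)


def add_days(instant: datetime, days: int, tz: str) -> datetime:
    """Imitate <instant>+<days>DAY: add calendar days on the local wall clock."""
    local = instant.astimezone(ZoneInfo(tz))
    # Aware datetime + timedelta keeps the wall-clock time of day; the UTC
    # offset is recomputed for the new date, so crossing DST shifts the result.
    return (local + timedelta(days=days)).astimezone(timezone.utc)


def iso(instant: datetime) -> str:
    """Format a UTC datetime the way Solr writes dates."""
    return instant.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    t1 = datetime(2016, 3, 10, 12, 34, 56, tzinfo=timezone.utc)
    t2 = datetime(2016, 3, 13, 8, 0, 0, tzinfo=timezone.utc)
    for zone in ("UTC", "America/Los_Angeles"):
        print(zone, iso(round_to_year(t1, zone)), iso(add_days(t2, 1, zone)))
    # UTC 2016-01-01T00:00:00Z 2016-03-14T08:00:00Z
    # America/Los_Angeles 2016-01-01T08:00:00Z 2016-03-14T07:00:00Z
```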

Field, query, and range faceting with a JSON query

Specify facet parameters inside a facet JSON object to perform field, query, and range faceting in Solr queries. Distributed pivot faceting is supported. The JSON syntax for specifying facets is less verbose than the HTTP API syntax because you can:

Specify each facet parameter without the facet prefix that HTTP APIs require.
Express multiple facet fields and queries inside a JSON array.

Faceted search example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"field":"type"}}';

Query facet example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":"type:0"}}';

Multiple queries example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":["type:0","type:1"]}}';

Distributed pivot faceting example

SELECT id FROM table WHERE solr_query='{"q":"id:*","facet":{"pivot":"type,value","limit":"-1"}}';

Range facet example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"range":"type", "f.type.range.start":-10, "f.type.range.end":10, "range.gap":1}}';

The returned result is formatted as a single row with each column corresponding to the output of a facet (either field, query, or range). The value is represented as a JSON blob because facet results can be complex and nested. For example:

 facet_fields           | facet_queries
------------------------+-------------------------
 {"type":{"0":2,"1":1}} | {"type:0":2,"type:1":1}
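Because each facet column holds a JSON document, an application must decode the column value before it can read the counts. The following minimal sketch assumes the column values have already been read as text, for example through a driver. The literals below are copied from the sample result.

```python
import json

# Column values as shown in the sample result row.
facet_fields = '{"type":{"0":2,"1":1}}'
facet_queries = '{"type:0":2,"type:1":1}'

# Each column holds a JSON document; decode it to get the per-value counts.
field_counts = json.loads(facet_fields)["type"]  # {"0": 2, "1": 1}
query_counts = json.loads(facet_queries)         # {"type:0": 2, "type:1": 1}

for value, count in field_counts.items():
    print(f"type={value}: {count}")
```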

Range by date facet example

SELECT * FROM solr WHERE solr_query='{"q":"business_date:*","facet":{"range":"business_date", "f.business_date.range.start":"2015-01-01T00:00:00Z", "f.business_date.range.end":"2015-08-01T00:00:00Z", "f.business_date.range.gap":"+1MONTH"}}';

Warning: The before, after, and between options of Solr range facets might return incorrect and inconsistent results on multinode clusters. See SOLR-6187 and SOLR-6375.

Interval facet example

SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"interval":"id", "interval.set":"[*,500]"}}';
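Unbalanced braces are an easy mistake when you write these expressions by hand. The following sketch assumes you generate queries in Python. Serializing a dictionary with json.dumps always produces well-formed JSON. facet_query is a hypothetical helper, and the facet keys are copied from the range and interval examples.

```python
import json


def facet_query(q: str, facet: dict) -> str:
    """Serialize a query plus a facet object; json.dumps always balances braces."""
    return json.dumps({"q": q, "facet": facet}, separators=(",", ":"))


range_facet = facet_query("id:*", {
    "range": "type",
    "f.type.range.start": -10,
    "f.type.range.end": 10,
    "range.gap": 1,
})
interval_facet = facet_query("id:*", {"interval": "id", "interval.set": "[*,500]"})

print(range_facet)
print(interval_facet)
```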

JSON single-pass distributed queries

Single-pass distributed queries are supported in CQL Solr queries.

To use a single-pass distributed query instead of the standard two-pass query, set the distrib.singlePass Boolean parameter to true in the JSON query expression:

SELECT * FROM ks.cf WHERE solr_query = '{"q" : "*:*", "distrib.singlePass" : true}';

Using a single-pass distributed query has an operational cost, potentially including more disk and network overhead. A regular distributed query performs two network round trips. The first retrieves from Solr the IDs of the documents that satisfy the query, and the second retrieves from Cassandra only the rows for those IDs. A single-pass distributed query, an advanced feature, saves one of these round trips: each node reads all rows that satisfy the query and returns them to the coordinator node in one pass.

Single-pass distributed queries are most efficient when most of the documents found are returned in the search results. They are not efficient when most of the documents found will not be returned to the coordinator node.

For example, a distributed query that only fans out to a single node from the coordinator node will likely be most efficient as a single-pass query.

Single-pass queries return only document fields that are defined in the Solr schema. This limitation also applies to map entries that do not conform to the dynamic field mapping.

JSON query name option

Use the following syntax to name your queries to support metrics and monitoring for performance objects. Naming queries is useful, for example, for tagging and JMX operations.

SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"query.name":"Asian subjects", "q":"ethnicity:Asia*"}' LIMIT 50;

JSON query commit option

Use the JSON query commit option when all of the following apply: you run custom queries after bulk document loading, the normal auto soft commit is disabled or very infrequent, and you want the latest data to be visible to your query. The option soft-commits all pending updates before the query runs. By default, the commit option is false.

For example:

SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Asia*", "commit":true}' LIMIT 50;

Warning: Do not use the JSON commit option for live operations against a production cluster. DataStax recommends using the JSON commit option only when you would otherwise be forced to issue a commit through the Solr HTTP interface. The commit option is not a replacement for the normal auto soft commit process.

Queries for tuples and UDTs

DSE Search supports indexing and querying of advanced data types, including tuples and user-defined types (UDT).

The tuple data type holds fixed-length sets of typed positional fields. Use a tuple as an alternative to a user-defined type.
A user-defined type (UDT) facilitates handling multiple fields of related information in a table. UDTs are a specialization of tuples. All examples and documentation references to tuples apply to both tuples and UDTs.
Applications that require multiple tables can be simplified to use fewer tables by using a user-defined type to represent the related fields of information instead of storing the information in a separate table.

Configuration and schema requirements apply. See UDT query examples.

Queries to dynamically enable paging

When the cql_solr_query_paging option in the dse.yaml file is off, use the "paging":"driver" parameter to enable paging for an individual query:

cqlsh> select id from wiki.solr where solr_query='{"q":"*:*", "sort":"id asc", "paging":"driver"}';

The location of the dse.yaml file depends on the type of installation:

Installer-Services	/etc/dse/dse.yaml
Package installations	/etc/dse/dse.yaml
Installer-No Services	`install_location`/resources/dse/conf/dse.yaml
Tarball installations	`install_location`/resources/dse/conf/dse.yaml

The location of the cassandra.yaml file depends on the type of installation:

Package installations	/etc/dse/cassandra/cassandra.yaml
Tarball installations	`install_location`/resources/cassandra/conf/cassandra.yaml