Search index syntax
DataStax Enterprise supports production-grade implementation of CQL Solr queries on a DSE Search node. Develop CQL applications supporting full-text search without using Solr-specific APIs.
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations |
/etc/dse/dse.yaml |
Tarball installations |
installation_location/resources/dse/conf/dse.yaml |
DataStax Enterprise supports a production-grade implementation of CQL Solr queries in DSE Search. You can develop CQL-centric applications supporting full-text search without having to work with Apache Solr™-specific APIs. Only full text search queries are supported.
- CQL Solr queries are defaulted to an equivalent of
LIMIT 10
. - Pagination is off by default. In dse.yaml, the cql_solr_query_paging option specifies when to use pagination (also called cursors).
- Apache Solr and Apache Lucene limitations for field names apply to pagination.
- Queries with smaller result sets will see increased performance with paging off.
- Limitations and known Apache Solr issues apply to DSE Search queries. For example: incorrect SORT results for tokenized text fields.
- Column aliases are not supported in solr_query queries.
-
All of the fields that are queried on DSE SearchAnalytics clusters must be defined in the search index schema definition. Fields that are not defined in the search index schema columns not defined are excluded in the results returned from Spark queries.
Search index query syntax
Execute queries against indexed columns using the solr_query
option
of the SELECT
statement WHERE
clause.
Synopsis
SELECT selectors
FROM table
WHERE solr_query = 'search_expression'
[ LIMIT n ] ;
- Basic search index queries using only a q parameter expression, see Writing a basic index query. The following examples show
expressions supported by the Solr q parameter. For improved performance, you should
use Solr filter queries (
fq
) whenever possible. See Search index filtering best practices. - Advanced search index queries using a full JSON expression.
solr_query
option to filter on the search index fields. For
example:SELECT *
FROM users
WHERE solr_query = 'irc:jdoe';
SELECT *
FROM users
WHERE irc = 'jdoe';
Writing a basic index query
SELECT *
FROM keyspace.table
WHERE solr_query = 'name: cat name: dog -name:fish';
SELECT name
FROM keyspace.table
WHERE solr_query = 'name:cat name:dog -name:fish';
SELECT *
FROM keyspace.table
WHERE solr_query = 'name:cat name:dog -name:fish'
LIMIT 1;
SELECT count(*)
FROM table
WHERE solr_query = '...';
Using count() in combination with LIMIT or facets results in an error.
All response queries of the drivers have a custom payload where the total number of documents found is returned. This number is keyed as DSESearch.numFound.
Writing advanced solr_query expressions
DSE Search supports filtering CQL queries using more advanced Solr searches with JSON-based expressions.
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:
Package installations |
/etc/dse/cassandra/cassandra.yaml |
Tarball installations |
installation_location/resources/cassandra/conf/cassandra.yaml |
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations |
/etc/dse/dse.yaml |
Tarball installations |
installation_location/resources/dse/conf/dse.yaml |
On this page:
- JSON query syntax
- JSON queries with literal characters that are Apache Solr/Apache Lucene special characters
- Escaping single quotation marks
- Field, query, and range faceting with a JSON query
- Tracing distributed queries
- Solr single-pass CQL queries
- JSON query name option
- JSON query commit option
- Queries to dynamically enable paging
JSON query syntax
{ "q": query_expression (string), "fq": filter_query_expression(s) (string_or_array_of_strings, ...), "facet": facet_query_expression (object) "sort": sort_expression (string), "start": start_index(number), timeAllowed: search_time_limit_ms, "TZ": zoneID), // Any valid zone ID in java TimeZone class "paging": "driver" (string), "distrib.singlePass": true|false (boolean), "shards.failover": true|false (boolean), // Default: true "shards.tolerant": true|false (boolean), // Default: false "commit": true|false (boolean), "route.partition": partition_routing_expression (array_of_strings), "route.range": range_routing_expression (array_of_strings), "query.name": query_name (string), }
SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"q":"ethnicity:Asian"}';
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Mexi*", "sort":"id asc"}' LIMIT 3;
SELECT * FROM mykeyspace.mytable WHERE solr_query='{"q" : "{!edismax}quotes:yearning or kills"}';
Making distributed queries tolerant of shard failures
Valid configurations | Description |
---|---|
|
This default configuration enables query failover and disables fault tolerance. Attempt to retry the failed shard requests when errors indicate that there is a reasonable chance of recovery. If any of the nodes (shards) that we scatter to fail before the query is complete, retry the shard query against a replica. |
|
Disable query failover. Enable fault tolerance. Make the query succeed, even if the query only partially succeeded, and did not succeed for all nodes. |
|
Disable query failover. Disable fault tolerance. |
Other fault tolerance configuration options include: netty_client_request_timeout in dse.yaml and read_request_timeout_in_ms in cassandra.yaml.
JSON queries with literal characters that are Apache Solr™/Apache Lucene® special characters
Lucene supports escaping special characters that are part of
the query syntax. Special characters are: +, -, &&, ||, !, (, ),
", ~, *, ?,
and :
. Using
JSON with solr_query
requires additional syntax for literal
characters that are Lucene special characters.
Simple search string | mytestuser1? |
Solr query | name:mytestuser1\? |
CQL Solr query | solr_query='{"q":"name:mytestuser1\\?"}' |
Complex search string | (1+1):2 |
Solr query | e:\(1\+1\)\:2 |
CQL Solr query | solr_query='{"q":"e:\\(1\\+1\\)\\:2"}' |
Field, query, and range faceting with a JSON query
- Specifying each facet parameter without the facet prefix that is required by HTTP APIs.
- Expressing multiple facet fields and queries inside a JSON array.
- Faceted search example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"field":"type"}}';
- Query facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":"type:0"}}';
- Multiple queries example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":["type:0","type:1"]}}';
- Distributed pivot faceting example
-
SELECT id FROM table WHERE solr_query='{"q":"id:*","facet":{"pivot":"type,value","limit":"10"}}'
- Range facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"range":"type", "f.type.range.start":-10, "f.type.range.end":10, "range.gap":1}}}';
The returned result is formatted as a single row with each column corresponding to the output of a facet (either field, query, or range). The value is represented as a JSON blob because facet results can be complex and nested. For example:facet_fields | facet_queries ------------------------+------------------------- {"type":{"0":2,"1":1}} | {"type:0":2,"type:1":1}
- Range by date facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"business_date:*","facet":{"range":"business_date", "f.business_date.range.start":"2015-01-01T00:00:00Z", "f.business_date.range.end":"2015-08-01T00:00:00Z", "f.business_date.range.gap":"+1MONTH"}}';
- Interval facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"interval":"id", "interval.set":"[*,500]"}';
Tracing distributed queries
- In the shards.info response for HTTP queries.
- In the system_traces.events table for HTTP queries that provide cassandra.trace=true and CQL Solr queries that enable tracing at the driver level.
Solr single-pass CQL queries
Single-pass distributed queries are supported in CQL Solr queries.
- Specify the
distrib.singlePass
Boolean parameter in the query expression. Example:SELECT * FROM keyspace.table WHERE solr_query = '{"q" : "*:*", "distrib.singlePass" : true}'
- Use a
token()
or partition key restriction in theWHERE
clause. Example:SELECT * FROM keyspace.table WHERE token(id) >= 3074457345618258604 AND solr_query = '{"q" : "*:*"}'
- Execute a
COUNT
query. Example:SELECT count(*) FROM keyspace.table WHERE solr_query = '{"q" : "*:*"}'
- Specify the primary key elements in the
SELECT
clause. Example:CREATE TABLE cycling.cyclist_name ( id UUID, lastname text, firstname text ); PRIMARY KEY ((id), lastname);
CREATE SEARCH INDEX ON cycling.cyclist_name ... ;
dsetool create_core cycling.cyclist_name ...
INSERT into cycling.cyclist_name ... ;
SELECT id, lastname FROM cycling.cyclist_name WHERE solr_query='{"q": "*:*","fq":"(id:a6f94417-e27b-444f-9d0c-dccb588e421f)"}';
- The first one to retrieve IDs from DSE Search that satisfy the query.
- The second one to retrieve only the rows that satisfy the query from the database, based on IDs from the first step.
For example, a distributed query that only fans out to a single node from the coordinator node will likely be most efficient as a single-pass query.
stored=true
must be included for the field to be
retrievable by queries. cycling.cyclist_name
table example above. If the column firstname
was not indexed, but
needed to be retrievable by queries, the entry in the schema.xml
for the firstname
field could be:
<field indexed="false" multiValued="false"
docValues="false" name="firstname" stored="true" type="StrField"/>
If the Solr core was created without a customized schema.xml file
and generateResources=true
was included, the default
schema.xml file includes all fields that exist in the table.
For more information, see dsetool create_core. JSON query name option
SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"query.name":"Asian subjects", "q":"ethnicity:Asia*"}' LIMIT 50;
JSON query commit option
If you are executing custom queries after bulk document loading, and the auto soft commit is disabled or the configured value is extremely infrequent, and you want the latest data to be visible to your query, use the JSON query commit option to ensure that all pending updates are soft-committed before the query runs. By default, the commit option is set to false.
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Asia*", "commit":true}' LIMIT 50;
commit
option for live
operations against a production cluster. DataStax recommends using the JSON
commit
option only when you would otherwise be forced to issue
a commit though the Solr HTTP interface. The commit
option is not a
replacement for the normal auto soft commit process. Queries to dynamically enable paging
"paging":"driver"
parameter:select id from wiki.solr where solr_query='{"q":"*", "sort":"id asc", "paging":"driver"}';
Limit queries by time
- If the timeAllowed is exceeded, an exception is thrown.
- If the timeAllowed is exceeded, and the additional shards.tolerant parameter is set to
true, the application returns the partial results collected so far.
When partial results are returned, the CQL custom payload contains the
DSESearch.isPartialResults
key.
http://localhost:8983/solr/demo.solr/select?q=ipod&timeAllowed=30000
http://localhost:8983/solr/demo.solr/select?q=ipod&timeAllowed=30000
SELECT * FROM users where solr_query = '{ "q": "*:*", "timeAllowed":30000}';
Escaping characters in a solr_query
How to escape special characters in a basic or advanced solr_query
expression.
Solr queries require escaping special characters that are part of
the query syntax. Special characters are: +, -, &&, ||, !, (, ),
", ~, *, ?,
and :
. To escape these characters,
use a slash (\
) before the character to escape. For example, to search for a
literal double quotation mark ("
) character, escape the "
for Solr with \"
.
solr_query
you can escape special characters in two forms:- CQL Solr
...WHERE solr_query='field:value'
- JSON
WHERE solr_query='{ "q": "field:value"}'
JSON-encoded queries require that values must also be JSON-escaped for special characters.
\\\
:- For query syntax: One slash
\
to escape the"
- For the JSON string syntax: Two slashes
\\
to escape the\
Triple slashes
\\\
escape both characters in\"
to produce\\
(an escaped escape) and\"
(an escaped double quote).
Escaping single quotation marks
- Double the single quotation mark (
'
)- CQL
...WHERE solr_query='name:Walter''s'
- JSON
...WHERE solr_query='{ "q": "Walter''s"}'
- Use dollar-quotes for the string constant
- CQL
...WHERE solr_query=$$name:Walter's$$
- JSON
...WHERE solr_query=$${ "q": "Walter's"}$$
Query examples for escaping double quotation marks
- CQL
- Double the single quotation mark (
'
) and add the backslash (\
) for Solr escaping - JSON
- Escape
\"
to\\\"
to escape both special characters for JSON
Exact and fuzzy query examples
- Exact phrase query
- For a row that looks like this, with an email address that includes a double
quotation mark
greenr"q@example.com
:INSERT INTO users(id, email) VALUES(1, 'greenr"q@example.com')"
- Fuzzy query
- For a row that looks like this, with the same email address that includes a double
quotation mark
greenr"q@example.com
:select * from test.users where solr_query='{"q":"email:r\\\"q@example"}' ; id | email | solr_query ------+-------+------------------------------ 1 | greenr"q@example.com | null (1 rows)
Using
JSON with solr_query
requires additional syntax for literal
characters that are Lucene special characters. See JSON
queries with literal characters that are Solr special characters.