Search index syntax
Describes the query syntax for indexed fields on DSE Search nodes.
dse.yaml
The location of the dse.yaml file depends on the type of installation:Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
DataStax Enterprise Search nodes can perform a wide range of functionality, and it's
important to understand the syntax required for such tasks as filtering CQL queries,
using the solr_query
JSON formats, and escaping special characters.
Filtering synopsis
In a CQL SELECT statement, you can filter results using the following syntax:
SELECT selectors
FROM table
WHERE (indexed_column_expression | solr_query = 'search_expression')
[ LIMIT n ]
[ ORDER BY column_name ] ;
- Full text search using column names and CQL operators. See Filtering on search indexed columns.
- Recommended for most queries: Native CQL query expressions.
- Still supported: the Solr syntax in the
solr_query
option:- Basic expressions (Solr q parameter) in plain
text, see Filtering queries with the solr_query q format. For improved
performance, you should use Solr filter queries (
fq
) whenever possible. See Search index filtering best practices. - Advanced expressions using Solr JSON syntax, see Using the solr_query JSON format.
- Basic expressions (Solr q parameter) in plain
text, see Filtering queries with the solr_query q format. For improved
performance, you should use Solr filter queries (
Filtering restrictions and best practices
- Search index queries are defaulted to an equivalent of
LIMIT 10
. - Pagination is off by default. Note: In dse.yaml, the cql_solr_query_paging option specifies when to use pagination (also called cursors).
- Apache Solr and Apache Lucene limitations for field names apply to pagination.
- Queries with smaller result sets will see increased performance with paging off.
- Limitations and known Apache Solr issues apply to DSE Search queries. For example: incorrect SORT results for tokenized text fields.
- Column aliases are not supported in search index queries.
- All of the fields that are queried on DSE SearchAnalytics clusters must be defined in the search index schema definition. Fields that are not defined in the search index schema columns are excluded from the results returned from Spark queries.
- Solr allows the
LOCAL_ONE
andONE
consistency levels for read operations.
Filtering on search indexed columns
On a DSE Search node, filter queries using the search index columns.
On a DataStax Enterprise search node, when a query is filtered on
an indexed column the corresponding Solr document is used. Filtering on indexed columns in
queries that do not have the partition key fully defined do not require ALLOW
FILTERING
.
The search index now supports CQL operators, such as LIKE
,
IS NOT NULL
, range, and =/!=
to filter. See Native CQL search queries.
SELECT * FROM test.test WHERE title IS NOT NULL AND collection CONTAINS 'Anaisa Pye' AND age > 35;
Sort using ORDER BY.
Filtering queries with the solr_query q format
Filter queries using the search index with Solr q parameter syntax in plain text.
solr_query
option using the syntax of the Solr q parameter in plain text. See Filtering on terms for
examples.solr_query = 'q_expression'
Examples
Filtering on a single column
name
column contains
cat
, dog
but not
fish
:SELECT * FROM pets WHERE solr_query='name: cat name: dog -name:fish';
Limiting results
name
column:SELECT name FROM keyspace.table WHERE solr_query='name:cat name:dog -name:fish'
SELECT * FROM keyspace.table WHERE solr_query='name:cat name:dog -name:fish' LIMIT 1
Using the count function
SELECT count(*) FROM table WHERE solr_query = '...';
Using count() in combination with LIMIT or facets results in an error.
All response queries of the drivers have a custom payload where the total number of
documents found is returned. This number is keyed as DSESearch.numFound
.
Using the solr_query JSON format
Apply advanced filters to CQL queries using JSON-formatted expressions.
cassandra.yaml
The location of the cassandra.yaml file depends on the type of installation:Package installations | /etc/dse/cassandra/cassandra.yaml |
Tarball installations | installation_location/resources/cassandra/conf/cassandra.yaml |
dse.yaml
The location of the dse.yaml file depends on the type of installation:Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
DSE Search supports advanced Solr search features to apply filters to CQL queries using JSON-formatted expressions. Also see Overriding the default TimeZone (UTC) in search queries.
JSON query syntax
{ "q": query_expression (string), "fq": filter_query_expression(s) (string_or_array_of_strings, ...), "facet": facet_query_expression (object) "sort": sort_expression (string), "start": start_index(number), timeAllowed: search_time_limit_ms, "TZ": zoneID), // Any valid zone ID in java TimeZone class "paging": "driver" (string), "distrib.singlePass": true|false (boolean), "shards.failover": true|false (boolean), // Default: true "shards.tolerant": true|false (boolean), // Default: false "commit": true|false (boolean), "route.partition": partition_routing_expression (array_of_strings), "route.range": range_routing_expression (array_of_strings), "query.name": query_name (string), }
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Asian"}';
SELECT id FROM nhanes_ks.nhanes WHERE solr_query='{"q":"ethnicity:Mexi*", "sort":"id asc"}' LIMIT 3;
SELECT * FROM mykeyspace.mytable WHERE solr_query='{"q":"{!edismax}quotes:yearning or kills"}';
solr_query
, you must include
defaultSearchField
in your schema.Making distributed queries tolerant of shard failures
Valid configurations | Description |
---|---|
|
This default configuration enables query failover and disables fault tolerance. Attempt to retry the failed shard requests when errors indicate that there is a reasonable chance of recovery. |
|
Disable query failover. Enable fault tolerance. Make the query succeed, even if the query only partially succeeded, and did not succeed for all nodes. |
|
Disable query failover. Disable fault tolerance. |
Other fault tolerance configuration options include: netty_client_request_timeout in dse.yaml and read_request_timeout_in_ms in cassandra.yaml.
JSON queries with literal characters that are Apache Solr™/Apache Lucene® special characters
Character | Description |
---|---|
+ | Plus. Required search term operator. |
- | Minus. Prohibited search term operator. |
&& | Double ampersand. AND operator. Both terms either side of the operator are required for a match. |
|| | Double pipe. OR operator. |
! | Exclamation mark. NOT operator. |
( | Left parenthesis |
) | Right parenthesis |
" | Double quote |
~ | Tilde |
* | Asterisk |
? | Question mark |
: | Colon |
Using JSON with
solr_query
requires additional syntax for literal
characters that are Lucene special characters.
Simple search string | mytestuser1? |
Solr query | name:mytestuser1\? |
CQL Solr query | solr_query='{"q":"name:mytestuser1\\?"}' |
Complex search string | (1+1):2 |
Solr query | e:\(1\+1\)\:2 |
CQL Solr query | solr_query='{"q":"e:\\(1\\+1\\)\\:2"}' |
Field, query, and range faceting with a JSON query
- Specifying each facet parameter without the facet prefix that is required by HTTP APIs.
- Expressing multiple facet fields and queries inside a JSON array.
- Faceted search example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"field":"type"}}';
- Query facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":"type:0"}}';
- Multiple queries example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"query":["type:0","type:1"]}}';
- Distributed pivot faceting example
-
SELECT id FROM table WHERE solr_query='{"q":"id:*","facet":{"pivot":"type,value","limit":"10"}}'
- Range facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"range":"type", "f.type.range.start":-10, "f.type.range.end":10, "range.gap":1}}}';
The returned result is formatted as a single row with each column corresponding to the output of a facet (either field, query, or range). The value is represented as a JSON blob because facet results can be complex and nested. For example:facet_fields | facet_queries ------------------------+------------------------- {"type":{"0":2,"1":1}} | {"type:0":2,"type:1":1}
- Range by date facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"business_date:*","facet":{"range":"business_date", "f.business_date.range.start":"2015-01-01T00:00:00Z", "f.business_date.range.end":"2015-08-01T00:00:00Z", "f.business_date.range.gap":"+1MONTH"}}';
- Interval facet example
-
SELECT * FROM solr WHERE solr_query='{"q":"id:*","facet":{"interval":"id", "interval.set":"[*,500]"}';
Tracing distributed queries
- In the
shards.info
response for HTTP queries. - In the
system_traces.events
table for HTTP queries that provide cassandra.trace=true and CQL Solr queries that enable tracing at the driver level.
Solr single-pass CQL queries
Single-pass distributed queries are supported in CQL Solr queries.
- Specify the
distrib.singlePass
Boolean parameter in the query expression. Example:SELECT * FROM keyspace.table WHERE solr_query = '{"q" : "*:*", "distrib.singlePass" : true}'
- Use a
token()
or partition key restriction in theWHERE
clause. Example:SELECT * FROM keyspace.table WHERE token(id) >= 3074457345618258604 AND solr_query = '{"q" : "*:*"}'
- Execute a
COUNT
query. Example:SELECT count(*) FROM keyspace.table WHERE solr_query = '{"q" : "*:*"}'
- Specify the primary key elements in the
SELECT
clause. Example:CREATE TABLE cycling.cyclist_name ( id UUID, lastname text, firstname text ); PRIMARY KEY ((id), lastname);
CREATE SEARCH INDEX ON cycling.cyclist_name ... ;
dsetool create_core cycling.cyclist_name ...
INSERT into cycling.cyclist_name ... ;
SELECT id, lastname FROM cycling.cyclist_name WHERE solr_query='{"q": "*:*","fq":"(id:a6f94417-e27b-444f-9d0c-dccb588e421f)"}';
- The first one to retrieve IDs from DSE Search that satisfy the query.
- The second one to retrieve only the rows that satisfy the query from the database, based on IDs from the first step.
For example, a distributed query that only fans out to a single node from the coordinator node will likely be most efficient as a single-pass query.
cycling.cyclist_name
table example above. If the column
firstname
was not indexed, but needed to be retrievable by queries,
the entry in the schema.xml for the firstname
field could be:
<field indexed="false" multiValued="false" docValues="false" name="firstname" type="StrField"/>
If the Solr core was created without a customized schema.xml file
and generateResources=true
was included, the default
schema.xml file includes all fields that exist in the table.
For more information, see dsetool
create_core.JSON query name option
SELECT id FROM nhanes_ks.nhanes WHERE solr_query=' {"query.name":"Asian subjects", "q":"ethnicity:Asia*"}' LIMIT 50;
JSON query commit option
When executing custom queries after bulk document loading, with auto soft commit disabled (or is an extremely infrequent configured value), use the JSON query commit option to ensure that all pending updates are soft-committed before the running a query. The commit makes the latest data visible to a query.
commit
option is not a replacement for the
normal auto soft commit process or the COMMIT SEARCH INDEX
command.commit
option to
true
:SELECT * FROM wiki.solr WHERE solr_query='{"q":"title:Asia*", "commit":true}' LIMIT 50;
Queries that dynamically enable paging using a driver
off
in dse.yaml, set the Solr paging
parameter to driver ("paging":"driver"
). For
example:SELECT id FROM wiki.solr WHERE solr_query= '{"q":"*", "sort":"id asc", "paging":"driver"}' ;
Limiting queries by time
- If
timeAllowed
is exceeded, an exception is thrown. - If
timeAllowed
is exceeded, and the additionalshards.tolerant
parameter is set totrue
, the application returns the partial results collected so far.
When partial results are returned, the CQL custom payload contains the
DSESearch.isPartialResults
key.
SELECT * FROM users
WHERE solr_query = '{ "q": "*:*", "timeAllowed":30000}';
The
Solr timeAllowed
parameter is enabled by default to prevent long
running shard queries, such as complex facets and Boolean queries, from using system
resources after they have timed out from the DSE Search coordinator.solr.timeAllowed.docsPerSample
has been removed.By default, the value for timeAllowed
is the same as the
internode_messaging_options.client_request_timeout_seconds
parameter in dse.yaml
. Queries that breach
client_request_timeout_seconds
fail by default.
The 50th percentile latency for queries using timeAllowed
should be
within 5% of the same query that does not use timeAllowed
.
The timeAllowed
parameter applies to these queries:
- Standard queries and filtering queries, where the timeout is applied on a match collection.
- Facet queries, including pivot facets.
The json.facet
queries are not supported.
Queries that are terminated due to a timeout are logged with a warning.
timeAllowed
parameter may cause a latency cost. If you find
the cost for queries is too high in your environment, consider setting the
-Ddse.timeAllowed.enabled.default
property to
false
at DSE startup time. Or set
timeAllowed.enable
to false in the query.Escaping characters in a solr_query
How to escape special characters in a basic or advanced solr_query
expression.
Solr queries require escaping special characters that are part of the query syntax.
Character | Description |
---|---|
+ | Plus. Required search term operator. |
- | Minus. Prohibited search term operator. |
&& | Double ampersand. AND operator. Both terms either side of the operator are required for a match. |
|| | Double pipe. OR operator. |
! | Exclamation mark. NOT operator. |
( | Left parenthesis |
) | Right parenthesis |
" | Double quote |
~ | Tilde |
* | Asterisk |
? | Question mark |
: | Colon |
To escape these characters, use a slash (\
) before the character to escape.
For example, to search for a literal double quotation mark ("
) character,
escape the "
for Solr with \"
.
solr_query
, you can escape special characters using either CQL
Solr or JSON forms:- CQL Solr
...WHERE solr_query='field:value'
- JSON
WHERE solr_query='{ "q": "field:value"}'
JSON-encoded queries require that values must also be JSON-escaped for special characters.
\\\
:- For query syntax: One slash
\
to escape the"
- For the JSON string syntax: Two slashes
\\
to escape the\
Triple slashes
\\\
escape both characters in\"
to produce\\
(an escaped escape) and\"
(an escaped double quote).
Escaping single quotation marks
- Double the single quotation mark (
'
)- CQL
...WHERE solr_query='name:Walter''s'
- JSON
...WHERE solr_query='{ "q": "Walter''s"}'
- Use dollar-quotes for the string constant
- CQL
...WHERE solr_query=$$name:Walter's$$
- JSON
...WHERE solr_query=$${ "q": "Walter's"}$$
Escaping double quotation marks
- CQL
- Double the single quotation mark (
'
) and add the backslash (\
) for Solr escaping - JSON
- Escape
\"
to\\\"
to escape both special characters for JSON
Exact and fuzzy query examples
- Exact phrase query
- For a row that looks like this, with an email address that includes a double
quotation mark
greenr"q@example.com
:INSERT INTO users(id, email) VALUES(1, 'greenr"q@example.com')"
- Fuzzy query
- For a row that looks like this, with the same email address that includes a double
quotation mark
greenr"q@example.com
:select * from test.users where solr_query='{"q":"email:r\\\"q@example"}' ; id | email | solr_query ------+-------+------------------------------ 1 | greenr"q@example.com | null (1 rows)
Using JSON with
solr_query
requires additional syntax for literal
characters that are Lucene special characters. See JSON queries
with literal characters that are Solr special characters.