Using search indexes
DSE Graph leverages DSE Search indexes to efficiently filter vertices by properties, and reducing query latency. DSE Search uses a modified Apache Solr to create the search indexes. Graph search indexes can be created using textual, numeric and geospatial data.
It is important to note that traversal queries with search predicates can be completed whether a search index exists or not. However, full graph scans will occur without a search index and performance will degrade severely as the graph grows, an unacceptable solution in a production environment. Create search indexes during schema creation before inserting data and querying the graph. Search indexes will only be created if DSE Search is started in conjunction with DSE Graph. If search indexes are used, the queries must be run on DSE Search nodes in the cluster.
In general, the traversal step will involve a vertex label and can include a property key and a particular property value.
In a traversal, the step following g.V()
is generally the step in which an index will be consulted.
If a mid-traversal V()
step is called, then an additional indexed step can be consulted to narrow the list of vertices that will be traversed.
In DSE 5.1 and later, textual search indexes are by default indexed in both tokenized (TextField
) and non-tokenized (StrField
) forms.
This means that all textual predicates (token, tokenPrefix
, tokenRegex
, eq, neq, regex, prefix) will be usable with all textual vertex properties created.
Practically, search indexes should be created using the asString()
method only in cases where there is absolutely no use for tokenization and text analysis, such as for inventory categories (silverware, shoes, clothing).
The asText()
method is used if searching tokenized text, such as long multi-sentence descriptions.
The query optimizer will choose whether to use analyzed or non-analyzed indexing based on the textual predicate used.
Prior to DSE 5.1, search indexes defaulted to |
Property key indexes defined with asText()
or undefined (since this is the default) can use the following options for search:
Property key indexes defined with asString()
can use the following options for search:
The |
In addition, in DSE 5.1 and later, fuzzy search predicates have been added:
Two of the predicates, fuzzy
and tokenFuzzy
, can be used with TextFields
and StrFields
, respectively, while phrase
can be used only with TextFields
.
- Creating a textual search index
-
An example search index from Creating indexes for vertex label
recipe
that will be used for all examples below:schema.vertexLabel('recipe').index('search').search(). by('instructions').asText(). by('name').asString().add()
This search index uses DSE Search to index
instructions
as full text using tokenization, andname
as a string. Note that, as of DSE 5.1, only those properties that specifically should be indexed as non-tokenized data must specifyasString()
. If there are proporties that specifically should be indexed only as tokenized data, specifyasText()
. - Search using
token()
methods on full text -
In a traversal query, use a token search to find list the names of all recipes that have the word
Saute
in the instructions. The methodtoken()
is used with a supplied word.g.V().has('recipe','instructions', token('Saute')).values('name')
Using search indexes part oneWhy does this search find these three recipes? Because the instructions for each meet the search requirements:
Using search indexes part 1.1 - Search using
tokenPrefix()
methods on full text -
In a traversal query, use a token prefix search to list the names of all recipes that have a word that includes a prefix of
Sea
in the instructions. The methodtokenPrefix()
is used with a supplied prefix (a set of alphanumeric characters).g.V().hasLabel('recipe').has('instructions', tokenPrefix('Sea')).values('name','instructions')
Using search indexes part twoTwo recipes are returned, one with the word Season in the instructions, and one with the word seasonings in the instructions. Case is insensitive in
tokenPrefix()
indexing. - Search using
tokenRegex()
methods on full text -
In a traversal query, use a token regular expression (regex) search to find all recipes that have a word that includes the regular expression specified. The regex,
.sea*in.
, looks for the letters sea preceded by any number of other characters and followed by any number of other characters until the letters in are found and also followed by any number of other characters in the instructions and list the recipe names. The methodtokenRegex()
is used with a supplied regex.g.V().hasLabel('recipe').has('instructions', tokenRegex('.*sea.*in.*')).values('name','instructions')
Using search indexes part threeNote that in this query, only the Oysters Rockefeller recipe is returned because the word Season in the Roast Pork Loin recipe does not meet the requirements for the regular expression.
- Search using
eq()
on non-token methods on strings -
-
In a traversal query, use a non-token search to list all recipes that have
Carrot Soup
in the recipe name. Note that this search is case-sensitive, so usingcarrot soup
would not find a vertex. The methodeq()
is used with a supplied name.g.V().hasLabel('recipe').has('name', eq('Carrot Soup')).values('name')
Using search indexes part fourThe match is found for the full author name listed. Note that
neq()
can also be used to find all strings that do not match the specified string. -
In a traversal query, use a non-token search to list all recipes that have
Carrot
in the recipe name. The methodeq()
is used with a supplied name.g.V().hasLabel('recipe').has('name', eq('Carrot')).valueMap()
Using search indexes part fiveNo match is found, because only a partial name was specified. For
asString()
indexes, the string must match.
-
- Search using
prefix()
on non-token methods on strings -
In a traversal query, use a non-token search to find all authors that have a name beginning with the letter
R
. The methodprefix()
is used with a supplied string.g.V().hasLabel('recipe').has('name', prefix('R')).values('name')
Using search indexes part sixMatches are found for each author name that begins with
R
, provided the recipe name was designated withasString()
in the search index. - Search using
regex()
on non-token methods on strings -
-
In a traversal query, use a non-token search to find all recipes that have a name that includes a specified regular expression. The method
regex()
is used with a supplied regex.g.V().hasLabel('recipe').has('name', regex('.*ee.*')).values('name')
Using search indexes part sevenMatches are found for each author name that include the regex
.ee.
to find all strings that include ee preceded and followed by any number of other characters, provided the recipe name was designated withasString()
in the search index.
-
- Search using phrase()
-
-
The
phrase()
predicate is used with properties designated as TextFields.Find the exact phrase Wild Mushroom Stroganoff in a recipe name:
g.V().hasLabel('recipe').has('name', phrase('Wild Mushroom Stroganoff',0))
The
0
designates that the result must be an exact phrase.v[{~label=recipe, community_id=2123369856, member_id=0}]
The vertex for the correct recipe is returned.
-
The
phrase()
predicate can be used for proximity searches, to discover phrases that have terms that are within a certain distance of one another in the tokenized text.g.V().hasLabel('recipe').has('name', phrase('Wild Stroganoff',1))
The value of
1
designates that the result must only have words in the recipe name that are one term away from one another. In this example, the variation is the addition of the word Mushroom.v[{~label=recipe, community_id=2123369856, member_id=0}]
The vertex for the correct recipe is returned. A match for
g.V().hasLabel('recipe').has('name', phrase('Wild Mushroom',1))
will also return the correct vertex, butg.V().hasLabel('recipe').has('name', `phrase('Mushroom Wild',1))
will not.
-
- Search using
fuzzy()
-
-
The
fuzzy()
predicate uses optimal string alignment distance calculations to match properties designated as StrFields. Variations in the letters used in words, such as misspellings, are the focus of this predicate. The edit distance specified refers to the number of transpositions of letters, with a single transposition of letters constituting one edit.Find the exact name of James Beard in an author name:
g.V().hasLabel('author').has('name', fuzzy('James Beard', 0)).values('name')
The
0
designates that the result must be an exact match.James Beard
-
Changing the last value in a
fuzzy()
predicate will find misspellings:g.V().hasLabel('author').has('name', fuzzy('James Beard', 1)).values('name')
The
1
designates that the result matches with an edit distance of at most one.James Beard, Jmaes Beard
If an author vertex exists with the misspelling Jmaes Beard, the query shown will find both vertices. The value of 1 finds this misspelling because of the single transposition of the letters a and m.
-
Note that searching for a misspelling will find the records with the correct spelling, as well as the misspelled name
g.V().hasLabel('author').has('name', fuzzy('Jmase Beard', 2)).values('name')
The
2
designates that the result must match with at most two transpositions.James Beard, Jmaes Beard
If an author vertex exists with the misspelling Jmaes Beard, the query shown will find both vertices. The value of 2 finds both the misspelling because of the single transposition of letters, e and s in Jmaes Beard, as well as the correct spelling with a second transposition of letters from Jmase Beard to James Beard .
Specifying an edit distance of 3 or greater matches too many terms for useful results. The resulting search index will be too large to efficiently filter queries.
-
- Search using
tokenFuzzy()
-
-
The
tokenFuzzy()
predicate similar tofuzzy()
, but searches for variation across individual tokens in analyzed textual data (TextFields).Find the recipe name that includes the word Wild while searching for the word with a one-letter misspelling:
g.V().hasLabel('recipe').has('name', tokenFuzzy('Wlid',1)).values('name')
The
1
designates that one letter misspelling (one transposition) is acceptable.Wild Beef Stroganoff
-
- Using two search indexes for a single traversal query
-
-
Create a second search index like an example search index from Creating indexes for vertex label
author
.schema.vertexLabel('author').index('search').search(). by('name').asString(). by('nickname').ifNotExists().add()
This search index will use DSE Search to index
nickname
as full text using tokenization, andname
as a string. -
This traversal query demonstrates a mid-traversal
V()
that allows a search index for author as well as a search index for recipe to be used to execute the query. The first index uses atokenRegex()
to find recipe instructions that start with the word Braise; this part of the query is labeled as r for use later in the query. Then the search index for author is searched for an author name that starts with the letter J, and traversed through an outgoing edge to a vertex where the search found in the first part of the query is found withwhere(eq('r'))
.g.V().has('recipe', 'instructions', tokenRegex('Braise.*')).as('r'). V().has('author', 'name', prefix('J')).out().where(eq('r')).values('name')
Using search indexes part eightThis query traversal finds the recipe
Beef Bourguignon
authored byJulia Child
, and illustrates some of the complexity that can be successfully used with search indexes.
-
- Search using geospatial values
-
-
Geospatial search is used to discover geospatial relationships. Search indexes are used to make such searches possible. First, a search index must be created.
schema.vertexLabel('FridgeSensor').index('search').search(). by('location').ifNotExists().add()
-
Some sample data will be helpful for understanding the search results. Two vertices are entered for fridge sensor:
graph.addVertex(label, 'FridgeSensor', 'name', 'jones1', 'city_id', 100, 'sensor_id', '60bcae02-f6e5-11e5-9ce9-5e5517507c66', 'location', Geo.point(-118.359770, 34.171221)) graph.addVertex(label, 'FridgeSensor', 'name', 'smith1', 'city_id', 100, 'sensor_id', '61deada0-3bb2-4d6d-a606-a44d963f03b5', 'location', Geo.point(-115.655068, 35.163427))
The sensors are named and given a city ID and sensor ID in addition to the location with data type
Point
. -
A query can find all sensors that meet the requirement of being inside the described polygon
Distance
that is designated as a circle with a center at (-110, 30) and a radius of 20 degrees with the methodGeo.inside()
.Distance d = Geo.point(-110,30),20, Geo.Unit.DEGREES) g.V().hasLabel('FridgeSensor').has('location', Geo.inside(d)).values('name')
Using search indexes part nineMore information on geospatial queries can be found in Geospatial traversals.
-
- Search using numerical values
-
-
Search indexes can also be used for non-textual values:
schema.propertyKey('name').Text().create() schema.propertyKey('age').Int().create() schema.vertexLabel('person').properties('name','age').create() schema.vertexLabel('person').index('search').search().by('name').by('age').add()
This example includes a search index by the integer property age. Here is data to query:
graph.addVertex(label, 'person','name','Julia','age',56) graph.addVertex(label, 'person','name','Emeril','age',48) graph.addVertex(label, 'person','name','Simone','age',50) graph.addVertex(label, 'person','name','James','age',52)
and the query itself:
g.V().has('person','age', gt(50)).values()
to find all persons over the age of 50.
==>Julia ==>56 ==>James ==>52
-
To sort the previous search, add additional methods:
g.V().hasLabel("person").has("age", gt(50)).order().by("age", incr).values()
to get:
==>James ==>52 ==>Julia ==>56
-