Using search indexes
Using search indexes for graph traversals.
DSE Graph leverages DSE Search indexes to efficiently filter vertices by properties, and reducing query latency. DSE Search uses a modified Apache Solr to create the search indexes. Graph search indexes can be created using textual, numeric and geospatial data.
It is important to note that traversal queries with search predicates can be completed whether a search index exists or not. However, full graph scans will occur without a search index and performance will degrade severely as the graph grows, an unacceptable solution in a production environment. Create search indexes during schema creation before inserting data and querying the graph. Search indexes will only be created if DSE Search is started in conjunction with DSE Graph. If search indexes are used, the queries must be run on DSE Search nodes in the cluster.
In general, the traversal step will involve a vertex label and can include a property
key and a particular property value. In a traversal, the step following
g.V()
is generally the step in which an index will be
consulted. If a mid-traversal V()
step is called, then an
additional indexed step can be consulted to narrow the list of vertices that will be
traversed.
Textual search indexes are by default indexed in both tokenized
(TextField) and non-tokenized (StrField) forms. This means that all textual
predicates (token, tokenPrefix, tokenRegex, eq, neq, regex, prefix) will be usable
with all textual vertex properties indexed. Practically, search indexes should be
created using the asString()
method only in cases where there is
absolutely no use for tokenization and text analysis, such as for inventory
categories (silverware, shoes, clothing). The asText()
method is
used if searching tokenized text, such as long multi-sentence descriptions. The
query optimizer will choose whether to use analyzed or non-analyzed indexing based
on the textual predicate used.
asText()
or undefined (since this
is the default) can use the following options for search:asString()
can use the following
options for search:eq()
search cannot be used with property key indexes
created with asText()
because they contain tokenized data and
are therefore not suitable for exact text matches.fuzzy
and tokenFuzzy
,
can be used with TextFields
and StrFields
,
respectively, while phrase
can be used only with
TextFields
.Procedure
-
An example search index from Creating
indexes for vertex label
recipe
that will be used for all examples below:schema.vertexLabel('recipe').index('search').search(). by('instructions').asText(). by('name').asString().add()
This search index uses DSE Search to index
instructions
as full text using tokenization, andname
as a string. Note that, as of DSE 5.1, only those properties that specifically should be indexed as non-tokenized data must specifyasString()
. If there are proporties that specifically should be indexed only as tokenized data, specifyasText()
.
-
In a traversal query, use a token search to find list the names of all recipes
that have the word
Saute
in the instructions. The methodtoken()
is used with a supplied word.g.V().has('recipe','instructions', token('Saute')).values('name')
Why does this search find these three recipes? Because the instructions for each meet the search requirements:
-
In a traversal query, use a token prefix search to list the names of all
recipes that have a word that includes a prefix of
Sea
in the instructions. The methodtokenPrefix()
is used with a supplied prefix (a set of alphanumeric characters).g.V().hasLabel('recipe').has('instructions', tokenPrefix('Sea')).values('name','instructions')
Two recipes are returned, one with the word Season in the instructions, and one with the word seasonings in the instructions. Case is insensitive in
tokenPrefix()
indexing.
-
In a traversal query, use a token regular expression (regex) search to find all
recipes that have a word that includes the regular expression specified. The
regex,
.*sea*in.*
, looks for the letters sea preceded by any number of other characters and followed by any number of other characters until the letters in are found and also followed by any number of other characters in the instructions and list the recipe names. The methodtokenRegex()
is used with a supplied regex.g.V().hasLabel('recipe').has('instructions', tokenRegex('.*sea.*in.*')).values('name','instructions')
Note that in this query, only the Oysters Rockefeller recipe is returned because the word Season in the Roast Pork Loin recipe does not meet the requirements for the regular expression.
-
In a traversal query, use a non-token search to list all recipes that have
Carrot Soup
in the recipe name. Note that this search is case-sensitive, so usingcarrot soup
would not find a vertex. The methodeq()
is used with a supplied name.g.V().hasLabel('recipe').has('name', eq('Carrot Soup')).values('name')
The match is found for the full author name listed. Note that
neq()
can also be used to find all strings that do not match the specified string. -
In a traversal query, use a non-token search to list all recipes that have
Carrot
in the recipe name. The methodeq()
is used with a supplied name.g.V().hasLabel('recipe').has('name', eq('Carrot')).valueMap()
No match is found, because only a partial name was specified. For
asString()
indexes, the string must match.
-
In a traversal query, use a non-token search to find all authors that have a
name beginning with the letter
R
. The methodprefix()
is used with a supplied string.g.V().hasLabel('recipe').has('name', prefix('R')).values('name')
Matches are found for each author name that begins with
R
, provided the recipe name was designated withasString()
in the search index.
-
In a traversal query, use a non-token search to find all recipes that have a
name that includes a specified regular expression. The method
regex()
is used with a supplied regex.g.V().hasLabel('recipe').has('name', regex('.*ee.*')).values('name')
Matches are found for each author name that include the regex
.*ee.*
to find all strings that include ee preceded and followed by any number of other characters, provided the recipe name was designated withasString()
in the search index.
-
The
phrase()
predicate is used with properties designated as TextFields.Find the exact phrase Wild Mushroom Stroganoff in a recipe name:
Theg.V().hasLabel('recipe').has('name', phrase('Wild Mushroom Stroganoff',0))
0
designates that the result must be an exact phrase.
The vertex for the correct recipe is returned.v[{~label=recipe, community_id=2123369856, member_id=0}]
-
The
phrase()
predicate can be used for proximity searches, to discover phrases that have terms that are within a certain distance of one another in the tokenized text.
The value ofg.V().hasLabel('recipe').has('name', phrase('Wild Stroganoff',1))
1
designates that the result must only have words in the recipe name that are one term away from one another. In this example, the variation is the addition of the word Mushroom.
The vertex for the correct recipe is returned. A match forv[{~label=recipe, community_id=2123369856, member_id=0}]
g.V().hasLabel('recipe').has('name', phrase('Wild Mushroom',1))
will also return the correct vertex, butg.V().hasLabel('recipe').has('name', phrase('Mushroom Wild',1))
will not.
-
The
fuzzy()
predicate uses optimal string alignment distance calculations to match properties designated as StrFields. Variations in the letters used in words, such as misspellings, are the focus of this predicate. The edit distance specified refers to the number of transpositions of letters, with a single transposition of letters constituting one edit.Find the exact name of James Beard in an author name:
Theg.V().hasLabel('author').has('name', fuzzy('James Beard', 0)).values('name')
0
designates that the result must be an exact match.James Beard
-
Changing the last value in a
fuzzy()
predicate will find misspellings:
Theg.V().hasLabel('author').has('name', fuzzy('James Beard', 1)).values('name')
1
designates that the result matches with an edit distance of at most one.
If an author vertex exists with the misspelling Jmaes Beard, the query shown will find both vertices. The value of 1 finds this misspelling because of the single transposition of the letters a and m.James Beard, Jmaes Beard
-
Note that searching for a misspelling will find the records with the correct
spelling, as well as the misspelled name
Theg.V().hasLabel('author').has('name', fuzzy('Jmase Beard', 2)).values('name')
2
designates that the result must match with at most two transpositions.
If an author vertex exists with the misspelling Jmaes Beard, the query shown will find both vertices. The value of 2 finds both the misspelling because of the single transposition of letters, e and s in Jmaes Beard, as well as the correct spelling with a second transposition of letters from Jmase Beard to James Beard .James Beard, Jmaes Beard
CAUTION: Specifying an edit distance of 3 or greater matches too many terms for useful results. The resulting search index will be too large to efficiently filter queries.
-
The
tokenFuzzy()
predicate similar tofuzzy()
, but searches for variation across individual tokens in analyzed textual data (TextFields).Find the recipe name that includes the word Wild while searching for the word with a one-letter misspelling:
Theg.V().hasLabel('recipe').has('name', tokenFuzzy('Wlid',1)).values('name')
1
designates that one letter misspelling (one transposition) is acceptable.Wild Beef Stroganoff
-
Create a second search index like an example search index from Creating indexes for vertex label
author
.schema.vertexLabel('author').index('search').search(). by('name').asString(). by('nickname').ifNotExists().add()
This search index will use DSE Search to index
nickname
as full text using tokenization, andname
as a string. -
This traversal query demonstrates a mid-traversal
V()
that allows a search index for author as well as a search index for recipe to be used to execute the query. The first index uses atokenRegex()
to find recipe instructions that start with the word Braise; this part of the query is labeled as r for use later in the query. Then the search index for author is searched for an author name that starts with the letter J, and traversed through an outgoing edge to a vertex where the search found in the first part of the query is found withwhere(eq('r'))
.g.V().has('recipe', 'instructions', tokenRegex('Braise.*')).as('r'). V().has('author', 'name', prefix('J')).out().where(eq('r')).values('name')
This query traversal finds the recipe
Beef Bourguignon
authored byJulia Child
, and illustrates some of the complexity that can be successfully used with search indexes.
-
Geospatial search is used to discover geospatial relationships. Search indexes
are used to make such searches possible. First, a search index must be
created.
schema.vertexLabel('FridgeSensor').index('search').search(). by('location').ifNotExists().add()
-
Some sample data will be helpful for understanding the search results. Two
vertices are entered for fridge sensor:
graph.addVertex(label, 'FridgeSensor', 'name', 'jones1', 'city_id', 100, 'sensor_id', '60bcae02-f6e5-11e5-9ce9-5e5517507c66', 'location', Geo.point(-118.359770, 34.171221)) graph.addVertex(label, 'FridgeSensor', 'name', 'smith1', 'city_id', 100, 'sensor_id', '61deada0-3bb2-4d6d-a606-a44d963f03b5', 'location', Geo.point(-115.655068, 35.163427))
The sensors are named and given a city ID and sensor ID in addition to the location with data type
Point
. -
A query can find all sensors that meet the requirement of being inside the
described polygon
Distance
that is designated as a circle with a center at (-110, 30) and a radius of 20 degrees with the methodGeo.inside()
.Distance d = Geo.point(-110,30),20, Geo.Unit.DEGREES) g.V().hasLabel('FridgeSensor').has('location', Geo.inside(d)).values('name')
More information on geospatial queries can be found in Geospatial traversals.
-
Search indexes can also be used for non-textual values:
This example includes a search index by the integer property age. Here is data to query:schema.propertyKey('name').Text().create() schema.propertyKey('age').Int().create() schema.vertexLabel('person').properties('name','age').create() schema.vertexLabel('person').index('search').search().by('name').by('age').add()
and the query itself:graph.addVertex(label, 'person','name','Julia','age',56) graph.addVertex(label, 'person','name','Emeril','age',48) graph.addVertex(label, 'person','name','Simone','age',50) graph.addVertex(label, 'person','name','James','age',52)
to find all persons over the age of 50.g.V().has('person','age', gt(50)).values()
==>Julia ==>56 ==>James ==>52
-
To sort the previous search, add additional methods:
g.V().hasLabel("person").has("age", gt(50)).order().by("age", incr).values()
to get:==>James ==>52 ==>Julia ==>56