Using search indexes
Using search indexes for graph traversals.
Basics of using search indexes
DataStax Graph (DSG) leverages DSE Search indexes to efficiently filter vertices and edges by properties, reducing query latency. DSE Search uses a modified Apache Solr to create the search indexes. Graph search indexes can be created using textual, numeric and geospatial data.
Using search indexes requires DSE Search as well as DSG to be running on the nodes in a cluster. It is important to know that search indexing operates on a per-datacenter basis, so if DSE Search and DSG are running in different datacenters, the behavior of indexing may not match what is expected.
In general, if a query requires textual search or geospatial search, a search index must be used.
Search indexes are created for both full text and string searches by default, but properties can be designated with either option using asText() or asString(), respectively. Textual search indexes are by default indexed in both tokenized (TextField) and non-tokenized (StrField) forms. This means that all textual predicates (token, tokenPrefix, tokenRegex, eq, neq, regex, prefix) will be usable with all textual vertex or edge properties indexed. Practically, search indexes should be created using the asString() method only in cases where there is absolutely no use for tokenization and text analysis, such as for inventory categories (silverware, shoes, clothing). The asText() method is used if searching tokenized text, such as long multi-sentence descriptions. The query optimizer will choose whether to use analyzed or non-analyzed indexing based on the textual predicate used.
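The difference between the two indexed forms can be illustrated with a small Python sketch. It is not how DSE Search is implemented; it assumes that the analyzed form lowercases the text and splits it on non-alphanumeric characters, and the helper names text_field and str_field are hypothetical.

```python
import re

def text_field(value):
    """Tokenized form: a list of lowercase word tokens (assumed analysis)."""
    return re.findall(r"[a-z0-9]+", value.lower())

def str_field(value):
    """Non-tokenized form: the whole value kept as one exact term."""
    return value

instructions = "Saute the shallots, celery, herbs, and seasonings."
name = "Carrot Soup"

# token-style match: a single word is looked up among the analyzed tokens
token_match = "saute" in text_field(instructions)

# eq-style match: the whole string must be identical, including case
eq_full = str_field(name) == "Carrot Soup"
eq_partial = str_field(name) == "Carrot"

print(token_match, eq_full, eq_partial)
```

Under these assumptions a word lookup succeeds against the tokenized form, while an exact comparison only succeeds when the complete string is supplied.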
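Properties indexed with asText(), or with no designation (since this is the default), can use the following options for search: token(), tokenPrefix(), tokenRegex(), tokenFuzzy(), and phrase().
Properties indexed with asString() can use the following options for search: eq(), neq(), prefix(), regex(), and fuzzy().
eq() search cannot be used with property key indexes created with asText() because they contain tokenized data and are therefore not suitable for exact text matches.
Properties indexed with asString() can use the following Apache TinkerPop options for search: containing(), notContaining(), startingWith(), notStartingWith(), endingWith(), and notEndingWith().
Creating a textual search index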
An example search index for the vertex label recipe, from Creating indexes, that will be used for all examples below:
schema.vertexLabel('recipe').
  searchIndex().
  ifNotExists().
  by('instructions').asText().
  by('name').
  by('cuisine').
  waitForIndex(30).
  create()
This search index uses DSE Search to index instructions as full text using tokenization, and name and cuisine as both text and string. Only those properties that specifically should be indexed as non-tokenized data must specify asString(). If there are properties that specifically should be indexed only as tokenized data, specify asText().
Search using token methods on full text
token()
Find recipes that have the word Saute in the instructions. The method token() is used with a supplied word.
g.V().has('recipe','instructions', token('Saute')).values('name')
results in:
==>Oysters Rockefeller
==>Beef Bourguignon
==>Wild Mushroom Stroganoff

tokenPrefix()
Find recipes that have a word beginning with Sea in the instructions. The method tokenPrefix() is used with a supplied prefix (a set of alphanumeric characters).
g.V().hasLabel('recipe').has('instructions', tokenPrefix('Sea')).values('name','instructions')
results in:
==>Oysters Rockefeller
==>Saute the shallots, celery, herbs, and seasonings in 3 tablespoons of the butter for 3 minutes. Add the watercress and let it wilt.
==>Roast Pork Loin
==>The day before, separate the meat from the ribs, stopping about 1 inch before the end of the bones. Season the pork liberally inside and out with salt and pepper and refrigerate overnight.
Two recipes are returned, one with the word Season in the instructions, and one with the word seasonings in the instructions. Matching with tokenPrefix() is case-insensitive.
tokenRegex()
Find recipes that have a word in the instructions matching the regular expression .*sea.*in.*, and list the recipe names and instructions. The expression looks for the letters sea preceded by any number of other characters, followed by any number of other characters until the letters in are found, and then followed by any number of other characters. The method tokenRegex() is used with a supplied regex.
g.V().hasLabel('recipe').has('instructions', tokenRegex('.*sea.*in.*')).values('name','instructions')
results in:
==>Oysters Rockefeller
==>Saute the shallots, celery, herbs, and seasonings in 3 tablespoons of the butter for 3 minutes. Add the watercress and let it wilt.
Note that in this query, only the Oysters Rockefeller recipe is returned because the word Season in the Roast Pork Loin recipe does not meet the requirements for the regular expression.
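The behavior of the two queries above can be reproduced with a short Python sketch. It is an approximation only: it assumes that tokens are lowercased words, that matching is case-insensitive, and that a regular expression must match a whole token. It uses Python's re syntax rather than the regex dialect of the search engine, and the helper names are hypothetical.

```python
import re

def tokens(text):
    # Assumed analysis: lowercase the text and split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def token_prefix(text, prefix):
    # True if any single token starts with the prefix (case-insensitive).
    return any(t.startswith(prefix.lower()) for t in tokens(text))

def token_regex(text, pattern):
    # True if any single token matches the whole pattern.
    return any(re.fullmatch(pattern, t, re.IGNORECASE) for t in tokens(text))

oysters = ("Saute the shallots, celery, herbs, and seasonings in 3 tablespoons "
           "of the butter for 3 minutes. Add the watercress and let it wilt.")
pork = ("Season the pork liberally inside and out with salt and pepper "
        "and refrigerate overnight.")

print(token_prefix(oysters, "Sea"), token_prefix(pork, "Sea"))
print(token_regex(oysters, ".*sea.*in.*"), token_regex(pork, ".*sea.*in.*"))
```

Both texts contain a token that starts with sea, but only seasonings has the letters in after sea, which is why the regular expression excludes the Roast Pork Loin instructions.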
Search using non-tokenized methods on strings
eq()
Find recipes with the exact recipe name Carrot Soup. Note that this search is case-sensitive, so using carrot soup would not find a vertex. The method eq() is used with a supplied name.
g.V().hasLabel('recipe').has('name', eq('Carrot Soup')).values('name')
results in:
==>Carrot Soup
Search for only Carrot in the recipe name. The method eq() is used with a supplied name.
g.V().hasLabel('recipe').has('name', eq('Carrot')).values('name')
No match is found, because only a partial name was specified. For asString() indexes, the entire string must match.
There is an alternative predicate, within(), that works similarly to eq() but tests the property against a list of one or more supplied values.
neq()
Find recipes that do not have the word Saute in the recipe instructions using neq():
g.V().has('recipe', 'instructions', neq('Saute')).values('name')
results in:
==>Salade Nicoise
==>Spicy Meatloaf
==>Rataouille
==>Carrot Soup
==>Roast Pork Loin
There is an alternative predicate, without(), that works identically to neq().
phrase()
The phrase() predicate is used with properties designated as TextFields. Find the exact phrase Wild Mushroom Stroganoff in a recipe name:
g.V().hasLabel('recipe').has('name', phrase('Wild Mushroom Stroganoff',0))
The 0 designates that the result must be an exact phrase. The query results in:
==>v[dseg:/recipe/2004]
The vertex id for the correct recipe is returned.
The phrase() predicate can also be used for proximity searches, to discover phrases that have terms that are within a certain distance of one another in the tokenized text.
g.V().hasLabel('recipe').has('name', phrase('Wild Mushroom Stroganoff',1))
The value of 1 designates that the words of the phrase may be at most one term away from one another in the recipe name.
==>v[dseg:/recipe/2004]
The vertex for the correct recipe is returned. A match for
g.V().hasLabel('recipe').has('name', phrase('Wild Mushroom',1))
will also return the correct vertex, but
g.V().hasLabel('recipe').has('name', phrase('Mushroom Wild',1))
will not.
prefix()
Find recipes with names that begin with R. The method prefix() is used with a supplied string.
g.V().hasLabel('recipe').has('name', prefix('R')).values('name')
results in:
==>Roast Pork Loin
==>Rataouille
Matches are found for each recipe name that begins with R, provided the recipe name was designated with asString() in the search index.
regex()
Find recipes with names that include the letters ee. The method regex() is used with a supplied regex.
g.V().hasLabel('recipe').has('name', regex('.*ee.*')).values('name')
results in:
==>Beef Bourguignon
Matches are found for each recipe name that matches the regex .*ee.*, which finds all strings that include ee preceded and followed by any number of other characters, provided the recipe name was designated with asString() in the search index.
Search using fuzzy methods
fuzzy()
The fuzzy() predicate uses optimal string alignment distance calculations to match properties designated as StrFields. Variations in the letters used in words, such as misspellings, are the focus of this predicate. The edit distance specified refers to the number of transpositions of letters, with a single transposition of letters constituting one edit.
Find the exact person name of James Beard:
g.V().hasLabel('person').has('name', fuzzy('James Beard', 0)).values('name')
The 0 designates that the result must be an exact match. This query results in:
==>James BEARD
The fuzzy() predicate will also find misspellings:
g.V().hasLabel('person').has('name', fuzzy('James Beard', 1)).values('name')
The 1 designates that the result matches with an edit distance of at most one. This query results in:
==>James BEARD
==>Jmaes BEARD
If a person vertex exists with the misspelling Jmaes Beard, the query shown will find both vertices. The value of 1 finds this misspelling because of the single transposition of the letters a and m.
g.V().hasLabel('person').has('name', fuzzy('Jmase Beard', 2)).values('name')
The 2 designates that the result must match with an edit distance of at most two. This query results in:
==>James BEARD
==>Jmaes BEARD
==>Jmase BEARD
If person vertices exist with the misspellings Jmaes Beard and Jmase Beard, the query shown will find all three vertices. The value of 2 finds the exact match Jmase Beard, the misspelling Jmaes Beard with a single transposition of the letters s and e, and the correct spelling James Beard with a second transposition of letters.
tokenFuzzy()
The tokenFuzzy() predicate is similar to fuzzy(), but searches for variation across individual tokens in analyzed textual data (TextFields). Find the recipe name that includes the word Wild while searching for the word with a one-letter misspelling:
g.V().hasLabel('recipe').has('name', tokenFuzzy('Wlid',1)).values('name')
The 1 designates that a one-letter misspelling (one transposition) is acceptable. This query results in:
==>Wild Mushroom Stroganoff
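The edit distances quoted in the fuzzy examples can be checked with a minimal Python sketch of the textbook optimal string alignment distance, which counts insertions, deletions, substitutions, and transpositions of adjacent letters as one edit each. The function name osa_distance is hypothetical, and the sketch makes no claim about how DSE Search computes or normalizes the distance internally (for example, with respect to letter case).

```python
def osa_distance(a, b):
    """Optimal string alignment distance between strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # transposition of two adjacent letters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]

print(osa_distance("James", "Jmaes"))  # one transposition (a and m)
print(osa_distance("Jmase", "Jmaes"))  # one transposition (s and e)
print(osa_distance("Jmase", "James"))  # two transpositions
print(osa_distance("Wlid", "Wild"))    # one transposition (l and i)
```

With these distances, a limit of 1 around James accepts Jmaes, and a limit of 2 around Jmase accepts both Jmaes and James, which matches the result lists shown above.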
Search using Apache TinkerPop methods on strings
Using Apache TinkerPop predicates with DSE Search indexes
Apache TinkerPop text predicates are case-sensitive, so searching for Pork or pork will find different results. The supported predicates are:
- containing()
- notContaining()
- startingWith()
- notStartingWith()
- endingWith()
- notEndingWith()
containing()
Find recipes that have Pork in the recipe name. A partial match of the recipe name returns a result, unlike eq(). Note that this search is case-sensitive, so using pork would not find a vertex.
g.V().has('recipe', 'name', containing('Pork')).values('name')
results in:
==>Roast Pork Loin
notContaining()
Find recipes that do not have Pork in the recipe name using notContaining(). A partial match of the recipe name will exclude the name from the result, unlike neq().
g.V().has('recipe', 'name', notContaining('Pork')).values('name')
results in:
==>Salade Nicoise
==>Wild Mushroom Stroganoff
==>Spicy Meatloaf
==>Oysters Rockefeller
==>Rataouille
==>Carrot Soup
==>Beef Bourguignon
startingWith()
Find recipes with names that begin with Beef. The method startingWith() is used with a supplied string.
g.V().has('recipe', 'name', startingWith('Beef')).values('name')
results in:
==>Beef Bourguignon
A match is found for the recipe name that begins with Beef, provided the recipe name was designated with asString() in the search index. A search using startingWith('Bee') will find the same result; full words are not necessary. prefix() and startingWith() will return the same results.
notStartingWith()
Find recipes with names that do not begin with Beef. The method notStartingWith() is used with a supplied string.
g.V().has('recipe', 'name', notStartingWith('Beef')).values('name')
results in:
==>Salade Nicoise
==>Wild Mushroom Stroganoff
==>Spicy Meatloaf
==>Oysters Rockefeller
==>Rataouille
==>Carrot Soup
==>Roast Pork Loin
endingWith()
Find recipes with names that end with Soup. The method endingWith() is used with a supplied string. A search using endingWith('oup') will find the same result; full words are not necessary.
g.V().has('recipe', 'name', endingWith('Soup')).values('name')
results in:
==>Carrot Soup
notEndingWith()
Find recipes with names that do not end with Soup. The method notEndingWith() is used with a supplied string.
g.V().has('recipe', 'name', notEndingWith('Soup')).values('name')
results in:
==>Salade Nicoise
==>Wild Mushroom Stroganoff
==>Spicy Meatloaf
==>Oysters Rockefeller
==>Rataouille
==>Beef Bourguignon
==>Roast Pork Loin
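The six predicates behave like ordinary case-sensitive substring, prefix, and suffix tests on the whole string. The following Python sketch mirrors that behavior over the eight recipe names that appear in the results above; the function names are hypothetical stand-ins, not part of any driver API.

```python
names = [
    "Salade Nicoise", "Wild Mushroom Stroganoff", "Spicy Meatloaf",
    "Oysters Rockefeller", "Rataouille", "Carrot Soup",
    "Beef Bourguignon", "Roast Pork Loin",
]

# Each helper returns the names that satisfy the corresponding predicate.
def containing(s):
    return [n for n in names if s in n]

def not_containing(s):
    return [n for n in names if s not in n]

def starting_with(s):
    return [n for n in names if n.startswith(s)]

def not_starting_with(s):
    return [n for n in names if not n.startswith(s)]

def ending_with(s):
    return [n for n in names if n.endswith(s)]

def not_ending_with(s):
    return [n for n in names if not n.endswith(s)]

print(containing("Pork"))    # partial, case-sensitive match
print(containing("pork"))    # different case, so nothing matches
print(starting_with("Bee"))  # full words are not necessary
print(ending_with("oup"))
```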
Search using tuple and user-defined type (UDT) values
Tuple search
Create a search index on the tuple property country:
schema.vertexLabel('person').searchIndex().ifNotExists().by('country').create()
Find the person with an end_date of 1960-01-01, with country.field2 storing the value of end_date for a tuple that stores country, start_date, end_date.
g.V().has('person', 'country.field2', '1960-01-01' as LocalDate)
results in:
==>v[dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd]
who happens to be Julia CHILD. So, to search for a matching value in a tuple, the tuple name and the field number must be supplied. The fields begin with field0.
UDT search
Create a search index on the UDT property loc_details:
schema.vertexLabel('location').searchIndex().ifNotExists().by('loc_details').create()
Find the location with the street address 213 F St, with loc_details.loc_address.address1 storing the street address.
g.V().has('location', 'loc_details.loc_address.address1', '213 F St')
results in:
==>v[dseg:/location/g13]
which happens to be Zippy Mart. So, to search for a matching value in a UDT, the UDT property name and each nested field name must be supplied. In this case, the property loc_details is a UDT with its own property loc_address, which is also a UDT that has a property address1 that is a street address.
Search using geospatial values
Geospatial search
Create a search index that includes the geospatial property geo_point:
schema.vertexLabel('location').
  searchIndex().
  ifNotExists().
  by('loc_id').asString().
  by('geo_point').
  by('loc_details').
  create()
Find the fridge sensors for locations within 20 degrees of the point (118, 34) using Geo.inside(). The in().in() steps allow the query to traverse from the location vertices to the home vertices and then to the fridge_sensor vertices.
g.V().hasLabel('location').
  has('geo_point', Geo.inside(Geo.point(118,34), 20, Geo.Unit.DEGREES)).
  in().in()
Additional indexes may be required for the in().in() portion of this query, but the error message will display the indexes that must be created.
==>v[dseg:/fridge_sensor/31/100/55555/1]
==>v[dseg:/fridge_sensor/31/200/55556/3]
More information on geospatial queries can be found in Geospatial traversals. The main point here is that the geospatial portion of the query can only be met using a search index.
Search using numerical values
Numeric search
Create a search index that includes the numeric property cal_goal:
schema.vertexLabel('person').
  searchIndex().
  ifNotExists().
  by('person_id').
  by('badge').
  by('cal_goal').
  by('country').
  by('gender').
  by('macro_goal').
  by('name').
  create()
This example illustrates that only one search index can exist for each vertex or edge label and includes seven properties that are indexed.
Find the people with calorie goals greater than 1200:
g.V().has('person', 'cal_goal', gt(1200)).values('name', 'cal_goal')
Five people have calorie goals greater than 1200 calories:
==>Sharon SMITH
==>1600
==>Betsy JONES
==>1700
==>John DOE
==>1750
==>John SMITH
==>1800
==>Jane DOE
==>1500
The query can be extended with order().by('cal_goal', incr) to sort in increasing order by calorie goals, and fold() to create a more readable result:
g.V().has('person', 'cal_goal', gt(1200)).
  order().
  by('cal_goal', incr).
  values('cal_goal', 'name').
  fold()
results in:
==>[1500, Jane DOE, 1600, Sharon SMITH, 1700, Betsy JONES, 1750, John DOE, 1800, John SMITH]
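The filter, sort, and fold steps can be pictured with plain Python lists. The sketch below uses the five name and calorie goal pairs from the results above plus one hypothetical person below the threshold (Example PERSON, not in the original data) so that the filter has something to exclude.

```python
people = [
    ("Sharon SMITH", 1600),
    ("Betsy JONES", 1700),
    ("John DOE", 1750),
    ("John SMITH", 1800),
    ("Jane DOE", 1500),
    ("Example PERSON", 1000),  # hypothetical entry, added for illustration
]

# has('person', 'cal_goal', gt(1200)): keep goals strictly greater than 1200
matched = [(name, goal) for name, goal in people if goal > 1200]

# order().by('cal_goal', incr): sort by calorie goal, ascending
ordered = sorted(matched, key=lambda person: person[1])

# values('cal_goal', 'name').fold(): emit goal then name, gathered in one list
folded = [value for name, goal in ordered for value in (goal, name)]

print(folded)
```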
Search using two search indexes for a single traversal query
Using two search indexes in one traversal
Create a search index for person.
schema.vertexLabel('person').index('search').search().
  by('name').asString().
  by('nick_name').ifNotExists().add()
This search index will use DSE Search to index nick_name as full text using tokenization, and name as a string.
The following traversal uses a mid-traversal V() that allows a search index for person as well as a search index for recipe to be used to execute the query. The first index uses a tokenRegex() to find recipe instructions that contain a word beginning with Braise; this part of the query is labeled as r for use later in the query. Then the search index for person is searched for a person name that starts with the letter J, and the traversal follows an outgoing edge to a vertex that must be the recipe found in the first part of the query, which is checked with where(eq('r')).
g.V().has('recipe', 'instructions', tokenRegex('Braise.*')).as('r').
  V().has('person', 'name', prefix('J')).out().where(eq('r')).values('name')
results in:
==>Beef Bourguignon
==>Beef Bourguignon
==>Beef Bourguignon
==>Beef Bourguignon
The query returns Beef Bourguignon four times, and illustrates some of the complexity that can be successfully used with search indexes. A modified query that gets the path from recipe -> person -> recipe finds that Julia CHILD created the recipe Beef Bourguignon, but also finds the three reviews written about Beef Bourguignon by John DOE, John SMITH, and Jane DOE:
g.V().has('recipe', 'instructions', tokenRegex('Braise.*')).as('r').
  V().has('person', 'name', prefix('J')).out().where(eq('r')).path().unfold().values('name')
with results:
==>Beef Bourguignon
==>John DOE
==>Beef Bourguignon
==>Beef Bourguignon
==>Julia CHILD
==>Beef Bourguignon
==>Beef Bourguignon
==>John SMITH
==>Beef Bourguignon
==>Beef Bourguignon
==>Jane DOE
==>Beef Bourguignon
Each group of three lines in the results represents one recipe -> person -> recipe path.
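The shape of the two-index traversal can be sketched in Python as two independent lookups joined by an equality check. The tiny in-memory graph below is an assumption made for illustration: the instruction texts, the Carrot Soup edge, and the helper names are hypothetical, and only the four person to Beef Bourguignon relationships come from the results above.

```python
import re

def tokens(text):
    # Assumed analysis: lowercase the text and split on non-alphanumerics.
    return re.findall(r"[a-z0-9]+", text.lower())

def token_regex(text, pattern):
    # True if any single token matches the whole pattern (case-insensitive).
    return any(re.fullmatch(pattern, t, re.IGNORECASE) for t in tokens(text))

recipes = {
    "Beef Bourguignon": "Braise the beef slowly in red wine.",  # hypothetical text
    "Carrot Soup": "Simmer the carrots until tender.",          # hypothetical text
}
people = ["John DOE", "Julia CHILD", "John SMITH", "Jane DOE"]
out_edges = [  # (person, recipe) pairs reached by out()
    ("John DOE", "Beef Bourguignon"),
    ("Julia CHILD", "Beef Bourguignon"),
    ("John SMITH", "Beef Bourguignon"),
    ("Jane DOE", "Beef Bourguignon"),
    ("John DOE", "Carrot Soup"),  # hypothetical edge, filtered out below
]

def two_index_traversal():
    results = []
    for r, instructions in recipes.items():
        if not token_regex(instructions, "Braise.*"):  # first index lookup, as('r')
            continue
        for person in people:
            if not person.startswith("J"):             # second index lookup, prefix('J')
                continue
            for source, target in out_edges:           # out()
                if source == person and target == r:   # where(eq('r'))
                    results.append(target)
    return results

print(two_index_traversal())
```

Each person who reaches the labeled recipe contributes one result, which is why the same recipe name is emitted once per matching person.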