Using indexes
Indexes can be used in graph traversal queries to trim down the number of vertices or edges that are initially fetched.
Remember that a search index must be used if two or more properties are needed, as only search indexes can meet multiple conditions.
In general, the traversal step involves a vertex or edge label and can include a property value, including collections, tuples, and user-defined types (UDTs).
In a traversal, the step following g.V() is generally the step in which an index will be consulted.
If a mid-traversal V() step is called, then an additional indexed step can be consulted to narrow the list of vertices that will be traversed.
Graph traversals use indexes only if both the vertex or edge label and the property key are specified. If either is missing, no index is used and a full graph scan for the property key is the only option.
Indexing a vertex
The graph traversal shown uses an index to find a particular person vertex from which to start the query:
g.V().has('person', 'name', 'Emeril LAGASSE').out('created').values('name')
results in:
==>Wild Mushroom Stroganoff
==>Spicy Meatloaf
This graph traversal uses a search index for the traversal step has('person', 'name', 'Emeril LAGASSE'), which identifies the vertex label and the indexed property.
After finding the initial vertex to traverse from, the outgoing created edges are walked and the adjacent vertices are listed by name.
Checking for the use of indexing can be accomplished with the profile() method:
g.V().has('person', 'name', 'Emeril LAGASSE').out('created').values('name').profile()
with the detailed information:
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
__.V().hasLabel("person").has("name","Emeril LA... 1 1 27.935 78.56
CQL statements ordered by overall duration 24.137
\_1=SELECT * FROM food.person WHERE solr_query = '{"q":"*:*", "fq":["name:Emeril\\ LAGASSE"]}' LIMIT 2147
483647 / Duration: 24 ms / Count: 1
HasStep([~label.eq(person), name.eq(Emeril LAGA... 1 1 0.399 1.12
__.out().hasLabel("created") 2 2 6.161 17.33
CQL statements ordered by overall duration 3.369
\_1=SELECT * FROM food.person__created__recipe WHERE person_person_id = ? / Duration: 2 ms / Count: 1 / I
ndex type: Table: person__created__recipe
\_2=SELECT * FROM food.recipe WHERE recipe_id = ? / Duration: 1 ms / Count: 2 / Index type: Table: recipe
PropertiesStep([name],value) 2 2 0.624 1.76
NoOpBarrierStep(2500) 2 2 0.190 0.53
ReferenceElementStep 2 2 0.248 0.70
>TOTAL - - 35.559 -
The first CQL statement uses a search index, WHERE solr_query = '{"q":"*:*", "fq":["name:Emeril\\ LAGASSE"]}', to find the person. Two further CQL statements then find the recipes that are adjacent to that particular person vertex.
Finally, the recipe names are retrieved.
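The doubled backslash in the solr_query string is plausibly the result of two layers of escaping: Solr query syntax escapes the space inside the name with a backslash, and the JSON encoding then escapes that backslash. The following Python sketch illustrates the layering only. The helper name solr_filter_json is hypothetical, it handles only spaces, and it is not how the graph engine builds its queries.

```python
import json

def solr_filter_json(field, value):
    """Build a solr_query-style JSON string with one exact-match filter query."""
    # Layer 1 (Solr syntax): escape a space inside a term with a backslash.
    # Assumption: only spaces are handled; other special characters are ignored.
    term = value.replace(" ", "\\ ")
    # Layer 2 (JSON): json.dumps escapes the backslash again, doubling it.
    return json.dumps({"q": "*:*", "fq": [f"{field}:{term}"]})

print(solr_filter_json("name", "Emeril LAGASSE"))
# {"q": "*:*", "fq": ["name:Emeril\\ LAGASSE"]}
```

Decoding the JSON again yields the single-backslash form, name:Emeril\ LAGASSE, which is what the Solr query parser would see.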
Indexing an edge
An index on an edge property can narrow the query, such as this one, which finds all the outgoing edges for reviews written by John DOE that have a rating greater than or equal to 3 stars:
g.V().has('person','name','John DOE').outE().has('stars', gte(3))
results in:
==>e[dseg:/person-reviewed-recipe/46ad98ac-f5c9-4411-815a-f81b3b667921/2005][dseg:/person/46ad98ac-f5c9-4411-815a-f81b3b667921-reviewed->dseg:/recipe/2005]
==>e[dseg:/person-reviewed-recipe/46ad98ac-f5c9-4411-815a-f81b3b667921/2001][dseg:/person/46ad98ac-f5c9-4411-815a-f81b3b667921-reviewed->dseg:/recipe/2001]
Using profile() on the query shows that a search index was used in the initial step and that, in the second step, the person__reviewed__recipe_by_person_person_id_stars materialized view index was used to cut the latency of the query:
g.V().has('person','name','John DOE').outE().has('stars', gte(3)).profile()
with the detailed information:
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
__.V().hasLabel("person").has("name","John DOE") 1 1 13.626 87.00
CQL statements ordered by overall duration 12.376
\_1=SELECT * FROM food.person WHERE solr_query = '{"q":"*:*", "fq":["name:John\\ DOE"]}' LIMIT 2147483647
/ Duration: 12 ms / Count: 1
HasStep([~label.eq(person), name.eq(John DOE)]) 1 1 0.110 0.70
__.outE().has("stars",P.gte((int) 3)) 2 2 1.589 10.15
CQL statements ordered by overall duration 0.573
\_1=SELECT * FROM food.person__reviewed__recipe_by_person_person_id_stars WHERE person_person_id = ? AND
stars >= ? / Duration: < 1 ms / Count: 1 / Index type: Materialized view
HasStep([stars.gte(3)]) 2 2 0.177 1.14
ReferenceElementStep 2 2 0.158 1.01
>TOTAL - - 15.662 -
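The CQL statement against the materialized view combines an equality on person_person_id with a range on stars, which suggests the view is partitioned by the person id and ordered by stars within each partition. The toy Python model below sketches why that layout makes a stars >= ? lookup cheap. It is not DSE code, and the ids, star values, and dictionary layout are all invented for illustration.

```python
from bisect import bisect_left

# Toy stand-in for a view partitioned by person id and ordered by stars.
# Every id and star value here is made up for illustration.
view = {
    "person-1": [(1, 2003), (3, 2001), (5, 2005)],  # (stars, recipe_id), kept sorted
}

def reviews_at_least(view, person_id, min_stars):
    """Return the (stars, recipe_id) rows with stars >= min_stars for one person."""
    rows = view.get(person_id, [])
    # Binary search for the first row whose stars value is >= min_stars;
    # (min_stars,) sorts before any (min_stars, recipe_id) tuple.
    start = bisect_left(rows, (min_stars,))
    return rows[start:]

print(reviews_at_least(view, "person-1", 3))
# [(3, 2001), (5, 2005)]
```

Because the rows are already sorted by stars, the lookup jumps to the first qualifying row and reads forward, rather than filtering every review the person wrote.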
When indexing seems to be broken
There are cases, such as a query that requires both a search predicate and a map index to work, where the usual querying fails. For instance, the following query will fail:
g.V().
hasLabel('person').has('name', 'John DOE').
has('badge', containsValue('2016-01-01' as LocalDate)).
values('name')
results in:
One or more indexes are required to execute the traversal: g.V().hasLabel("person").has("name","John DOE").has("badge",containsValue(java.time.LocalDate.of(2016, 1, 1))).values("name")
Failed step: __.V().hasLabel("person").has("badge",containsValue(java.time.LocalDate.of(2016, 1, 1))).has("name","John DOE")
CQL execution: No table or view could satisfy the query 'SELECT * FROM food.person WHERE badge CONTAINS ? AND name = ?'
'schema.indexFor(<your_traversal>).analyze()' can't suggest any indexes to create as some steps in your traversal are not supported yet.
Alternatively consider using:
g.with('ignore-unindexed') to ignore unindexed traversal. Your results may be incomplete.
g.with('allow-filtering') to allow filtering. This may have performance implications.
Since search indexes cannot index map collections, this query cannot be completed as presented.
However, a mid-traversal V() step can identify whether or not John DOE has a badge that meets the requirements:
g.V().has('person','name', 'John DOE').as('a').
V().has('person','badge', containsValue('2016-01-01' as LocalDate)).as('b').
select('a','b').
by('name').by('badge')
results in:
==>{a=John DOE, b={gold=2017-01-01, silver=2016-01-01}}
This query takes advantage of both a search index and a secondary index, using a mid-traversal V() step to complete the traversal:
g.V().has('person','name', 'John DOE').as('a').
V().has('person','badge', containsValue('2016-01-01' as LocalDate)).as('b').
select('a','b').
by('name').by('badge').
profile()
results in:
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
__.V().hasLabel("person").has("name","John DOE") 1 1 12.464 73.33
CQL statements ordered by overall duration 10.958
\_1=SELECT * FROM food.person WHERE solr_query = '{"q":"*:*", "fq":["name:John\\ DOE"]}' LIMIT 2147483647
/ Duration: 10 ms / Count: 1
HasStep([~label.eq(person), name.eq(John DOE)])... 1 1 0.174 1.02
__.V().hasLabel("person").has("badge",containsV... 1 1 3.402 20.02
CQL statements ordered by overall duration 2.339
\_1=SELECT * FROM food.person WHERE badge CONTAINS ? / Duration: 2 ms / Count: 1 / Index type: Secondary
index
HasStep([~label.eq(person), badge.containsValue... 1 1 0.573 3.38
SelectStep(last,[a, b],[value(name), value(badg... 1 1 0.100 0.59
ReferenceElementStep 1 1 0.282 1.66
>TOTAL - - 16.997
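When profile() output grows long, it can help to summarize which index served each CQL statement. The Python sketch below scans pasted profile text, using CQL lines excerpted from the output above as sample input. It assumes that each statement starts with a marker such as \_1=, that wrapped lines break on spaces, and that a statement containing solr_query used a search index. The function name cql_index_usage is hypothetical and not part of any DataStax tool.

```python
import re

# CQL lines excerpted from the profile() output, including the wrapped lines.
SAMPLE = r"""
\_1=SELECT * FROM food.person WHERE solr_query = '{"q":"*:*", "fq":["name:John\\ DOE"]}' LIMIT 2147483647
/ Duration: 10 ms / Count: 1
\_1=SELECT * FROM food.person WHERE badge CONTAINS ? / Duration: 2 ms / Count: 1 / Index type: Secondary
index
"""

def cql_index_usage(profile_text):
    """Return a (table, index kind) pair for each CQL statement found."""
    # Undo line wrapping by collapsing all whitespace runs to single spaces.
    flat = re.sub(r"\s+", " ", profile_text)
    usage = []
    # Split on the \_N= markers; the first chunk precedes any statement.
    for stmt in re.split(r"\\_\d+=", flat)[1:]:
        table = re.search(r"FROM (\S+)", stmt)
        if table is None:
            continue
        if "solr_query" in stmt:
            kind = "Search index"
        else:
            m = re.search(r"Index type: (Materialized view|Secondary index|Table)", stmt)
            kind = m.group(1) if m else "unknown"
        usage.append((table.group(1), kind))
    return usage

print(cql_index_usage(SAMPLE))
# [('food.person', 'Search index'), ('food.person', 'Secondary index')]
```

A wrap that falls in the middle of a word, as can happen in wider output, would not be repaired by this sketch and would be reported as unknown.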
Next steps
This page shows a few examples of using indexes for graph querying, but is not exhaustive. Search indexes, in particular, have a variety of predicates that can be used for text, geospatial, and non-text indexing.