Using indexes
Using indexes for graph queries.
Global indexes can be used in graph traversal queries for the first traversal step reached after the V() step, and are used to trim down the number of vertices that are initially fetched. Remember that a search index must be used if two or more properties are used for global indexing. In general, the traversal step involves a vertex label and can include a property key and a particular property value. In a traversal, the step following g.V() is generally the step in which an index will be consulted. If a mid-traversal V() step is called, then an additional indexed step can be consulted to narrow the list of vertices that will be traversed.
A traversal that omits the vertex label must be checked against every vertex label. If no index can answer the resulting clause and graph.allow_scan is disabled, the traversal fails with an error:
g.V().has('name','Julia Child')
Could not find an index to answer query clause and graph.allow_scan is disabled:
((label = FridgeSensor & name WITHIN [Julia Child]) | (label = author & name WITHIN [Julia Child]) |
(label = book & name WITHIN [Julia Child]) | (label = ingredient & name WITHIN [Julia Child]) |
(label = meal & name WITHIN [Julia Child]) | (label = recipe & name WITHIN [Julia Child]) |
(label = reviewer & name WITHIN [Julia Child]))
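The reason a label-less traversal fails can be illustrated with a toy model. The Python sketch below is not DSE Graph's implementation: the ToyGraph class and its methods are hypothetical names, and the graph is just an in-memory list. It only shows that an index keyed by vertex label and property key can answer has(label, key, value), while has(key, value) needs a full scan, which is refused when scans are disabled.

```python
class ToyGraph:
    """Toy stand-in for a graph with global indexes (illustrative only)."""

    def __init__(self, allow_scan=False):
        self.allow_scan = allow_scan
        self.vertices = []      # every vertex, as a plain dict
        self.indexed = set()    # (label, key) pairs that have an index
        self.index = {}         # (label, key, value) -> list of vertices

    def create_index(self, label, key):
        """Register an index on one property key of one vertex label."""
        self.indexed.add((label, key))
        for v in self.vertices:
            if v["label"] == label and key in v:
                self.index.setdefault((label, key, v[key]), []).append(v)

    def add_vertex(self, label, **props):
        v = {"label": label, **props}
        self.vertices.append(v)
        for key, value in props.items():
            if (label, key) in self.indexed:
                self.index.setdefault((label, key, value), []).append(v)
        return v

    def has(self, *args):
        """has(label, key, value) or has(key, value), loosely mirroring Gremlin."""
        if len(args) == 3:
            label, key, value = args
        elif len(args) == 2:
            label, (key, value) = None, args
        else:
            raise TypeError("expected has(label, key, value) or has(key, value)")

        # With a label and an indexed key, the index answers the query directly.
        if label is not None and (label, key) in self.indexed:
            return list(self.index.get((label, key, value), []))

        # Otherwise every vertex would have to be inspected.
        if not self.allow_scan:
            raise LookupError("no index can answer the query and scans are disabled")
        return [
            v for v in self.vertices
            if v.get(key) == value and (label is None or v["label"] == label)
        ]


graph = ToyGraph(allow_scan=False)
graph.create_index("person", "name")
graph.add_vertex("person", name="Julia Child")
graph.add_vertex("recipe", name="Beef Bourguignon")

print(graph.has("person", "name", "Julia Child"))   # answered by the index
```

In this toy model, calling graph.has("name", "Julia Child") raises LookupError, which parallels the error shown above for the label-less traversal.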
Edge indexes and property indexes (vertex-centric indexes) can be used to narrow the query after a global index has found the starting vertex. They allow definition of the edges that will be followed or the meta-properties that will be used to further restrict the query.
Procedure
-
The graph traversal shown uses an index to discover certain person vertices to
start the query.
g.V().has('person', 'name', 'Emeril Lagasse').out('created').values('name')
This graph traversal uses an index, if the index exists, because the traversal step has('person', 'name', 'Emeril Lagasse') identifies the vertex label and the property key indexed. After finding the initial vertex to traverse from, the outgoing created edges are walked and the adjacent vertices are listed by name. This graph traversal shows the importance of using the vertex label in combination with the property key, as two different elements, persons and recipes, use the same property key name.
Checking for the use of indexing can be accomplished with the profile() method:
gremlin> g.V().has('person', 'name', 'Emeril Lagasse').out('created').values('name').profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
DsegGraphStep(vertex,[],(label = person & name ...                     1           1           8.427    27.42
  query-optimizer                                                                              0.792
    \_condition=((label = person & name = Emeril Lagasse) & (true))
  query-setup                                                                                  0.028
    \_isFitted=true
    \_isSorted=false
    \_isScan=false
  index-query                                                                                  6.514
    \_indexType=Materialized
    \_usesCache=false
    \_statement=SELECT "personId" FROM "dse60"."person_p_byName" WHERE "name" = ? LIMIT ?; with params (java.lang.String) Emeril Lagasse, (java.lang.Integer) 50000
    \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, async=true}
...
Note the index-query entry in the first step, DsegGraphStep. It identifies the index type as materialized. If an index was not used, index-query would be missing from the profile output.
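When profile output is captured as text, the check described above can be automated. The following Python sketch assumes the output has the textual shape shown in this section. The function name index_usage and the SAMPLE excerpt are illustrative, not part of any DSE tooling.

```python
import re


def index_usage(profile_text):
    """Summarize index usage from the text printed by profile().

    Looks only for the markers visible in the output above:
    'index-query', '_indexType=...' and '_isScan=...'.
    """
    return {
        "uses_index_query": "index-query" in profile_text,
        "index_types": re.findall(r"_indexType=(\w+)", profile_text),
        "is_scan": "_isScan=true" in profile_text,
    }


# Abbreviated excerpt of the profile output shown above.
SAMPLE = r"""
DsegGraphStep(vertex,[],(label = person & name ...    1    1    8.427    27.42
  query-setup                                                   0.028
    \_isFitted=true
    \_isSorted=false
    \_isScan=false
  index-query                                                   6.514
    \_indexType=Materialized
    \_usesCache=false
"""

print(index_usage(SAMPLE))
```

A result whose uses_index_query value is False indicates that index-query is missing from the output, which is the sign that no global index was consulted.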
-
An edge index can narrow the query, such as this one that finds all the
outgoing edges for reviews that John Doe wrote that have a rating
greater than or equal to 3 stars:
g.V().has('person','name','John Doe').outE().has('stars', gte(3))
Using profile() on the query shows that a global index query was used in the initial step and that, in the second step, the ratedByStars edge index was used to cut the latency of the query.
Tip: The local() step can be used to affect how an edge index narrows a query.
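The effect of an edge index on a range condition such as gte(3) can be pictured with a per-vertex sorted structure. This Python sketch is a toy model under stated assumptions, not how DSE Graph stores edges: the ToyVertex class, its method names, and the 'rated' edge label are hypothetical. It keeps each vertex's edges ordered by stars so that a range lookup skips the non-matching edges instead of filtering all of them.

```python
import bisect


class ToyVertex:
    """Vertex whose outgoing edges are kept sorted by 'stars' (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self._stars = []   # sorted star ratings: the keys of the toy edge index
        self._edges = []   # edges stored in the same order as _stars

    def add_rated_edge(self, stars, target):
        """Insert an edge at the position that keeps the index sorted."""
        pos = bisect.bisect_right(self._stars, stars)
        self._stars.insert(pos, stars)
        self._edges.insert(pos, {"label": "rated", "stars": stars, "target": target})

    def out_e_stars_gte(self, minimum):
        """Return edges with stars >= minimum using a binary search, not a scan."""
        start = bisect.bisect_left(self._stars, minimum)
        return self._edges[start:]


john = ToyVertex("John Doe")
for stars, recipe in [(5, "recipe A"), (2, "recipe B"), (3, "recipe C"), (1, "recipe D")]:
    john.add_rated_edge(stars, recipe)

for edge in john.out_e_stars_gte(3):
    print(edge["stars"], edge["target"])
```

The binary search locates the first qualifying edge directly, which is the same narrowing idea that makes an indexed outE().has('stars', gte(3)) step cheaper than inspecting every edge.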
-
A property index can narrow the query, such as this one that finds the
countries that Julia Child lived in, starting in the year 1961 (in
this case, only one country):
g.V().has('person', 'name','Julia Child').as('person').
  local(properties('country').has('startYear', 1961)).value().as('country').
  select('person','country').
  by('name').by().profile()
gremlin> g.V().has('person', 'name','Julia Child').as('person').
......1> local(properties('country').has('startYear', 1961).value()).as('country').
......2> select('person','country').
......3> by('name').by().profile()
==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
DsegGraphStep(vertex,[],(label = person & name ...                     1           1           1.274    37.35
  query-optimizer                                                                              0.253
    \_condition=((label = person & name = Julia Child) & (true))
  query-setup                                                                                  0.008
    \_isFitted=true
    \_isSorted=false
    \_isScan=false
  index-query                                                                                  0.557
    \_indexType=Materialized
    \_usesCache=false
    \_statement=SELECT "personId" FROM "newComp"."person_p_byName" WHERE "name" = ? LIMIT ?; with params (java.lang.String) Julia Child, (java.lang.Integer) 50000
    \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, async=true}
DsegHasStep@[person]                                                   1           1           0.060     1.76
LocalStep([DsegPropertiesStep([country],propert...                     1           1           1.300    38.12
  DsegPropertiesStep([country],property,(label ...                     1           1           1.149
    query-optimizer                                                                            0.239
      \_condition=((label = country & startYear = 1961) & (true))
    query-setup                                                                                0.001
      \_isFitted=true
      \_isSorted=false
      \_isScan=false
    vertex-query                                                                               0.564
      \_usesCache=false
      \_statement=SELECT * FROM "newComp"."person_p_OUT_byStartYear_p" WHERE "personId" = ? AND "~~property_key_id" = ? AND "~startYear" = ? LIMIT ? ALLOW FILTERING; with params (java.lang.Integer) 1, (java.lang.Integer) 32801, (java.lang.Integer) 1961, (java.lang.Integer) 50000
      \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, async=true}
      \_usesIndex=true
  DsegHasStep([startYear.eq(1961)])                                    1           1           0.081
  PropertyValueStep                                                    1           1           0.026
SelectStep(last,[person, country],[value(name),...                     1           1           0.720    21.13
NoOpBarrierStep(2500)                                                  1           1           0.032     0.95
DsegPropertyLoadStep                                                   1           1           0.023     0.69
                                            >TOTAL                     -           -           3.411        -
Using profile() on the query shows that a global index query was used in the initial step and that, in the second SELECT statement (the vertex-query), the byStartYear property index was used to cut the latency of the query.
Tip: The local() step can also be handy for use with property indexes.