Using indexes

Using indexes for graph queries.

Global indexes can be used in graph traversal queries for the first traversal step after the V() step, where they trim down the number of vertices that are initially fetched. Remember that a search index must be used if two or more properties are used for global indexing. In general, the indexed traversal step identifies a vertex label and a property key, and can include a particular property value. In a traversal, the step following g.V() is generally the step in which an index is consulted. If a mid-traversal V() step is called, an additional indexed step can be consulted to narrow the list of vertices that will be traversed.
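The narrowing effect of a global index can be pictured with a small toy model. This is a hypothetical sketch, not the DSE Graph API or its storage format. An index keyed by (vertex label, property key) maps each property value to vertex ids, so resolving has(label, key, value) reads only the matching entries. Without such an index, every vertex has to be examined. The sample vertices and the helper names build_global_index and start_vertices are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sample vertices: (vertex id, label, properties).
VERTICES = [
    (1, "person", {"name": "Julia Child"}),
    (2, "person", {"name": "Emeril Lagasse"}),
    (3, "recipe", {"name": "Beef Bourguignon"}),
    (4, "book", {"name": "Mastering the Art of French Cooking"}),
]


def build_global_index(vertices, label, key):
    """Map each value of `key` on `label` vertices to the matching vertex ids."""
    index = defaultdict(list)
    for vid, vlabel, props in vertices:
        if vlabel == label and key in props:
            index[props[key]].append(vid)
    return dict(index)


def start_vertices(vertices, indexes, label, key, value):
    """Resolve has(label, key, value); return (vertex ids, vertices examined)."""
    index = indexes.get((label, key))
    if index is not None:
        ids = list(index.get(value, []))
        return ids, len(ids)  # only the matching index entries are read
    # No index for (label, key): every vertex must be examined (full scan).
    ids = [vid for vid, vlabel, props in vertices
           if vlabel == label and props.get(key) == value]
    return ids, len(vertices)


# In this toy example only person.name is indexed.
INDEXES = {("person", "name"): build_global_index(VERTICES, "person", "name")}

print(start_vertices(VERTICES, INDEXES, "person", "name", "Julia Child"))  # ([1], 1)
print(start_vertices(VERTICES, {}, "person", "name", "Julia Child"))       # ([1], 4)
```

Both calls return the same vertex. The difference is how many vertices had to be examined, and that gap grows with the size of the graph.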

Note: Graph traversals use indexes only if both the vertex label and the property key are specified. If either one is missing, no index is used and a full graph scan for the property key can result. If full graph scans are disabled, the query fails, as shown in this example, where a property is specified but a vertex label is not:
g.V().has('name','Julia Child')
Could not find an index to answer query clause and graph.allow_scan is disabled: 
((label = FridgeSensor & name WITHIN [Julia Child]) | (label = author & name WITHIN [Julia Child]) | 
(label = book & name WITHIN [Julia Child]) | (label = ingredient & name WITHIN [Julia Child]) | 
(label = meal & name WITHIN [Julia Child]) | (label = recipe & name WITHIN [Julia Child]) | 
(label = reviewer & name WITHIN [Julia Child]))
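The error above expands the label-less has() into one clause per vertex label. Every label except person is listed, which suggests that only the person clause could be answered by an index. The toy planner below mimics that behavior. It is a hypothetical sketch, not DSE's query planner. The label list is copied from the error output. Treating only person.name as indexed is an assumption, and ScanDisabledError and plan_has are invented names.

```python
class ScanDisabledError(Exception):
    """Raised when a clause needs a full scan but scans are disabled."""


# Vertex labels taken from the error output above; which (label, key)
# pairs are indexed is an assumption for this toy model.
LABELS = ["FridgeSensor", "author", "book", "ingredient",
          "meal", "person", "recipe", "reviewer"]
INDEXED = {("person", "name")}


def plan_has(key, value, label=None, allow_scan=False):
    """Plan has([label,] key, value) as a list of (label, access method)."""
    # Without a label, the clause expands into one disjunct per vertex label.
    labels = [label] if label is not None else LABELS
    plan = [(lbl, "index" if (lbl, key) in INDEXED else "scan") for lbl in labels]
    unindexed = [lbl for lbl, method in plan if method == "scan"]
    if unindexed and not allow_scan:
        clauses = " | ".join(
            f"(label = {lbl} & {key} WITHIN [{value}])" for lbl in unindexed)
        raise ScanDisabledError(
            "Could not find an index to answer query clause and "
            f"graph.allow_scan is disabled: ({clauses})")
    return plan


print(plan_has("name", "Julia Child", label="person"))  # [('person', 'index')]
try:
    plan_has("name", "Julia Child")  # no label: most disjuncts need a scan
except ScanDisabledError as err:
    print(err)
```

Adding the label reduces the query to a single indexed clause. That is why the has('person', 'name', ...) form used in the following examples can avoid a scan.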

Edge indexes and property indexes (vertex-centric indexes) can narrow a query after a global index has found the starting vertex. They determine which edges are followed, or which meta-properties are used, to further restrict the query.

Procedure

Global index
  • The graph traversal shown uses an index to find the person vertex from which the query starts.
    g.V().has('person', 'name', 'Emeril Lagasse').out('created').values('name')

    This graph traversal uses an index, if one exists, because the traversal step has('person', 'name', 'Emeril Lagasse') identifies both the vertex label and the indexed property key. After the initial vertex is found, the outgoing created edges are walked and the adjacent vertices are listed by name. This traversal shows why the vertex label must be combined with the property key: two different vertex labels, person and recipe, use the same property key, name.

    To check whether an index is used, append the profile() step to the traversal:
    gremlin> g.V().has('person', 'name', 'Emeril Lagasse').out('created').values('name').profile()
    ==>Traversal Metrics
    Step                                                               Count  Traversers       Time (ms)    % Dur
    =============================================================================================================
    DsegGraphStep(vertex,[],(label = person & name ...                     1           1           8.427    27.42
      query-optimizer                                                                              0.792
        \_condition=((label = person & name = Emeril Lagasse) & (true))
      query-setup                                                                                  0.028
        \_isFitted=true
        \_isSorted=false
        \_isScan=false
      index-query                                                                                  6.514
        \_indexType=Materialized
        \_usesCache=false
        \_statement=SELECT "personId" FROM "dse60"."person_p_byName" WHERE "name" = ? LIMIT ?; with params (java.lang.String) Emeril Lagasse, (java.lang.Integer) 50000
        \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, async=true}
    ...
    Note that the index-query entry in the first step, DsegGraphStep, identifies the index type as Materialized. If no index were used, the index-query entry would be missing from the profile output.
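Checking many profiles by eye is tedious, so the marker described above can be searched for in the printed text. The helper below is a hypothetical heuristic, not part of DSE or TinkerPop. It looks for an index-query entry, the \_indexType value, and the \_usesIndex=true line that appears in the property index example. It also assumes that \_isScan=true would mark a full scan, because only \_isScan=false appears in this page's output.

```python
def index_usage(profile_text):
    """Summarize index-related markers found in text printed by profile()."""
    lines = [ln.strip() for ln in profile_text.splitlines()]
    return {
        # A global index lookup shows up as an index-query entry.
        "index_query": any(ln.startswith("index-query") for ln in lines),
        # Vertex-centric lookups in these examples report \_usesIndex=true.
        "uses_index": r"\_usesIndex=true" in lines,
        # Assumption: \_isScan=true would mark a full scan.
        "scan": r"\_isScan=true" in lines,
        "index_types": [ln.split("=", 1)[1] for ln in lines
                        if ln.startswith(r"\_indexType=")],
    }


# Trimmed excerpt of the profile() output shown above.
EXCERPT = r"""
DsegGraphStep(vertex,[],(label = person & name ...    1    1    8.427    27.42
  query-setup                                                   0.028
    \_isScan=false
  index-query                                                   6.514
    \_indexType=Materialized
"""

print(index_usage(EXCERPT))
```

Because this is plain string matching, it only works as long as the profile output keeps these labels. Confirm against real output before relying on it in tests.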
Edge index
  • An edge index can narrow a query. For example, this query finds all of the outgoing edges for reviews that John Doe wrote with a rating of 3 stars or more:
    g.V().has('person','name','John Doe').outE().has('stars', gte(3))
    Running profile() on this query shows that a global index query is used in the initial step and that, in the second step, the ratedByStars edge index is used to reduce the latency of the query.
    Tip: The local() step can be used to affect how an edge index narrows a query.
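A vertex-centric edge index can be pictured as each vertex keeping its outgoing edges sorted by the indexed property. With that layout, outE().has('stars', gte(3)) becomes a range lookup rather than a filter over every edge. The sketch below is a hypothetical toy model, not DSE internals. The class name EdgeIndexByStars and the sample edges are invented, and the model only illustrates the idea behind an index such as ratedByStars.

```python
import bisect
from collections import defaultdict


class EdgeIndexByStars:
    """Toy per-vertex edge index, sorted by the 'stars' edge property."""

    def __init__(self):
        # out-vertex id -> sorted list of (stars, edge id)
        self._by_vertex = defaultdict(list)

    def add_edge(self, out_vertex, edge_id, stars):
        bisect.insort(self._by_vertex[out_vertex], (stars, edge_id))

    def out_edges_gte(self, out_vertex, min_stars):
        """Edge ids leaving out_vertex whose stars value is >= min_stars."""
        edges = self._by_vertex.get(out_vertex, [])
        # (min_stars,) sorts before every (min_stars, edge_id) tuple, so this
        # finds the first edge whose stars value is >= min_stars.
        start = bisect.bisect_left(edges, (min_stars,))
        return [edge_id for _, edge_id in edges[start:]]


idx = EdgeIndexByStars()
idx.add_edge("john_doe", "review1", 5)
idx.add_edge("john_doe", "review2", 2)
idx.add_edge("john_doe", "review3", 3)
print(idx.out_edges_gte("john_doe", 3))  # ['review3', 'review1']
```

The edges are sorted per vertex, so the lookup skips low-rated edges without reading them. This is the same benefit the profile attributes to the edge index.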
Property index
  • A property index can narrow a query. For example, this query finds the countries that Julia Child lived in starting in the year 1961 (in this case, only one country):
    g.V().has('person', 'name','Julia Child').as('person').
       local(properties('country').has('startYear', 1961).value()).as('country').
       select('person','country').
          by('name').by()
    gremlin> g.V().has('person', 'name','Julia Child').as('person').
    ......1>    local(properties('country').has('startYear', 1961).value()).as('country').
    ......2>    select('person','country').
    ......3>       by('name').by().profile()
    ==>Traversal Metrics
    Step                                                               Count  Traversers       Time (ms)    % Dur
    =============================================================================================================
    DsegGraphStep(vertex,[],(label = person & name ...                     1           1           1.274    37.35
      query-optimizer                                                                              0.253
        \_condition=((label = person & name = Julia Child) & (true))
      query-setup                                                                                  0.008
        \_isFitted=true
        \_isSorted=false
        \_isScan=false
      index-query                                                                                  0.557
        \_indexType=Materialized
        \_usesCache=false
        \_statement=SELECT "personId" FROM "newComp"."person_p_byName" WHERE "name" = ? LIMIT ?; with params (java.lang.String) Julia Child, (java.lang.Integer) 50000
        \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, async=true}
    DsegHasStep@[person]                                                   1           1           0.060     1.76
    LocalStep([DsegPropertiesStep([country],propert...                     1           1           1.300    38.12
      DsegPropertiesStep([country],property,(label ...                     1           1           1.149
        query-optimizer                                                                            0.239
          \_condition=((label = country & startYear = 1961) & (true))
        query-setup                                                                                0.001
          \_isFitted=true
          \_isSorted=false
          \_isScan=false
        vertex-query                                                                               0.564
          \_usesCache=false
          \_statement=SELECT * FROM "newComp"."person_p_OUT_byStartYear_p" WHERE "personId" = ? AND "~~property_key_id" = ? AND "~startYear" = ? LIMIT ? ALLOW FILTERING; with params (java.lang.Integer) 1, (java.lang.Integer) 32801, (java.lang.Integer) 1961, (java.lang.Integer) 50000
          \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, async=true}
          \_usesIndex=true
      DsegHasStep([startYear.eq(1961)])                                    1           1           0.081
      PropertyValueStep                                                    1           1           0.026
    SelectStep(last,[person, country],[value(name),...                     1           1           0.720    21.13
    NoOpBarrierStep(2500)                                                  1           1           0.032     0.95
    DsegPropertyLoadStep                                                   1           1           0.023     0.69
                                                >TOTAL                     -           -           3.411        -
    Using profile() on the query shows that a global index query is used in the initial step, and that the second CQL SELECT statement, issued by DsegPropertiesStep within LocalStep, uses the byStartYear property index (\_usesIndex=true) to cut the latency of the query.
    Tip: The local() step can also be handy for use with property indexes.
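In the profile above, the property index query filters on the person id, a property key id, and the startYear meta-property value. The toy model below follows that shape. For each vertex, it indexes the values of a multi-property such as country by one meta-property, so properties('country').has('startYear', 1961) becomes a direct lookup. This is a hypothetical sketch, not DSE's storage layout. The class PropertyIndex is an invented name, and the sample countries and years are illustrative only, not taken from the example dataset.

```python
from collections import defaultdict


class PropertyIndex:
    """Toy index of one multi-property's values by one meta-property."""

    def __init__(self, key, meta_key):
        self.key = key            # multi-property key, e.g. 'country'
        self.meta_key = meta_key  # meta-property key, e.g. 'startYear'
        # (vertex id, meta-property value) -> list of property values
        self._entries = defaultdict(list)

    def add(self, vertex_id, value, meta):
        """Record one property value together with its meta-properties."""
        if self.meta_key in meta:
            self._entries[(vertex_id, meta[self.meta_key])].append(value)

    def lookup(self, vertex_id, meta_value):
        """Values of self.key on vertex_id whose meta-property equals meta_value."""
        return list(self._entries.get((vertex_id, meta_value), []))


# Illustrative data only.
idx = PropertyIndex("country", "startYear")
idx.add("julia_child", "France", {"startYear": 1948})
idx.add("julia_child", "Norway", {"startYear": 1959})
idx.add("julia_child", "United States", {"startYear": 1961})
print(idx.lookup("julia_child", 1961))  # ['United States']
```

Keying the lookup by vertex first is what keeps it local to the starting vertex. This is consistent with running the property index step inside local() once a global index has found that vertex.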