Using indexes

A global index can be used in a graph traversal query for the first traversal step after the V() step, where it trims down the number of vertices that are initially fetched. Remember that a search index must be used if two or more properties are used for global indexing. In general, the indexed step involves a vertex label and can include a property key and a particular property value. If a mid-traversal V() step is called, the step that follows it can also consult an index to narrow the list of vertices that will be traversed.
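
As an illustration of the mid-traversal case, the sketch below consults an index twice: once for the step after g.V() and once for the step after the mid-traversal V(). It assumes a global index on the name property of author vertices; the step labels 'first' and 'second' are placeholders chosen for this example.

```groovy
// Sketch only: assumes an index exists on the name property of author vertices.
// The has() step after g.V() names a label and a property key, so it can use the index.
// The has() step after the mid-traversal V() does the same, so it can use the index too.
// 'first' and 'second' are hypothetical step labels.
g.V().has('author', 'name', 'Julia Child').as('first').
  V().has('author', 'name', 'Emeril Lagasse').as('second').
  select('first', 'second').by('name')
```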

Graph traversals use indexes only if both the vertex label and the property key are specified. If either is missing, no index is used and a full graph scan for the property key can result. If full graph scans are disabled, the query fails, as shown in this example, where a property is specified but a vertex label is not:

g.V().has('name','Julia Child')
Could not find an index to answer query clause and graph.allow_scan is disabled:
((label = FridgeSensor & name WITHIN [Julia Child]) | (label = author & name WITHIN [Julia Child]) |
(label = book & name WITHIN [Julia Child]) | (label = ingredient & name WITHIN [Julia Child]) |
(label = meal & name WITHIN [Julia Child]) | (label = recipe & name WITHIN [Julia Child]) |
(label = reviewer & name WITHIN [Julia Child]))
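
Two ways around this error are sketched below. The first adds the vertex label so that an index can answer the query; it assumes a global index on the name property of author vertices. The second turns on full graph scans through the graph.allow_scan option named in the error message; the schema.config() call assumes the classic DSE Graph schema API and should be checked against the installed version. Full scans are generally suited to development only.

```groovy
// Option 1: name the vertex label as well as the property key,
// so that an index on author name (assumed to exist) can be consulted.
g.V().has('author', 'name', 'Julia Child')

// Option 2 (development only): allow full graph scans, then rerun the original query.
schema.config().option('graph.allow_scan').set('true')
g.V().has('name', 'Julia Child')
```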

Edge indexes and property indexes (vertex-centric indexes) can be used to narrow the query after a global index has found the starting vertex. They allow definition of the edges that will be followed or the meta-properties that will be used to further restrict the query.

  • Global index

  • The graph traversal shown uses an index to discover certain author vertices to start the query.

    g.V().has('author', 'name', 'Emeril Lagasse').out('created').values('name')
    [Figure: Using index part one]

    This graph traversal uses an index, if the index exists, because the traversal step has('author', 'name', 'Emeril Lagasse') identifies the vertex label and the property key indexed. After finding the initial vertex to traverse from, the outgoing created edges are walked and the adjacent vertices are listed by name. This graph traversal shows the importance of using the vertex label in combination with the property key, as two different elements, authors and recipes, use the same property key name.
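
    For reference, an index that would satisfy this traversal could be defined roughly as follows. This is a sketch, not the schema used for the output on this page: it assumes the classic DSE Graph schema API, and the index name byName is inferred from the author_p_byName table that appears in profile output later in this topic.

    ```groovy
    // Sketch: a materialized (global) index on the name property of author vertices.
    // The index name 'byName' is an assumption.
    schema.vertexLabel('author').index('byName').materialized().by('name').add()
    ```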

    Checking for the use of indexing can be accomplished with the profile() method:

    gremlin> g.V().has('author', 'name', 'Emeril Lagasse').out('created').values('name').profile()
    ==>Traversal Metrics
    Step                                                               Count  Traversers       Time (ms)    % Dur
    =============================================================================================================
    DsegGraphStep([~label.=(author), name.=(Emeril ...                     1           1           2.196    51.37
      query-optimizer                                                                              0.199
      query-setup                                                                                  0.004
      index-query                                                                                  0.946
    DsegVertexStep(OUT,[created],vertex)                                   2           2           0.935    21.88
      query-optimizer                                                                              0.101
      query-setup                                                                                  0.000
      vertex-query                                                                                 0.282
    DsegPropertiesStep([name],value)                                       2           2           1.030    24.11
      query-optimizer                                                                              0.044
      query-setup                                                                                  0.005
      vertex-query                                                                                 0.347
      vertex-query                                                                                 0.639
      query-setup                                                                                  0.000
    NoOpBarrierStep(2500)                                                  2           2           0.113     2.64
                                                >TOTAL                     -           -           4.276        -

    The index-query entry in the first step, DsegGraphStep, identifies the index type as materialized. If an index is not used, index-query is missing from the profile output.

  • Edge index

  • An edge index can narrow the query, such as this one that finds all the outgoing edges for reviews that John Doe wrote that have a rating greater than or equal to 3 stars:

    g.V().has('person','name','John Doe').outE().has('stars', gte(3))
    [Figure: Use index part two]

    Using profile() on the query shows that a global index query is used in the initial step and that, in the second step, the ratedByStars edge index is used to cut the latency of the query.

    The local() step can be used to affect how an edge index narrows a query.
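
    An edge index such as ratedByStars might be defined as in the sketch below. The vertex label is taken from the traversal above; the edge label 'rated' is an assumption made for this example, as is the use of the classic DSE Graph schema API. Naming the edge label in outE() is intended to keep the traversal to the edges that the index covers.

    ```groovy
    // Sketch: index the outgoing 'rated' edges (assumed label) of person vertices by 'stars'.
    schema.vertexLabel('person').index('ratedByStars').outE('rated').by('stars').add()

    // The same traversal as above, restricted to the assumed edge label and profiled.
    g.V().has('person', 'name', 'John Doe').outE('rated').has('stars', gte(3)).profile()
    ```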

  • Property index

  • A property index can narrow the query, such as this one that finds the countries that Julia Child lived in, starting in the year 1961 (in this case, only one country):

    g.V().has('author', 'name','Julia Child').as('author').
       local(properties('country').has('startYear', 1961).value()).as('country').
       select('author','country').
          by('name').by().profile()
    gremlin> g.V().has('author', 'name','Julia Child').as('author').
    ......1>    local(properties('country').has('startYear', 1961).value()).as('country').
    ......2>    select('author','country').
    ......3>       by('name').by().profile()
    ==>Traversal Metrics
    Step                                                               Count  Traversers       Time (ms)    % Dur
    =============================================================================================================
    DsegGraphStep(vertex,[],(label = author & name ...                     1           1           1.274    37.35
      query-optimizer                                                                              0.253
        _condition=((label = author & name = Julia Child) & (true))
      query-setup                                                                                  0.008
        _isFitted=true
        _isSorted=false
        _isScan=false
      index-query                                                                                  0.557
        _indexType=Materialized
        _usesCache=false
        _statement=SELECT "authorId" FROM "newComp"."author_p_byName" WHERE "name" = ? LIMIT ?; with params (jav
                    a.lang.String) Julia Child, (java.lang.Integer) 50000
        _options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Option
                  al.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, asyn
                  c=true}
    DsegHasStep@[person]                                                   1           1           0.060     1.76
    LocalStep([DsegPropertiesStep([country],propert...                     1           1           1.300    38.12
      DsegPropertiesStep([country],property,(label ...                     1           1           1.149
        query-optimizer                                                                            0.239
        _condition=((label = country & startYear = 1961) & (true))
        query-setup                                                                                0.001
        _isFitted=true
        _isSorted=false
        _isScan=false
        vertex-query                                                                               0.564
        _usesCache=false
        _statement=SELECT * FROM "newComp"."author_p_OUT_byStartYear_p" WHERE "authorId" = ? AND "~~property_ke
                     y_id" = ? AND "~startYear" = ? LIMIT ? ALLOW FILTERING; with params (java.lang.Integer) 1, (
                     java.lang.Integer) 32801, (java.lang.Integer) 1961, (java.lang.Integer) 50000
        _options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optio
                   nal.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, as
                   ync=true}
        _usesIndex=true
      DsegHasStep([startYear.eq(1961)])                                    1           1           0.081
      PropertyValueStep                                                    1           1           0.026
    SelectStep(last,[author, country],[value(name),...                     1           1           0.720    21.13
    NoOpBarrierStep(2500)                                                  1           1           0.032     0.95
    DsegPropertyLoadStep                                                   1           1           0.023     0.69
                                                >TOTAL                     -           -           3.411        -

    The profile() output shows that a global index query is used in the initial step (index-query with _indexType=Materialized) and that the second CQL SELECT statement, issued inside LocalStep, uses the byStartYear property index (_usesIndex=true) to cut the latency of the query.

    The local() step can also be handy for use with property indexes.
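
    A property index like the one used here might be defined as in the following sketch. It assumes the classic DSE Graph schema API and that country is a property of author vertices carrying a startYear meta-property; the index name byStartYear is inferred from the author_p_OUT_byStartYear_p table in the profile output.

    ```groovy
    // Sketch: index the 'country' property of author vertices by its 'startYear' meta-property.
    schema.vertexLabel('author').index('byStartYear').property('country').by('startYear').add()
    ```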
