Using indexes

Indexes can be used in graph traversal queries to reduce the number of vertices or edges that are initially fetched. Remember that a search index must be used if two or more properties are needed, as only search indexes can satisfy multiple conditions. In general, the indexed traversal step involves a vertex or edge label and can include a property value, including collections, tuples, and user-defined types (UDTs). In a traversal, the step following g.V() is generally the step for which an index is consulted. If a mid-traversal V() step is called, an index can also be consulted for the step that follows it, narrowing the list of vertices that will be traversed.

Graph traversals only use indexes if both the vertex or edge label and the property key are specified. If either is missing, no index is used and a full graph scan for the property key is the only option.
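
As a minimal sketch of this rule, the two traversals below differ only in whether the vertex label is given. The label, property key, and value are borrowed from the examples on this page. The comments restate the behavior described above rather than reporting measured results.

```groovy
// Vertex label and property key are both specified,
// so an index on the person name property can be consulted.
g.V().has('person', 'name', 'Emeril LAGASSE')

// Only the property key is specified. Without a vertex label
// no index is used, so the traversal depends on a full graph scan.
g.V().has('name', 'Emeril LAGASSE')
```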

Indexing a vertex

The graph traversal shown uses an index to find a particular person vertex from which to start the query:

g.V().has('person', 'name', 'Emeril LAGASSE').out('created').values('name')

results in:

==>Wild Mushroom Stroganoff
==>Spicy Meatloaf

This graph traversal uses a search index. The traversal step has('person', 'name', 'Emeril LAGASSE') identifies the vertex label and the indexed property. After finding the initial vertex to traverse from, the outgoing created edges are walked and the adjacent vertices are listed by name.

Checking for the use of indexing can be accomplished with the profile() method:

g.V().has('person', 'name', 'Emeril LAGASSE').out('created').values('name').profile()

with the detailed information:

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
__.V().hasLabel("person").has("name","Emeril LA...                     1           1          27.935    78.56
  CQL statements ordered by overall duration                                                  24.137
    \_1=SELECT * FROM food.person WHERE solr_query = '{"q":"*:*", "fq":["name:Emeril\\ LAGASSE"]}' LIMIT 2147
        483647 / Duration: 24 ms / Count: 1
HasStep([~label.eq(person), name.eq(Emeril LAGA...                     1           1           0.399     1.12
__.out().hasLabel("created")                                           2           2           6.161    17.33
  CQL statements ordered by overall duration                                                   3.369
    \_1=SELECT * FROM food.person__created__recipe WHERE person_person_id = ? / Duration: 2 ms / Count: 1 / I
        ndex type: Table: person__created__recipe
    \_2=SELECT * FROM food.recipe WHERE recipe_id = ? / Duration: 1 ms / Count: 2 / Index type: Table: recipe
PropertiesStep([name],value)                                           2           2           0.624     1.76
NoOpBarrierStep(2500)                                                  2           2           0.190     0.53
ReferenceElementStep                                                   2           2           0.248     0.70
                                            >TOTAL                     -           -          35.559        -

The first CQL statement uses a search index, WHERE solr_query = '{"q":"*:*", "fq":["name:Emeril\\ LAGASSE"]}', to find the person. Two further CQL statements then find the recipes that are adjacent to that particular person vertex. Finally, the recipe names are retrieved.
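
The definition of the search index that this traversal relies on is not shown on this page. A sketch of how such an index might be created follows. It assumes the DSE Graph schema API's searchIndex() builder and a person vertex label that already has a name property; the actual index in the food graph may cover more properties or use different options.

```groovy
// Assumption: the person vertex label exists and has a 'name' property.
// Creates a search index on person.name if one is not already present.
schema.vertexLabel('person').
  searchIndex().
  ifNotExists().
  by('name').
  create()
```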

Indexing an edge

An index on an edge property can narrow the query, such as this one that finds all the outgoing edges for reviews that John DOE wrote with a rating greater than or equal to 3 stars:

g.V().has('person','name','John DOE').outE().has('stars', gte(3))

results in:

==>e[dseg:/person-reviewed-recipe/46ad98ac-f5c9-4411-815a-f81b3b667921/2005][dseg:/person/46ad98ac-f5c9-4411-815a-f81b3b667921-reviewed->dseg:/recipe/2005]
==>e[dseg:/person-reviewed-recipe/46ad98ac-f5c9-4411-815a-f81b3b667921/2001][dseg:/person/46ad98ac-f5c9-4411-815a-f81b3b667921-reviewed->dseg:/recipe/2001]

Using profile() on the query shows that a search index was used in the initial step and that, in the second step, the person__reviewed__recipe_by_person_person_id_stars materialized view index was used to cut the latency of the query:

g.V().has('person','name','John DOE').outE().has('stars', gte(3)).profile()

with the detailed information:

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
__.V().hasLabel("person").has("name","John DOE")                       1           1          13.626    87.00
  CQL statements ordered by overall duration                                                  12.376
    \_1=SELECT * FROM food.person WHERE solr_query = '{"q":"*:*", "fq":["name:John\\ DOE"]}' LIMIT 2147483647
         / Duration: 12 ms / Count: 1
HasStep([~label.eq(person), name.eq(John DOE)])                        1           1           0.110     0.70
__.outE().has("stars",P.gte((int) 3))                                  2           2           1.589    10.15
  CQL statements ordered by overall duration                                                   0.573
    \_1=SELECT * FROM food.person__reviewed__recipe_by_person_person_id_stars WHERE person_person_id = ? AND
        stars >= ? / Duration: < 1 ms / Count: 1 / Index type: Materialized view
HasStep([stars.gte(3)])                                                2           2           0.177     1.14
ReferenceElementStep                                                   2           2           0.158     1.01
                                            >TOTAL                     -           -          15.662        -
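
The error output in the next section refers to schema.indexFor(<your_traversal>).analyze(). As a sketch, the same call could be pointed at this edge traversal to ask which indexes it would need. The output is not reproduced here because it depends on the schema and on the indexes that already exist.

```groovy
// Ask DSE Graph which indexes the edge traversal above would require.
// The traversal passed in is the same one profiled in this section.
schema.indexFor(
  g.V().has('person', 'name', 'John DOE').outE().has('stars', gte(3))
).analyze()
```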

When indexing seems to be broken

There are cases where the usual querying fails, such as a query that requires both a search index predicate and an index on a map collection. For instance, the following query fails:

g.V().
  hasLabel('person').has('name', 'John DOE').
  has('badge', containsValue('2016-01-01' as LocalDate)).
  values('name')

results in:

One or more indexes are required to execute the traversal: g.V().hasLabel("person").has("name","John DOE").has("badge",containsValue(java.time.LocalDate.of(2016, 1, 1))).values("name")
Failed step: __.V().hasLabel("person").has("badge",containsValue(java.time.LocalDate.of(2016, 1, 1))).has("name","John DOE")
CQL execution: No table or view could satisfy the query 'SELECT * FROM food.person WHERE badge CONTAINS ? AND name = ?'
'schema.indexFor(<your_traversal>).analyze()' can't suggest any indexes to create as some steps in your traversal are not supported yet.

Alternatively consider using:
g.with('ignore-unindexed') to ignore unindexed traversal. Your results may be incomplete.
g.with('allow-filtering') to allow filtering. This may have performance implications.
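
The last suggestion in the message can be tried directly. The sketch below adds the 'allow-filtering' option named in the error output to the failing traversal. Whether it succeeds for this combination of a search predicate and a map collection is an assumption that has not been verified here, and, as the message warns, filtering may have performance implications.

```groovy
// Same traversal as above, with the option suggested by the error message.
// Assumption: filtering is acceptable for this data volume.
g.with('allow-filtering').V().
  hasLabel('person').has('name', 'John DOE').
  has('badge', containsValue('2016-01-01' as LocalDate)).
  values('name')
```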

Since search indexes cannot index map collections, this query cannot be completed as presented. However, a mid-traversal V() step can be used to identify whether John DOE has a badge that meets the requirement:

g.V().has('person','name', 'John DOE').as('a').
  V().has('person','badge', containsValue('2016-01-01' as LocalDate)).as('b').
  select('a','b').
    by('name').by('badge')

results in:

==>{a=John DOE, b={gold=2017-01-01, silver=2016-01-01}}

This query takes advantage of both a search index and a secondary index by using a mid-traversal V() step, as profile() shows:

g.V().has('person','name', 'John DOE').as('a').
  V().has('person','badge', containsValue('2016-01-01' as LocalDate)).as('b').
  select('a','b').
    by('name').by('badge').
  profile()

results in:

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
__.V().hasLabel("person").has("name","John DOE")                       1           1          12.464    73.33
  CQL statements ordered by overall duration                                                  10.958
    \_1=SELECT * FROM food.person WHERE solr_query = '{"q":"*:*", "fq":["name:John\\ DOE"]}' LIMIT 2147483647
         / Duration: 10 ms / Count: 1
HasStep([~label.eq(person), name.eq(John DOE)])...                     1           1           0.174     1.02
__.V().hasLabel("person").has("badge",containsV...                     1           1           3.402    20.02
  CQL statements ordered by overall duration                                                   2.339
    \_1=SELECT * FROM food.person WHERE badge CONTAINS ? / Duration: 2 ms / Count: 1 / Index type: Secondary
        index
HasStep([~label.eq(person), badge.containsValue...                     1           1           0.573     3.38
SelectStep(last,[a, b],[value(name), value(badg...                     1           1           0.100     0.59
ReferenceElementStep                                                   1           1           0.282     1.66
                                            >TOTAL                     -           -          16.997
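
One point to keep in mind is that a mid-traversal V() starts again from the full set of person vertices, so the vertex labeled b is matched independently of the vertex labeled a. The sketch below assumes the goal is to confirm that the badge belongs to the same vertex as John DOE, and adds a standard Gremlin where() comparison for that. This variant is not part of the original examples and has not been profiled here.

```groovy
g.V().has('person', 'name', 'John DOE').as('a').
  V().has('person', 'badge', containsValue('2016-01-01' as LocalDate)).as('b').
  where('a', eq('b')).   // keep only results where a and b are the same vertex
  select('a', 'b').
    by('name').by('badge')
```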

Next steps

This page shows a few examples of using indexes for graph querying, but is not exhaustive. Search indexes, in particular, have a variety of predicates that can be used for text, geospatial, and non-text indexing.
