Anatomy of a graph traversal

Structure of a graph traversal

Simple traversals can be complex, but generally do not employ specialized techniques such as recursion or branching.

This graph traversal has four traversal steps, and each builds on the step before it. If you have created the food graph, these steps can be executed to explore the following graph traversal:

g.V().hasLabel('recipe').count()

Break down the chain of a graph traversal into each traversal step sequentially:

  • The graph traversal g

    g will return the graph currently aliased or an error if no graph is aliased:

    ==>food[Core]
  • All vertices are gathered with V()

    All the vertices will be returned. A sample of the results:

    ==>v[dseg:/location/g13]
    ==>v[dseg:/person/4ce9caf1-25b8-468e-a983-69bad20c017a]
    ==>v[dseg:/person/888ad970-0efc-4e2c-b234-b6a71c30efb5]
    ==>v[dseg:/person/4954d71d-f78c-4a6d-9c4a-f40903edbf3c]
    ==>v[dseg:/person/01e22ca6-da10-4cf7-8903-9b7e30c25805]
    ==>v[dseg:/person/6c09f656-5aef-46df-97f9-e7f984c9a3d9]
    ==>v[dseg:/person/daa02698-df4f-4436-8855-941774f4c3e0]
    ==>v[dseg:/location/g5]
    ==>v[dseg:/location/g1]
    ==>v[dseg:/person/d45c76bc-6f93-4d0e-9d9f-33298dae0524]
    ==>v[dseg:/location/g15]
    ==>v[dseg:/location/g100]
    ==>v[dseg:/location/g14]
    ==>v[dseg:/location/g4]
    ==>v[dseg:/location/g11]
    ==>v[dseg:/location/g12]
    ==>v[dseg:/location/g9]
    ==>v[dseg:/person/adb8744c-d015-4d78-918a-d7f062c59e8f]
    ==>v[dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd]
    ==>v[dseg:/person/f092107c-0c5c-47e7-917c-82c7fc2a2493]
    ==>v[dseg:/person/ad58b8bd-033f-48ee-8f3b-a84f9c24e7de]
    ==>v[dseg:/person/6bda1b37-fe96-42bd-a2db-682073d10c37]
    ==>v[dseg:/person/7f969e16-b81e-4fcd-87c5-1911abbed132]
    ==>v[dseg:/person/ef811281-f954-4fd6-ace0-bf67d057771a]
    ==>v[dseg:/person/46ad98ac-f5c9-4411-815a-f81b3b667921]

    The ids of all vertices will be returned. Note that the ids use a URL format.

  • Filter out the vertices labeled as a recipe with hasLabel('recipe')

    Only the vertices that are recipes will be returned:

    ==>v[dseg:/recipe/2003]
    ==>v[dseg:/recipe/2005]
    ==>v[dseg:/recipe/2006]
    ==>v[dseg:/recipe/2007]
    ==>v[dseg:/recipe/2001]
    ==>v[dseg:/recipe/2002]
    ==>v[dseg:/recipe/2004]
    ==>v[dseg:/recipe/2008]
  • Count the number of vertices with count()

    The number of vertices returned from the last traversal step is totalled:

    8

Graph traversal with edges

Any number of traversal steps can be chained into a traversal, filtering and transforming the graph data as required. In some cases, edges will be the result, and perhaps unexpected. Consider the following traversal:

g.V().hasLabel('recipe').has('name', 'Beef Bourguignon').inE().values('comment')

This graph traversal begins as the last traversal did with g.V().hasLabel('recipe'). It is then followed by:

  • A traversal step to pick only the vertices with the recipe title specified

    The filter should capture one recipe if recipe titles are unique.

    ==>v[dseg:/recipe/2001]
  • A traveral step that retrieves incoming edges

    Notice that two different edge labels are filtered with this step. Using inE('reviewed') would be more precise if the target result desired are only reviews.

    i==>e[dseg:/person-reviewed-recipe/46ad98ac-f5c9-4411-815a-f81b3b667921/2001][dseg:/person/46ad98ac-f5c9-4411-815a-f81b3b667921-reviewed->dseg:/recipe/2001]
    ==>e[dseg:/person-created-recipe/e7cd5752-bc0d-4157-a80f-7523add8dbcd/2001][dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd-created->dseg:/recipe/2001]
    ==>e[dseg:/person-reviewed-recipe/4954d71d-f78c-4a6d-9c4a-f40903edbf3c/2001][dseg:/person/4954d71d-f78c-4a6d-9c4a-f40903edbf3c-reviewed->dseg:/recipe/2001]
    ==>e[dseg:/person-reviewed-recipe/6c09f656-5aef-46df-97f9-e7f984c9a3d9/2001][dseg:/person/6c09f656-5aef-46df-97f9-e7f984c9a3d9-reviewed->dseg:/recipe/2001]
    ==>e[dseg:/person-reviewed-recipe/daa02698-df4f-4436-8855-941774f4c3e0/2001][dseg:/person/daa02698-df4f-4436-8855-941774f4c3e0-reviewed->dseg:/recipe/2001]
  • Parsing out the comment property from the rated edges

    The property values are retrieved for the edge property key comment:

    ==>Pretty tasty!
    ==>Yummy!
    ==>It was okay.

Building graph traversals one step at a time can yield interesting results and insight into how to create traversals.

The path of a graph traversal

A traversal step exists that will show the path taken by a graph traversal. First, find the results for a traversal that answers the question about what recipes that list beef and carrots as ingredients are included in the cookbooks, given the cookbook and recipe title?

g.V().hasLabel('ingredient').has('name',within('beef','carrots')).
   in().as('Recipe').
   out().hasLabel('book').as('Book').
   select('Book','Recipe').
      by('name').
      by('name')

Results:

==>{Book=The Art of French Cooking, Vol. 1, Recipe=Beef Bourguignon}
==>{Book=The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution, Recipe=Carrot Soup}

One expects that the traversal path will be from ingredient to recipe to book. To check if this assumption is correct, add path() to the end of the traversal.

g.V().hasLabel('ingredient').has('name',within('beef','carrots')).
   in().as('Recipe').
   out().hasLabel('book').as('Book').
   select('Book','Recipe').
      by('name').
      by('name').
   path()

Results:

==>path[v[dseg:/ingredient/3028], v[dseg:/recipe/2007], v[dseg:/book/1004], 
{Book=The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution, Recipe=Carrot Soup}]
==>path[v[dseg:/ingredient/3001], v[dseg:/recipe/2001], v[dseg:/book/1001], 
{Book=The Art of French Cooking, Vol. 1, Recipe=Beef Bourguignon}]

For each case, notice that the traversal does follow the expected path.

Traversal metrics

In addition to tracing the output of each graph traversal step, metrics can produce interesting insights as well. To add metrics to the last traversal shown, add profile():

g.V().hasLabel('ingredient').has('name',within('beef','carrots')).
   in().as('Recipe').
   out().hasLabel('book').as('Book').
   select('Book','Recipe').
      by('name').
      by('name').
   profile()

Results:

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
__.V().hasLabel("ingredient").has("name",P.with...                     2           2          23.758    67.27
  CQL statements ordered by overall duration                                                  21.103
    \_1=SELECT * FROM food.ingredient WHERE solr_query = '{"q":"*:*", "fq":["name:(beef OR carrots)"]}' LIMIT
         2147483647 / Duration: 21 ms / Count: 1
HasStep([~label.eq(ingredient), name.within([be...                     2           2           0.463     1.31
__.in()                                                                4           4           8.168    23.13
  CQL statements ordered by overall duration                                                  14.625
    \_1=SELECT * FROM food.recipe__includes__ingredient_by_ingredient_ingred_id WHERE ingredient_ingred_id =
        ? / Duration: 3 ms / Count: 2 / Index type: Materialized view
    \_2=SELECT * FROM food.fridge_sensor__contains__ingredient_by_ingredient_ingred_id WHERE ingredient_ingre
        d_id = ? / Duration: 3 ms / Count: 2 / Index type: Materialized view
    \_3=SELECT * FROM food.store__is_stocked_with__ingredient_by_ingredient_ingred_id WHERE ingredient_ingred
        _id = ? / Duration: 2 ms / Count: 2 / Index type: Materialized view
    \_4=SELECT * FROM food.recipe WHERE recipe_id = ? / Duration: 2 ms / Count: 2 / Index type: Table: recipe
    \_5=SELECT * FROM food.store WHERE store_id = ? / Duration: 1 ms / Count: 1 / Index type: Table: store
    \_6=SELECT * FROM food.fridge_sensor WHERE city_id = ? AND state_id = ? AND zipcode_id = ? AND sensor_id
        = ? / Duration: < 1 ms / Count: 1 / Index type: Table: fridge_sensor
__.out().hasLabel("book")                                              2           2           2.369     6.71
  CQL statements ordered by overall duration                                                   1.762
    \_1=SELECT * FROM food.recipe__included_in__book WHERE recipe_recipe_id = ? / Duration: < 1 ms / Count: 2
         / Index type: Table: recipe__included_in__book
    \_2=SELECT * FROM food.book WHERE book_id = ? / Duration: < 1 ms / Count: 2 / Index type: Table: book
HasStep([~label.eq(book)])@[Book]                                      2           2           0.311     0.88
SelectStep(last,[Book, Recipe],[value(name), va...                     2           2           0.141     0.40
ReferenceElementStep                                                   2           2           0.105     0.30
                                            >TOTAL                     -           -          35.318        -

The type of traversal step is listed, along with the number of traversers and the time to complete the traversal step. If a traversal step can be processed in parallel, multiple traversers will be employed to retrieve data. Some traversal steps are graph-global requiring retrieval from the entire graph, such as a graph-global retrieval that finds vertices that match certain conditions. Other traversal steps are graph-local walks and can be processed in parallel, such as the HasStep is a graph-local walk that walks through the graph along constrained paths. DataStax Graph uses automatic query optimization to determine the traversal strategies to efficiently use any index structures that exist.

Looking at the metrics, the question of performance comes to mind. For instance, is there any way to optimize the traversal shown above? In fact, a simple modification results in a time savings:

g.V().hasLabel('ingredient').has('name',within('beef','carrots')).
   in('includes').as('Recipe').
   out().hasLabel('book').as('Book').
   select('Book','Recipe').
      by('name').
      by('name').
   profile()

Results:

==>Traversal Metrics
Step                                                               Count  Traversers       Time (ms)    % Dur
=============================================================================================================
__.V().hasLabel("ingredient").has("name",P.with...                     2           2          10.530    62.59
  CQL statements ordered by overall duration                                                   9.169
    \_1=SELECT * FROM food.ingredient WHERE solr_query = '{"q":"*:*", "fq":["name:(beef OR carrots)"]}' LIMIT
         2147483647 / Duration: 9 ms / Count: 1
HasStep([~label.eq(ingredient), name.within([be...                     2           2           0.167     0.99
__.in().hasLabel("includes")                                           2           2           3.369    20.02
  CQL statements ordered by overall duration                                                   2.172
    \_1=SELECT * FROM food.recipe__includes__ingredient_by_ingredient_ingred_id WHERE ingredient_ingred_id =
        ? / Duration: 1 ms / Count: 2 / Index type: Materialized view
    \_2=SELECT * FROM food.recipe WHERE recipe_id = ? / Duration: < 1 ms / Count: 2 / Index type: Table: reci
        pe
__.out().hasLabel("book")                                              2           2           2.289    13.61
  CQL statements ordered by overall duration                                                   1.969
    \_1=SELECT * FROM food.recipe__included_in__book WHERE recipe_recipe_id = ? / Duration: < 1 ms / Count: 2
         / Index type: Table: recipe__included_in__book
    \_2=SELECT * FROM food.book WHERE book_id = ? / Duration: < 1 ms / Count: 2 / Index type: Table: book
HasStep([~label.eq(book)])@[Book]                                      2           2           0.333     1.98
SelectStep(last,[Book, Recipe],[value(name), va...                     2           2           0.068     0.41
ReferenceElementStep                                                   2           2           0.067     0.40
                                            >TOTAL                     -           -          16.826        -

The change made is subtle. The traversal steps, in() has been replaced by inE('includes'), to find only the edges that are connected to recipes. Although each measurement can vary, generally the second traversal will outperform the first traversal.

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com