QuickStart Exploring traversals

Explore graph data with query traversals.

About this task

Exploring the graph with graph traversals can lead to interesting conclusions. Here we’ll explore a number of traversals, to show off the power of Gremlin in creating simple queries.

As with all queries in Graph, if you are using Gremlin console, alias the graph traversal g to a graph with :remote config alias g food_qs.g before running any commands.

Procedure

  1. Any query can be profiled to see its query path and how it performs. The profile() step displays how long each portion of the traversal takes to run, as well as the underlying CQL command that is run to complete the Gremlin command.

    g.V().has('person', 'name', 'Julia CHILD').profile()

    In Studio:

    GSStudioProfile

    Clicking on the bars in the graph in Studio will show more detail about underlying CQL commands that Graph uses to execute a query.

    In Gremlin console:

    ==>Traversal Metrics
    Step                                                               Count  Traversers       Time (ms)    % Dur
    =============================================================================================================
    __.V().has("name","Julia CHILD")                                                              52.140    96.02
    HasStep([name.eq(Julia CHILD)])                                                                1.987     3.66
    ReferenceElementStep                                                                           0.174     0.32
                                                >TOTAL                     -           -          54.302        -

    To investigate what happens in any of the following queries, and why some queries are more efficient than others, append .profile() to the query. It shows information similar to the output above.

  2. Graph queries have lower latency when they are more specific; the has() step narrows the search. Compare the following queries and their profiles (by adding .profile() to the end):

    dev.V().hasLabel('person')
    dev.V().hasLabel('person').has('name', 'Julia CHILD')
    dev.V().has('person','name', 'Julia CHILD')

    Running any of these queries in Studio will display the vertex id, label and all property values. In Gremlin console, these queries will only display the vertex id; the elementMap() step must be appended to get the property values.
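
    As a sketch of the comparison described in this step, the same three queries can be written out with .profile() appended, followed by one with elementMap() appended to show property values in Gremlin console. The queries reuse only the traversal source, label, and property names shown above; the timings depend on your data and cluster.

    ```groovy
    // Profile each variant and compare the Time (ms) column
    dev.V().hasLabel('person').profile()
    dev.V().hasLabel('person').has('name', 'Julia CHILD').profile()
    dev.V().has('person', 'name', 'Julia CHILD').profile()

    // In Gremlin console, append elementMap() to see the id, label, and properties
    dev.V().has('person', 'name', 'Julia CHILD').elementMap()
    ```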

  3. In this next traversal, has() filters vertex properties by name = Julia Child as seen above. The traversal step outE() discovers the outgoing edges from that vertex with the authored label.

    g.V().has('person','name','Julia CHILD').outE('authored')

    In Studio, either the listing or the Raw JSON view shows the edge information:

    GSStudioAuthorOutE1

    In Gremlin console:

    ==>e[dseg:/person-authored-book/e7cd5752-bc0d-4157-a80f-7523add8dbcd/1001][dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd-authored->dseg:/book/1001]
    ==>e[dseg:/person-authored-book/e7cd5752-bc0d-4157-a80f-7523add8dbcd/1003][dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd-authored->dseg:/book/1003]
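
    To inspect the edges further, the following sketch counts them and prints their details. It assumes the same food_qs data as above and uses only the standard count() and elementMap() steps; the count should match the number of edges listed above.

    ```groovy
    // Count the outgoing 'authored' edges from the Julia Child vertex
    g.V().has('person', 'name', 'Julia CHILD').outE('authored').count()

    // Show each edge's id, label, endpoint vertices, and any edge properties
    g.V().has('person', 'name', 'Julia CHILD').outE('authored').elementMap()
    ```
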
  4. If instead, you want to query for the books that all people have written, the query must be modified. The previous example retrieved edges, but not the adjacent book vertices. Add a traversal step inV() to find all the vertices that connect to the outgoing edges, then print the book titles of those vertices. Notice how the chained traversal steps go from the vertices along outgoing edges to the adjacent vertices with V().outE().inV(). The outgoing edges are given a particular filter value, authored.

    g.V().outE('authored').inV().values('name')

    In Studio:

    GSStudioAllBooks

    In Gremlin console:

    ==>The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution
    ==>The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution
    ==>The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution
    ==>The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution
    ==>The Art of French Cooking, Vol. 1
    ==>The Art of French Cooking, Vol. 1
    ==>Simca's Cuisine: 100 Classic French Recipes for Every Occasion
    ==>Simca's Cuisine: 100 Classic French Recipes for Every Occasion
    ==>The French Chef Cookbook
  5. Notice that the book titles are duplicated in the resulting list, because a listing is returned for each author. If a book has three authors, three listings are returned. The traversal step dedup() can eliminate the duplication.

    g.V().outE('authored').inV().values('name').dedup()

    In Studio:

    GSStudioDedup

    In Gremlin console:

    ==>The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution
    ==>The Art of French Cooking, Vol. 1
    ==>Simca's Cuisine: 100 Classic French Recipes for Every Occasion
    ==>The French Chef Cookbook
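
    A possible variant, shown here as a sketch, applies dedup() to the book vertices before extracting the names. The query above removes duplicate name strings, while this one removes duplicate vertices; the results differ only if two distinct books share the same name.

    ```groovy
    // Deduplicate the book vertices first, then read their names
    g.V().outE('authored').inV().dedup().values('name')
    ```
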
  6. Refine the traversal by reinserting the has() step for a particular author. Find all the books authored by Julia Child.

    g.V().has('name','Julia CHILD').outE('authored').inV().values('name')

    In Studio:

    GSStudioJuliaBooks

    In Gremlin console:

    ==>The Art of French Cooking, Vol. 1
    ==>The French Chef Cookbook
  7. The previous example and this example return the same result. However, the number and type of traversal steps can affect performance. The traversal step outE() should only be used if the edges are explicitly required. In this example, the edges are traversed to get information about connected vertices, but the edge information is not important to the query.

    g.V().has('person', 'name','Julia CHILD').out('authored').values('name')

    In Studio:

    GSStudioJuliaBooks2

    In Gremlin console:

    ==>The Art of French Cooking, Vol. 1
    ==>The French Chef Cookbook

    The traversal step out() retrieves the connected book vertices based on the edge label authored without retrieving the edge information. In a larger graph traversal, this subtle difference in the traversal can become a latency issue.
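
    One way to check the difference on your own data is to profile both forms side by side, as sketched below. On a graph as small as this one the timings may be close, and the query engine may optimize one form into the other, so treat the comparison as illustrative rather than conclusive.

    ```groovy
    // Traverses to the edges, then to the adjacent vertices
    g.V().has('person', 'name', 'Julia CHILD').outE('authored').inV().values('name').profile()

    // Traverses directly to the adjacent vertices
    g.V().has('person', 'name', 'Julia CHILD').out('authored').values('name').profile()
    ```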

  8. Additional traversal steps continue to fine-tune the results. Adding another chained has() traversal step finds only the books authored by Julia Child that were published after 1967. This example also shows the use of gt(), the greater-than predicate.

    g.V().has('person', 'name','Julia CHILD').out('authored').has('publish_year', gt(1967)).values('name', 'publish_year')

    In Studio:

    GSStudioGreaterThan

    In Gremlin console:

    ==>The French Chef Cookbook
    ==>1968
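
    Other comparison predicates can be used in the same position as gt(). The sketch below uses the standard gte() and between() predicates; the year values are chosen only for illustration. Note that between() includes its lower bound and excludes its upper bound.

    ```groovy
    // Books published in 1968 or later
    g.V().has('person', 'name', 'Julia CHILD').out('authored').has('publish_year', gte(1968)).values('name', 'publish_year')

    // Books published from 1960 up to, but not including, 1970
    g.V().has('person', 'name', 'Julia CHILD').out('authored').has('publish_year', between(1960, 1970)).values('name', 'publish_year')
    ```
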
  9. When developing or testing, checking the number of vertices with each vertex label is a quick way to confirm that data was read. To find the number of vertices by vertex label, use the traversal step label() followed by the traversal step groupCount(). The step groupCount() is useful for aggregating results from a previous step. If this query displays a warning, suppress it by starting the traversal with g.with("label-warning", false). instead of g.:

    g.V().label().groupCount()

    In Studio:

    GSStudioGroupCount

    and in Gremlin console:

    ==>{meal=8, meal_item=3, ingredient=31, person=15, book=4, recipe=8, fridge_sensor=9, location=16, store=3, home=3}
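
    Written out in full, the warning-suppressed form of this query looks like the first line below. The second line is an equivalent sketch that groups directly by the label token with by(label); it assumes the label token is available without qualification, as it is in Gremlin console.

    ```groovy
    // Same count, with the label warning suppressed
    g.with("label-warning", false).V().label().groupCount()

    // Alternative: group the vertices by their label token
    g.with("label-warning", false).V().groupCount().by(label)
    ```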

