# Simple Traversals

Simple traversals can be complex, but do not employ specialized techniques such as recursion or branching.

Returning to the Recipe Toy Graph, let's expand the graph to include reviewers and ratings. Load the following script to add the `reviewer` vertices and the `rated` edges between reviewers and recipes. The DSE Graph QuickStart data must already be inserted, so that the recipe vertices exist before this script is loaded:
``````// Generates review vertices and edges for Recipe Toy Graph

// reviewer vertices
johnDoe = graph.addVertex(label, 'reviewer', 'name','John Doe')
johnSmith = graph.addVertex(label, 'reviewer', 'name', 'John Smith')
janeDoe = graph.addVertex(label, 'reviewer', 'name','Jane Doe')
sharonSmith = graph.addVertex(label, 'reviewer', 'name','Sharon Smith')
betsyJones = graph.addVertex(label, 'reviewer', 'name','Betsy Jones')

beefBourguignon = g.V().has('recipe', 'name','Beef Bourguignon').tryNext().orElseGet {graph.addVertex(label, 'recipe', 'name', 'Beef Bourguignon')}
spicyMeatLoaf = g.V().has('recipe', 'name','Spicy Meatloaf').tryNext().orElseGet {graph.addVertex(label, 'recipe', 'name', 'Spicy Meatloaf')}
carrotSoup = g.V().has('recipe', 'name','Carrot Soup').tryNext().orElseGet {graph.addVertex(label, 'recipe', 'name', 'Carrot Soup')}

// reviewer - recipe edges
johnDoe.addEdge('rated', beefBourguignon, 'timestamp', '2014-01-01T05:15:00.00Z', 'stars', 5, 'comment', 'Pretty tasty!')
johnSmith.addEdge('rated', beefBourguignon, 'timestamp', '2014-01-23T00:00:00.00Z', 'stars', 4)
janeDoe.addEdge('rated', beefBourguignon, 'timestamp', '2014-02-01T00:00:00.00Z', 'stars', 5, 'comment', 'Yummy!')
sharonSmith.addEdge('rated', beefBourguignon, 'timestamp', '2015-01-01T00:00:00.00Z', 'stars', 3, 'comment', 'It was okay.')
johnDoe.addEdge('rated', spicyMeatLoaf, 'timestamp', '2015-12-31T10:56:00.00Z', 'stars', 4, 'comment', 'Really spicy - be careful!')
sharonSmith.addEdge('rated', spicyMeatLoaf, 'timestamp', '2014-07-23T00:30:00.00Z', 'stars', 3, 'comment', 'Too spicy for me. Use less garlic.')
janeDoe.addEdge('rated', carrotSoup, 'timestamp', '2015-12-30T01:20:00.00Z', 'stars', 5, 'comment', 'Loved this soup! Yummy vegetarian!')
``````
Save the script to a file, then run it from the Gremlin console with `:load` and the path to the file:
``gremlin> :load /tmp/generateReviews.groovy``

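The script first queries for each recipe that was entered previously and assigns the result to a variable; those variables are then used to create the reviewer-recipe edges. The queries rely on two methods: `tryNext()`, an Apache TinkerPop method that returns a Java `Optional`, and `orElseGet()`, a `java.util.Optional` method that supplies a fallback value, here a newly added recipe vertex, when the query finds nothing. See the Apache TinkerPop and Java API documentation for more information.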
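The get-or-create behavior of `tryNext().orElseGet {...}` can be pictured with a small sketch in plain Python rather than Gremlin. The list of dictionaries and the `get_or_create` helper are hypothetical stand-ins for the graph and the traversal; they are not part of any DSE Graph or TinkerPop API.

```python
def get_or_create(vertices, label, name):
    """Return the first vertex matching label and name, adding one if none exists."""
    # Plays the role of g.V().has(label, 'name', name).tryNext()
    match = next(
        (v for v in vertices if v["label"] == label and v["name"] == name),
        None,
    )
    if match is None:
        # Plays the role of orElseGet { graph.addVertex(...) }
        match = {"label": label, "name": name}
        vertices.append(match)
    return match


# Hypothetical stand-in for a graph that already holds one recipe.
graph_vertices = [{"label": "recipe", "name": "Carrot Soup"}]

existing = get_or_create(graph_vertices, "recipe", "Carrot Soup")
created = get_or_create(graph_vertices, "recipe", "Spicy Meatloaf")
print(len(graph_vertices))  # the existing recipe is reused, the missing one is added
```

The point of the pattern is that the script can be loaded whether or not a recipe is already present: an existing vertex is reused instead of being duplicated.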

## Exploring recipe ratings

Check if the vertices are created by counting the number of vertices with the `reviewer` label.
``````gremlin>  g.V().hasLabel('reviewer').count()
==>5``````
List all the reviewers using `values`:
``````// Get the names of all the reviewers
gremlin> g.V().hasLabel('reviewer').values('name')
==>John Smith
==>Sharon Smith
==>Betsy Jones
==>Jane Doe
==>John Doe``````

Verifying that the reviewers are created is useful, but creating traversals that answer queries is more important. For instance, what does John Doe say about recipes?

Use a query that finds the vertex with the label `reviewer` and a `name` value of John Doe.
``g.V().has('reviewer', 'name','John Doe').outE('rated').values('comment')``
The `outE('rated')` step follows the outgoing `rated` edges from John Doe's vertex, and `values('comment')` retrieves the `comment` property stored on each of those edges:
``````==>Pretty tasty!
==>Really spicy - be careful!``````

It might be nice to know which recipes John Doe reviewed, so another traversal can be used.

``g.V().has('reviewer', 'name','John Doe').outE('rated').inV().values('name')``
resulting in:
``````==>Beef Bourguignon
==>Spicy Meatloaf``````
A reasonable question to ask is which reviews give a recipe more than 3 stars. Try a traversal that uses `gt(3)`, or greater than 3, to filter the `stars` values:
``````gremlin> g.E().hasLabel('rated').has('stars', gt(3)).valueMap()
==>[stars:4, timestamp:2014-01-23T00:00:00Z]
==>[comment:Loved this soup! Yummy vegetarian!, stars:5, timestamp:2015-12-30T00:00:00Z]
==>[comment:Yummy!, stars:5, timestamp:2014-02-01T00:00:00Z]
==>[comment:Pretty tasty!, stars:5, timestamp:2014-01-01T00:00:00Z]
==>[comment:Really spicy - be careful!, stars:4, timestamp:2015-12-31T00:00:00Z]``````
The traversal shown finds each edge that is labeled `rated` and filters the edges to output only those with a `stars` value of 4 or 5. This output describes the reviews but does not name the recipes that earned them. To see the recipes, modify the traversal to move from each edge to the vertex it points to with `inV()`, and list those vertices by name with `values('name')`:
``````gremlin> g.E().hasLabel('rated').has('stars', gt(3)).inV().values('name')
==>Beef Bourguignon
==>Spicy Meatloaf
==>Beef Bourguignon
==>Carrot Soup
==>Beef Bourguignon``````
The results show that `Beef Bourguignon` received three of the ratings above 3 stars. The list carries no reviewer information, so the recipe title is simply repeated once for each qualifying edge.
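The repetition is easier to see when the same filter is applied to the rating data outside the graph. The following is a minimal plain-Python sketch, not Gremlin; the `RATINGS` list is copied from the review script, and `recipes_rated_above` is a hypothetical helper written only for this illustration.

```python
from collections import Counter

# (reviewer, recipe, stars) triples copied from the review script above.
RATINGS = [
    ("John Doe", "Beef Bourguignon", 5),
    ("John Smith", "Beef Bourguignon", 4),
    ("Jane Doe", "Beef Bourguignon", 5),
    ("Sharon Smith", "Beef Bourguignon", 3),
    ("John Doe", "Spicy Meatloaf", 4),
    ("Sharon Smith", "Spicy Meatloaf", 3),
    ("Jane Doe", "Carrot Soup", 5),
]


def recipes_rated_above(ratings, threshold):
    """Mimic has('stars', gt(threshold)).inV().values('name'): one name per matching edge."""
    return [recipe for _, recipe, stars in ratings if stars > threshold]


names = recipes_rated_above(RATINGS, 3)
per_recipe = Counter(names)

print(names)       # one entry per qualifying edge, so names repeat
print(per_recipe)  # number of ratings above 3 stars for each recipe
```

In Gremlin, the TinkerPop `dedup()` step could be appended after `values('name')` if only the distinct recipe names are wanted; that variation is not part of the original example.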
Returning to the previous query, let's look for more recent reviews. An additional `has()` step that filters on `timestamp` narrows the results to 4 and 5 star ratings, found using `gte(4)`, or greater than or equal to 4, that have a review date of Jan 1, 2015 or later.
``````
gremlin>  g.E().hasLabel('rated').has('stars',gte(4)).has('timestamp', gte(Instant.parse('2015-01-01T00:00:00.00Z'))).valueMap()
==>[comment:Loved this soup! Yummy vegetarian!, timestamp:2015-12-30T00:00:00Z, stars:5]
==>[comment:Really spicy - be careful!, timestamp:2015-12-31T00:00:00Z, stars:4]``````
Chaining traversal steps together can yield very precise results. For instance, replacing `valueMap()` with `inV().values('name')` in the last query would list the recipes that received 4 or 5 star reviews since the beginning of 2015.
Manipulating the ratings with statistical functions yields interesting answers. For instance, what is the mean value of all the recipe ratings?
``````gremlin>  g.E().hasLabel('rated').values('stars').mean()
==>4.142857142857143``````
The mean is above 4, which suggests that the reviewers liked the recipes they reviewed; no rating in this sample is below 3 stars.
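The arithmetic behind the mean can be checked by hand. This plain-Python sketch uses the seven `stars` values from the review script; the variable names are chosen only for this illustration.

```python
# Star values copied from the seven 'rated' edges in the review script.
STARS = [5, 4, 5, 3, 4, 3, 5]

total = sum(STARS)               # 29
mean_stars = total / len(STARS)  # 29 / 7, about 4.14
lowest = min(STARS)              # 3

print(total, len(STARS), mean_stars, lowest)
```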
Perhaps a prolific reviewer would have a wider range of reviews. Find the maximum number of reviews that a single reviewer has written.
``````gremlin>  g.V().hasLabel('reviewer').map(outE('rated').count()).max()
==>2``````
This traversal maps all the outgoing edges using `outE('rated')` of each reviewer and counts them, then determines which count has the highest value using `max()`.

Another measure that can be investigated is the mean rating of each reviewer. This traversal query uses a number of Apache TinkerPop traversal steps.

The `as()` step labels each reviewer vertex twice, as `reviewer` and as `starCount`, so that two values can be derived from the same vertex. The `select()` step then retrieves both labels, and the `by()` modulators determine what is displayed for each: `by('name')` shows the reviewer's name, and `by(outE('rated').values('stars').mean())` traverses the reviewer's `rated` edges to compute the mean `stars` value. Despite its name, `starCount` holds a mean, not a count.
``````gremlin>  g.V().hasLabel('reviewer').as('reviewer','starCount').
select('reviewer','starCount').
by('name').
by(outE('rated').values('stars').mean())
==>[reviewer:Jane Doe, starCount:5.0]
==>[reviewer:Betsy Jones, starCount:NaN]
==>[reviewer:John Doe, starCount:4.5]
==>[reviewer:John Smith, starCount:4.0]
==>[reviewer:Sharon Smith, starCount:3.0]``````
Notice that Betsy Jones is listed as a reviewer, but has not reviewed any recipes, so her `starCount` is NaN (not a number). The results show that Jane Doe gave the highest ratings on average, and Sharon Smith the lowest.
Ordering the results by `starCount`, the mean star rating, reveals the highest and lowest raters. Here, the traversal steps `order().by(select('starCount'), decr)` use the output of `select('starCount')` to sort the results in descending order.
``````gremlin>  g.V().hasLabel('reviewer').as('reviewer','starCount').
select('reviewer','starCount').
by('name').
by(outE('rated').values('stars').mean()).
order().by(select('starCount'), decr)
==>[reviewer:Betsy Jones, starCount:NaN]
==>[reviewer:Jane Doe, starCount:5.0]
==>[reviewer:John Doe, starCount:4.5]
==>[reviewer:John Smith, starCount:4.0]
==>[reviewer:Sharon Smith, starCount:3.0]``````
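Betsy Jones and her lack of ratings still distort the listing, because her NaN value is placed ahead of every real mean. If Betsy were not listed, adding the traversal step `limit(1)` would return the highest rater, Jane Doe.
The `coalesce()` traversal step can substitute a zero for the missing ratings. It evaluates the traversals it is given in order and uses the first one that returns a result, so a reviewer with no `rated` edges falls through to `constant(0)`, and the mean is computed over that single zero.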
``````gremlin>  g.V().hasLabel('reviewer').as('reviewer','starCount').
select('reviewer','starCount').
by('name').
by(coalesce(outE('rated').values('stars'),constant(0)).mean()).
order().by(select('starCount'), decr)
==>[reviewer:Jane Doe, starCount:5.0]
==>[reviewer:John Doe, starCount:4.5]
==>[reviewer:John Smith, starCount:4.0]
==>[reviewer:Sharon Smith, starCount:3.0]
==>[reviewer:Betsy Jones, starCount:0.0]``````
Note that Betsy Jones now has a `starCount` of `0.0` and sorts last. The zero is a placeholder for a reviewer with no ratings, not a rating that she gave.
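The effect of the fallback can be reproduced outside the graph. The sketch below is plain Python, not Gremlin; the dictionary is built from the review script, and `mean_or_nan` and `mean_with_fallback` are hypothetical helpers that only imitate the two versions of the `by()` traversal.

```python
import math

# Stars given by each reviewer, taken from the review script above.
RATINGS_BY_REVIEWER = {
    "John Doe": [5, 4],
    "John Smith": [4],
    "Jane Doe": [5, 5],
    "Sharon Smith": [3, 3],
    "Betsy Jones": [],  # no 'rated' edges
}


def mean_or_nan(stars):
    """Mean of the stars, or NaN when there are none to average."""
    return sum(stars) / len(stars) if stars else math.nan


def mean_with_fallback(stars, fallback=0):
    """Imitate coalesce(..., constant(fallback)).mean(): use the fallback only if no stars exist."""
    values = stars if stars else [fallback]
    return sum(values) / len(values)


# Sort by mean, highest first, as order().by(select('starCount'), decr) does.
ranked = sorted(
    ((name, mean_with_fallback(stars)) for name, stars in RATINGS_BY_REVIEWER.items()),
    key=lambda row: row[1],
    reverse=True,
)
for name, mean in ranked:
    print(name, mean)
```

Reviewers who have ratings are unaffected by the fallback, since it is only consulted when the first branch produces nothing.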
Find the star rating each reviewer has given to each recipe:
``````g.V().hasLabel('reviewer').as('reviewer').
outE('rated').as('rating').
inV().as('recipe').
select('reviewer','rating','recipe').
by('name').
by('stars').
by('name')``````
Note how each part of the result is labeled at a different point in the traversal: the reviewer vertex with `as('reviewer')`, the `rated` edge with `as('rating')`, and the adjacent recipe vertex with `as('recipe')`. Labeling the edge itself matters. A `by()` modulator that wraps a traversal, such as `by(outE('rated').values('stars'))` applied to the reviewer vertex, uses only the first result, and so would repeat each reviewer's first rating for every recipe. Reading `stars` from the labeled edge pairs each rating with the recipe it belongs to.
``````==>{reviewer=John Doe, rating=5, recipe=Beef Bourguignon}
==>{reviewer=John Doe, rating=4, recipe=Spicy Meatloaf}
==>{reviewer=John Smith, rating=4, recipe=Beef Bourguignon}
==>{reviewer=Jane Doe, rating=5, recipe=Beef Bourguignon}
==>{reviewer=Jane Doe, rating=5, recipe=Carrot Soup}
==>{reviewer=Sharon Smith, rating=3, recipe=Beef Bourguignon}
==>{reviewer=Sharon Smith, rating=3, recipe=Spicy Meatloaf}``````
Often the most interesting statistics from the reviews are how many people rated a particular recipe and what the mean rating is for that recipe. The graph traversal starts from a recipe vertex this time, and retrieves the recipe name, the number of reviews by counting the incoming edges with `inE('rated').count()`, and the mean value of the incoming edges with `inE('rated').values('stars').mean()`. The `coalesce()` step shown earlier could be used to change all `NaN` values for `meanRating` into zeroes.
``````g.V().hasLabel('recipe').as('recipe','numberOfReviews','meanRating').
select('recipe','numberOfReviews','meanRating').
by('name').
by(inE('rated').count()).
by(inE('rated').values('stars').mean())``````
``````==>{recipe=Beef Bourguignon, numberOfReviews=4, meanRating=4.25}
==>{recipe=Wild Mushroom Stroganoff, numberOfReviews=0, meanRating=NaN}
==>{recipe=Spicy Meatloaf, numberOfReviews=2, meanRating=3.5}
==>{recipe=Rataouille, numberOfReviews=0, meanRating=NaN}
==>{recipe=Roast Pork Loin, numberOfReviews=0, meanRating=NaN}
==>{recipe=Oysters Rockefeller, numberOfReviews=0, meanRating=NaN}
==>{recipe=Carrot Soup, numberOfReviews=1, meanRating=5.0}``````

## Searching recipes

A common query for recipes is finding recipes that contain a certain ingredient.
``g.V().hasLabel('recipe').out().has('name','beef').in().hasLabel('recipe').values('name')``
``==>Beef Bourguignon``
A modification using `within()` finds recipes that include either one ingredient or another.
``g.V().hasLabel('recipe').out().has('name',within('beef','carrots')).in().hasLabel('recipe').values('name')``
```==>Beef Bourguignon
==>Carrot Soup```
Finding all the ingredients for a particular recipe is a common query.
``````g.V().match(
__.as('a').hasLabel('ingredient'),
__.as('a').in('includes').has('name','Beef Bourguignon')).
select('a').by('name')``````
This query uses a `match()` step to find the ingredients used to make Beef Bourguignon. The first pattern restricts `a` to vertices labeled `ingredient`; the second requires that following an `includes` edge backward from `a`, using `in('includes')`, reaches the recipe named Beef Bourguignon. The double underscore, `__`, is not a variable: it is the Apache TinkerPop class used to start each anonymous traversal passed to `match()`. The results are:
``````==>tomato paste
==>beef
==>onion
==>mashed garlic
==>butter``````
Although `inside()` is most commonly used for geospatial searches, the method can be used to find anything that falls within a particular range of values. The bounds are exclusive: `inside(1960,1970)` matches values greater than 1960 and less than 1970. An example is finding books with a publishing date in that range:
``g.V().has('book', 'year', inside(1960,1970)).valueMap()``
The results are:
``````==>{ISBN=[0-394-40135-2], year=[1968], name=[The French Chef Cookbook]}
==>{year=[1961], name=[The Art of French Cooking, Vol. 1]}``````
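The range test applied by `inside(1960,1970)` excludes both end points. A plain-Python sketch of that predicate follows; the `inside` function here is a hypothetical imitation written for illustration, and the book years are the ones that appear in the grouping results elsewhere on this page.

```python
def inside(low, high):
    """Return a predicate that is true for values strictly between low and high."""
    return lambda value: low < value < high


# Book titles and years as they appear in the results on this page.
BOOK_YEARS = {
    "The Art of French Cooking, Vol. 1": 1961,
    "The French Chef Cookbook": 1968,
    "Simca's Cuisine: 100 Classic French Recipes for Every Occasion": 1972,
    "The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution": 2007,
}

in_range = inside(1960, 1970)
matches = sorted(name for name, year in BOOK_YEARS.items() if in_range(year))

print(matches)
print(in_range(1960), in_range(1970))  # both end points are excluded
```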

## Grouping output

Group output from a graph traversal using the `group()` traversal step. For example, display all the vertices by name, grouped by label:
``g.V().group().by(label).by('name')``
Note that the meals, ingredients, authors, books, recipes, and reviewers are all grouped in the results:
``````==>[meal:[JuliaDinner, Saturday Feast, EverydayDinner], ingredient:[olive oil, chicken broth,
eggplant, pork sausage, green bell pepper, yellow onion, celery, hard-boiled egg, shallots,
zucchini, butter, green beans, mashed garlic, onion, mushrooms, bacon, parsley, oyster,
tomato, thyme, pork loin, tuna, tomato paste, ground beef, red wine, fennel, Pernod,
chervil, egg noodles, carrots, beef], author:[Louisette Bertholie, Kelsie Kerr,
Alice Waters, Julia Child, Emeril Lagasse, Simone Beck, Patricia Curtan, Patricia Simon,
James Beard, Fritz Streiff], book:[Simca's Cuisine: 100 Classic French Recipes for Every
Occasion, The French Chef Cookbook, The Art of Simple Food: Notes, Lessons, and Recipes
from a Delicious Revolution, The Art of French Cooking, Vol. 1], recipe:[Wild Mushroom
Stroganoff, Roast Pork Loin, Spicy Meatloaf, Rataouille, Beef Bourguignon, Oysters
Rockefeller, Salade Nicoise, Carrot Soup], reviewer:[Sharon Smith, John Smith, Jane Doe,
Betsy Jones, John Doe]]``````
Another example groups all books by year and displays a listing of each year books were published followed by the book titles:
``g.V().hasLabel('book').group().by('year').by('name')``
and lists:
``````==>{1968=[The French Chef Cookbook, The French Chef Cookbook],
1972=[Simca's Cuisine: 100 Classic French Recipes for Every Occasion, Simca's Cuisine: 100
Classic French Recipes for Every Occasion], 2007=[The Art of Simple Food: Notes, Lessons,
and Recipes from a Delicious Revolution, The Art of Simple Food: Notes, Lessons, and Recipes
from a Delicious Revolution], 1961=[The Art of French Cooking, Vol. 1, The Art of French
Cooking, Vol. 1]}``````
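The two `by()` modulators of `group()` play different roles: the first chooses the key, and the second chooses the value collected under that key. A plain-Python sketch of the same idea follows. The `group_by` helper is hypothetical, and the sample holds only a few label and name pairs taken from the results above, not the full graph.

```python
from collections import defaultdict

# A few (label, name) pairs taken from the results above; the full graph has many more.
VERTICES = [
    ("meal", "JuliaDinner"),
    ("recipe", "Carrot Soup"),
    ("reviewer", "Jane Doe"),
    ("recipe", "Spicy Meatloaf"),
    ("reviewer", "John Doe"),
]


def group_by(items, key, value):
    """Collect value(item) into a list under key(item), like group().by(key).by(value)."""
    groups = defaultdict(list)
    for item in items:
        groups[key(item)].append(value(item))
    return dict(groups)


by_label = group_by(VERTICES, key=lambda v: v[0], value=lambda v: v[1])
print(by_label)
```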

## Grouping for processing using local()

It is often necessary to apply a step to each element in a traversal individually rather than to the whole stream of results. The next two examples use the `limit()` step to show how `local()` changes the scope of processing from the whole stream to the results produced for a single element. First, limit the list of authors and the years in which they published books to two results:
``````g.V().hasLabel('author').as('author').out().properties('year').as('year').
select('author','year').
by('name').
by().
limit(2)``````
This query returns only the first two results from the whole traversal, in this case both for the same author:
``````==>{author=Julia Child, year=vp[year->1961]}
==>{author=Julia Child, year=vp[year->1968]}``````
Using `local()`, change this query to find the first two books that each author in the graph has published:
``````g.V().hasLabel('author').as('author').
local(out().properties('year').as('year').limit(2)).
select('author','year').
by('name').
by()``````
Note that up to two books are displayed for each author:
``````==>{author=Julia Child, year=vp[year->1961]}
==>{author=Julia Child, year=vp[year->1968]}
==>{author=Simone Beck, year=vp[year->1961]}
==>{author=Simone Beck, year=vp[year->1972]}
==>{author=Louisette Bertholie, year=vp[year->1961]}
==>{author=Patricia Simon, year=vp[year->1972]}
==>{author=Alice Waters, year=vp[year->2007]}
==>{author=Patricia Curtan, year=vp[year->2007]}
==>{author=Kelsie Kerr, year=vp[year->2007]}
==>{author=Fritz Streiff, year=vp[year->2007]}``````
The traversal step `local()` has many applications where a subsection of a traversal must be processed for each element, and its results returned, before moving on to further processing.
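The difference between limiting the whole stream and limiting per element can be sketched in plain Python. The functions `global_limit` and `local_limit` are hypothetical, and the sample data covers only three of the authors, with the years shown in the results above.

```python
from itertools import islice

# Publication years per author, taken from the results above (three authors only).
BOOK_YEARS_BY_AUTHOR = {
    "Julia Child": [1961, 1968],
    "Simone Beck": [1961, 1972],
    "Louisette Bertholie": [1961],
}


def pairs(books_by_author):
    """Yield every (author, year) pair, like the unlimited traversal."""
    for author, years in books_by_author.items():
        for year in years:
            yield author, year


def global_limit(books_by_author, n):
    """limit(n) on the whole stream: at most n pairs in total."""
    return list(islice(pairs(books_by_author), n))


def local_limit(books_by_author, n):
    """limit(n) inside local(): at most n pairs for each author."""
    return [
        (author, year)
        for author, years in books_by_author.items()
        for year in years[:n]
    ]


print(global_limit(BOOK_YEARS_BY_AUTHOR, 2))
print(local_limit(BOOK_YEARS_BY_AUTHOR, 2))
```

With the limit applied to the whole stream, the first author's books use up the entire quota; applied per author, every author contributes up to two results.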