Simple Traversals
Simple traversals can be complex, but not employ specialized techniques such as recursion or branching.
reviewer
vertices and
recipe-reviewer
edges. You must have inserted the DSE Graph QuickStart data previously, so that the recipe
vertices exist before loading this
script:// Generates review vertices and edges for Recipe Toy Graph
// :load /tmp/generateReviews.groovy
// reviewer vertices
johnDoe = graph.addVertex(label, 'reviewer', 'name','John Doe')
johnSmith = graph.addVertex(label, 'reviewer', 'name', 'John Smith')
janeDoe = graph.addVertex(label, 'reviewer', 'name','Jane Doe')
sharonSmith = graph.addVertex(label, 'reviewer', 'name','Sharon Smith')
betsyJones = graph.addVertex(label, 'reviewer', 'name','Betsy Jones')
beefBourguignon = g.V().has('recipe', 'name','Beef Bourguignon').tryNext().orElseGet {graph.addVertex(label, 'recipe', 'name', 'Beef Bourguignon')}
spicyMeatLoaf = g.V().has('recipe', 'name','Spicy Meatloaf').tryNext().orElseGet {graph.addVertex(label, 'recipe', 'name', 'Spicy Meatloaf')}
carrotSoup = g.V().has('recipe', 'name','Carrot Soup').tryNext().orElseGet {graph.addVertex(label, 'recipe', 'name', 'Carrot Soup')}
// reviewer - recipe edges
johnDoe.addEdge('rated', beefBourguignon, 'timestamp', '2014-01-01T05:15:00.00Z', 'stars', 5, 'comment', 'Pretty tasty!')
johnSmith.addEdge('rated', beefBourguignon, 'timestamp', '2014-01-23T00:00:00.00Z', 'stars', 4)
janeDoe.addEdge('rated', beefBourguignon, 'timestamp', '2014-02-01T00:00:00.00Z', 'stars', 5, 'comment', 'Yummy!')
sharonSmith.addEdge('rated', beefBourguignon, 'timestamp', '2015-01-01T00:00:00.00Z', 'stars', 3, 'comment', 'It was okay.')
johnDoe.addEdge('rated', spicyMeatLoaf, 'timestamp', '2015-12-31T10:56:00.00Z', 'stars', 4, 'comment', 'Really spicy - be careful!')
sharonSmith.addEdge('rated', spicyMeatLoaf, 'timestamp', '2014-07-23T00:30:00.00Z', 'stars', 3, 'comment', 'Too spicy for me. Use less garlic.')
janeDoe.addEdge('rated', carrotSoup, 'timestamp', '2015-12-30T01:20:00.00Z', 'stars', 5, 'comment', 'Loved this soup! Yummy vegetarian!')
Run
the script by first identifying the script, and then remotely executing
it.gremlin> :load /tmp/generateReviews.groovy
The recipes that were previously entered are queried to assign the result to recipe
variables. The variables are then used to create the reviewer-recipe edges. These queries make
use of two Apache TinkerPop methods, tryNext()
and
orElseGet()
; see the Apache TinkerPop Java API for more information.
Exploring recipe ratings
reviewer
label.gremlin> g.V().hasLabel('reviewer').count()
==>5
List all the
reviewer using
values
:// Get the names of all the reviewers
gremlin> g.V().hasLabel('reviewer').values('name')
==>John Smith
==>Sharon Smith
==>Betsy Jones
==>Jane Doe
==>John Doe
Verifying that the reviewers are created is useful, but creating traversals that answer queries is more important. For instance, what does John Doe say about recipes?
g.V().has('reviewer', 'name','John Doe').outE('rated').values('comment')
The
use of the outgoing edges command outE('rated')
to find all the recipes
that John Doe has rated allows the value of the property comments
to be
retrieved:==>Pretty tasty!
==>Really spicy - be careful!
It might be nice to know which recipes John Doe reviewed, so another traversal can be used.
g.V().has('reviewer', 'name','John Doe').outE('rated').inV().values('name')
resulting
in:==>Beef Bourguignon
==>Spicy Meatloaf
gt(3)
, or greater than 3 to filter the
stars
values:gremlin> g.E().hasLabel('rated').has('stars', gt(3)).valueMap()
==>[stars:4, timestamp:2014-01-23T00:00:00Z]
==>[comment:Loved this soup! Yummy vegetarian!, stars:5, timestamp:2015-12-30T00:00:00Z]
==>[comment:Yummy!, stars:5, timestamp:2014-02-01T00:00:00Z]
==>[comment:Pretty tasty!, stars:5, timestamp:2014-01-01T00:00:00Z]
==>[comment:Really spicy - be careful!, stars:4, timestamp:2015-12-31T00:00:00Z]
The
traversal shown finds each edge that is labeled rated
and filters the edges
found to output only those edges with a star
rating of 4 or 5. But this
traversal doesn't output the answer to the original question. The traversal needs
modification to get the incoming vertices with inV()
, and to list those
incoming vertices by name with
values('name')
:gremlin> g.E().hasLabel('rated').has('stars', gt(3)).inV().values('name')
==>Beef Bourguignon
==>Spicy Meatloaf
==>Beef Bourguignon
==>Carrot Soup
==>Beef Bourguignon
The
results indicate that Beef Bourguignon
has been rated three times, although
we don't have any reviewer information, just duplication of the recipe title in the
list.timestamp
can find the 4 and 5 star ratings
using gte(4)
or greater than or equal to 4, with a review date of
Jan 1, 2015 or later.
gremlin> g.E().hasLabel('rated').has('stars',gte(4)).has('timestamp', gte(Instant.parse('2015-01-01T00:00:00.00Z'))).valueMap()
==>[comment:Loved this soup! Yummy vegetarian!, timestamp:2015-12-30T00:00:00Z, stars:5]
==>[comment:Really spicy - be careful!, timestamp:2015-12-31T00:00:00Z, stars:4]
Chaining
traversal steps together can yield very exacting results. For instance, if we added the
inV().values('name')
to the last query, we'd now refine the results to
find all 4-5 star reviews since the beginning of the year 2015.gremlin> g.E().hasLabel('rated').values('stars').mean()
==>4.142857142857143
The results show that the reviewers like the recipes they reviewed, and establishes that
reviewers in this sample did not write reviews for recipes that they did not like.gremlin> g.V().hasLabel('reviewer').map(outE('rated').count()).max()
==>2
This
traversal maps all the outgoing edges using outE('rated')
of each reviewer
and counts them, then determines which count has the highest value using
max()
.Another measure that can be investigated is the mean rating of each reviewer. This traversal query uses a number of Apache TinkerPop traversal steps.
as()
step allows display labels to be created for the two items that
will be lists, the reviewer's name and the mean stars
value for each
reviewer. These display labels, reviewer
and starCount
are
then used in a select()
step that gets each value, first the reviewer's
name using by('name')
and then the starCount
using
by(outE('rated').values('stars').mean()
. The select()
step checks each reviewer
vertex and then traverses to discover the
associated starCount
value.gremlin> g.V().hasLabel('reviewer').as('reviewer','starCount').
select('reviewer','starCount').
by('name').
by(outE('rated').values('stars').mean())
==>[reviewer:Jane Doe, starCount:5.0]
==>[reviewer:Betsy Jones, starCount:NaN]
==>[reviewer:John Doe, starCount:4.5]
==>[reviewer:John Smith, starCount:4.0]
==>[reviewer:Sharon Smith, starCount:3.0]
Notice
that Betsy Jones is listed as a reviewer, but has not reviewed any recipes. Her
starCount
lists NaN (not a number). It is clear from the results that
Jane Doe really likes at least one recipe, while Sharon Smith does not.starCount
, or mean star rating, can allow the
highest rater and the lowest rater to be discovered. Here, the traversal steps
order().by(select('starCount').decr()
use the output of the
select('starCount')
step to order the display in decremental
order.gremlin> g.V().hasLabel('reviewer').as('reviewer','starCount').
select('reviewer','starCount').
by('name').
by(outE('rated').values('stars').mean()).
order().by(select('starCount'), decr)
==>[reviewer:Betsy Jones, starCount:NaN]
==>[reviewer:Jane Doe, starCount:5.0]
==>[reviewer:John Doe, starCount:4.5]
==>[reviewer:John Smith, starCount:4.0]
==>[reviewer:Sharon Smith, starCount:3.0]
Betsy
Jones and her lack of ratings still cause the listing to be incorrect. We could add a
traversal step limit(1)
to the traversal and get the highest rater, Jane
Doe, if Betsy were not listed. coalesce()
, is used to change NaN to a zero
value.gremlin> g.V().hasLabel('reviewer').as('reviewer','starCount').
select('reviewer','starCount').
by('name').
by(coalesce(outE('rated').values('stars'),constant(0)).mean()).
order().by(select('starCount'), decr)
==>[reviewer:Jane Doe, starCount:5.0]
==>[reviewer:John Doe, starCount:4.5]
==>[reviewer:John Smith, starCount:4.0]
==>[reviewer:Sharon Smith, starCount:3.0]
==>[reviewer:Betsy Jones, starCount:0.0]
Note
that now Betsy Jones has a starCount
of 0.0
, the true
value.g.V().hasLabel('reviewer').as('reviewer','rating').out().as('recipe').
select('reviewer','rating','recipe').
by('name').
by(outE('rated').values('stars')).
by(values('name'))
Note
how the recipe name is traversed and named with the step modulator
as('recipe')
after the reviewer and rating are labeled from the reviewer
vertices with as('reviewer','rating')
. The first two items in the output
listing are retrieved starting at the reviewer vertex while the third item is retrieved from
the adjacent recipe
vertex.==>{reviewer=John Doe, rating=5, recipe=Beef Bourguignon}
==>{reviewer=John Doe, rating=5, recipe=Spicy Meatloaf}
==>{reviewer=John Smith, rating=4, recipe=Beef Bourguignon}
==>{reviewer=Jane Doe, rating=5, recipe=Beef Bourguignon}
==>{reviewer=Jane Doe, rating=5, recipe=Carrot Soup}
==>{reviewer=Sharon Smith, rating=3, recipe=Beef Bourguignon}
==>{reviewer=Sharon Smith, rating=3, recipe=Spicy Meatloaf}
inE('rated').count()
, and the mean value of the incoming edges with
inE('rated').values('stars').mean()
. The coalesce()
step
shown earlier could be used to change all NaN
values for
meanRating
into
zeroes.g.V().hasLabel('recipe').as('recipe','numberOfReviews','meanRating').
select('recipe','numberOfReviews','meanRating').
by('name').
by(inE('rated').count()).
by(inE('rated').values('stars').mean())
==>{recipe=Beef Bourguignon, numberOfReviews=4, meanRating=4.25}
==>{recipe=Wild Mushroom Stroganoff, numberOfReviews=0, meanRating=NaN}
==>{recipe=Spicy Meatloaf, numberOfReviews=2, meanRating=3.5}
==>{recipe=Rataouille, numberOfReviews=0, meanRating=NaN}
==>{recipe=Salade Nicoise, numberOfReviews=0, meanRating=NaN}
==>{recipe=Roast Pork Loin, numberOfReviews=0, meanRating=NaN}
==>{recipe=Oysters Rockefeller, numberOfReviews=0, meanRating=NaN}
==>{recipe=Carrot Soup, numberOfReviews=1, meanRating=5.0}
Searching recipes
g.V().hasLabel('recipe').out().has('name','beef').in().hasLabel('recipe').values('name')
==>Beef Bourguignon
A
modification allows a query that includes either one ingredient or
another.g.V().hasLabel('recipe').out().has('name',within('beef','carrots')).in().hasLabel('recipe').values('name')
==>Beef Bourguignon ==>Carrot Soup
g.V().match(
__.as('a').hasLabel('ingredient'),
__.as('a').in('includes').has('name','Beef Bourguignon')).
select('a').by('name')
This
query uses a match()
step to find a match for the ingredients used to make
Beef Bourguignon. The traversal starts by filtering all vertices to find the ingredients,
then traverses to the recipe vertices along the includes
edges using
in('includes')
. This query also uses a Groovy double underscore variable
as a private variable for the match method. The results
are:==>tomato paste
==>beef
==>onion
==>mashed garlic
==>butter
inside()
is most commonly used for geospatial searches, the
method can be used to find anything that falls within a particular range of values. An
example is finding books that have a publishing date between 1960 and
1970:g.V().has('book', 'year', inside(1960,1970)).valueMap()
The
results are:
==>{ISBN=[0-394-40135-2], year=[1968], name=[The French Chef Cookbook]}
==>{year=[1961], name=[The Art of French Cooking, Vol. 1]}
Grouping output
group()
traversal step. For
example, display all the vertices by name, grouped by
label:g.V().group().by(label).by('name')
Note that the meals,
ingredients, authors, books, recipes, and reviewers are all grouped in the
results:==>[meal:[JuliaDinner, Saturday Feast, EverydayDinner], ingredient:[olive oil, chicken broth,
eggplant, pork sausage, green bell pepper, yellow onion, celery, hard-boiled egg, shallots,
zucchini, butter, green beans, mashed garlic, onion, mushrooms, bacon, parsley, oyster,
tomato, thyme, pork loin, tuna, tomato paste, ground beef, red wine, fennel, Pernod,
chervil, egg noodles, carrots, beef], author:[Louisette Bertholie, Kelsie Kerr,
Alice Waters, Julia Child, Emeril Lagasse, Simone Beck, Patricia Curtan, Patricia Simon,
James Beard, Fritz Streiff], book:[Simca's Cuisine: 100 Classic French Recipes for Every
Occasion, The French Chef Cookbook, The Art of Simple Food: Notes, Lessons, and Recipes
from a Delicious Revolution, The Art of French Cooking, Vol. 1], recipe:[Wild Mushroom
Stroganoff, Roast Pork Loin, Spicy Meatloaf, Rataouille, Beef Bourguignon, Oysters
Rockefeller, Salade Nicoise, Carrot Soup], reviewer:[Sharon Smith, John Smith, Jane Doe,
Betsy Jones, John Doe]]
g.V().hasLabel('book').group().by('year').by('name')
and
lists:==>{1968=[The French Chef Cookbook, The French Chef Cookbook],
1972=[Simca's Cuisine: 100 Classic French Recipes for Every Occasion, Simca's Cuisine: 100
Classic French Recipes for Every Occasion], 2007=[The Art of Simple Food: Notes, Lessons,
and Recipes from a Delicious Revolution, The Art of Simple Food: Notes, Lessons, and Recipes
from a Delicious Revolution], 1961=[The Art of French Cooking, Vol. 1, The Art of French
Cooking, Vol. 1]}
Grouping for processing using local()
limit()
command to show how
local()
can change the processing from the whole stream entering the
query to a portion of the query. First, find just two authors and the year that they have
published
books:g.V().hasLabel('author').as('author').out().properties('year').as('year').
select('author','year').
by('name').
by().
limit(2)
This
query results in returning the first two records in the
database:==>{author=Julia Child, year=vp[year->1961]}
==>{author=Julia Child, year=vp[year->1968]}
local()
, change this query to find the first two books that each
author in the graph has
published:g.V().hasLabel('author').as('author').
local(out().properties('year').as('year').limit(2)).
select('author','year').
by('name').
by()
Note
that up to two books are displayed for each
author:==>{author=Julia Child, year=vp[year->1961]}
==>{author=Julia Child, year=vp[year->1968]}
==>{author=Simone Beck, year=vp[year->1961]}
==>{author=Simone Beck, year=vp[year->1972]}
==>{author=Louisette Bertholie, year=vp[year->1961]}
==>{author=Patricia Simon, year=vp[year->1972]}
==>{author=Alice Waters, year=vp[year->2007]}
==>{author=Patricia Curtan, year=vp[year->2007]}
==>{author=Kelsie Kerr, year=vp[year->2007]}
==>{author=Fritz Streiff, year=vp[year->2007]}
The traversal step local()
has many applications for processing a
subsection of a graph within a graph traversal to return results before moving on to further
processing.