The outgoing edges step outE('reviewed') finds all the reviews that John Doe
has written, which allows the value of the comment property on each edge to be
retrieved:
==>Pretty tasty!
==>Really spicy - be careful!
It might be nice to know which recipes John Doe reviewed, so another traversal can be
created.
Figure 2.
This query traverses from the comments to the
recipes:
The inV() step traverses to the incoming vertices, the recipes, that are connected to
John Doe by reviewed edges.
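A traversal matching this description might look like the following sketch. It assumes that reviewers are stored under a person vertex label with a name property, as the grouped output later on this page suggests. The exact capitalization of the stored name is also an assumption.

```groovy
// Sketch: list the names of the recipes that John Doe reviewed.
// The 'person' label and the stored value 'John Doe' are assumptions.
// outE('reviewed') follows the outgoing review edges, and inV() moves to
// the recipe vertex at the head of each edge.
g.V().has('person', 'name', 'John Doe').
  outE('reviewed').
  inV().
  values('name')
```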
Another reasonable question to ask is: What are all the reviews that give a recipe more
than 3 stars? Try a traversal using gt(3), or greater than 3, to filter the stars
values:
The
traversal shown finds each edge that is labeled reviewed and filters the
edges found to output only those edges with a star rating of 4 or 5. But
this traversal doesn't output the answer to the original question.
The traversal needs modification to get the incoming vertices with inV(),
and to list those incoming vertices by name with
values('name'):
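One form of the modified traversal that is consistent with this description is sketched below. It uses only the steps named in the text; the layout is illustrative rather than the original listing.

```groovy
// Sketch: names of recipes that received a review of more than 3 stars.
// A recipe appears once per qualifying review, so names can repeat.
g.E().hasLabel('reviewed').
  has('stars', gt(3)).
  inV().
  values('name')
```

Appending dedup() would collapse the repeated names if only the distinct recipes are wanted.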
The
results indicate that Beef Bourguignon has been reviewed three times,
although we don't have any reviewer information, just duplication of the recipe title in the
list.
Returning to the previous query, let's look for more recent reviews. Adding a
traversal step that filters by date finds the 4 and 5 star ratings,
using gte(4) or greater than or equal to 4, that have a review date of
Jan 1, 2015 or later.
// Filter reviewed edges to 4+ stars, dated 2015-01-01 or later, with a time of 00:00:00
g.E().hasLabel('reviewed').
  has('stars', gte(4)).
  has('year', gte('2015-01-01' as LocalDate)).
  has('time', '00:00:00' as LocalTime).
  valueMap()
results in:
==>[comment:Loved this soup! Yummy vegetarian!, timestamp:2015-12-30T00:00:00Z, stars:5]
==>[comment:Really spicy - be careful!, timestamp:2015-12-31T00:00:00Z, stars:4]
Chaining traversal steps together can yield very exacting results. For instance, if we
replaced the final step of the last query with inV().values('name'), we'd refine the
results to find the names of all recipes with 4-5 star reviews since the beginning of
the year 2015.
Turning in another direction with queries, let's take a look at using statistical
functions. For instance, what is the mean value of all the recipe
ratings?
g.E().hasLabel('reviewed').values('stars').mean()
results in:
==>4.142857142857143
The high mean shows that the reviewers generally liked the recipes they reviewed, and
suggests that reviewers in this sample tended not to write reviews for recipes that they
did not like.
Perhaps a prolific reviewer would have a wider range of reviews. Find the maximum number of
reviews that a single reviewer has
written.
This traversal maps all the outgoing reviewed edges of each reviewer and counts
them, then determines which count has the highest value using max(). The
map() step allows the process to be done for each reviewer.
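The steps described above can be sketched as follows. The person vertex label is an assumption based on the grouped output shown later on this page.

```groovy
// Sketch: highest number of reviews written by any single person.
// map() runs the inner count once per person vertex; max() then picks
// the largest of those per-person counts.
g.V().hasLabel('person').
  map(outE('reviewed').count()).
  max()
```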
Another measure that can be investigated is the mean rating of each reviewer. This
traversal query uses a number of Apache TinkerPop traversal steps.
Figure 3.
The as() step allows display labels to be created for the two items that
will be listed, the reviewer's name and the mean stars value for each
reviewer. These display labels, reviewer and starCount, are
then used in a select() step that gets each value, first the reviewer's
name using by('name') and then the starCount using
by(outE('reviewed').values('stars').mean()). The select()
step checks each reviewer vertex and then traverses to discover the
associated starCount value.
Notice
that people who have not reviewed any recipes are not included in the results. The key to
excluding non-reviewers is the where(out('reviewed')) that filters and
continues the traversal with only the people who have reviewed edges to
recipes.
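Putting the pieces described above together gives a traversal along these lines. This is a sketch assembled from the steps named in the text, assuming a person vertex label, and is not necessarily the original listing.

```groovy
// Sketch: mean star rating given by each person who has written a review.
// where(out('reviewed')) drops people with no reviewed edges before labeling.
g.V().hasLabel('person').
  where(out('reviewed')).
  as('reviewer', 'starCount').
  select('reviewer', 'starCount').
    by('name').
    by(outE('reviewed').values('stars').mean())
```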
Ordering the results by the starCount, or mean star rating, can allow the
highest rater and the lowest rater to be discovered. Here, the traversal steps
order().by(select('starCount'), decr) use the output of the
select('starCount') step to sort the display in descending
order.
If we were interested in only returning the highest rater, we could add a traversal step
limit(1) to the traversal and get the highest rater, Jane DOE.
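A sketch of the ordered and limited traversal follows. It assumes a person vertex label and the decr ordering token used by older TinkerPop releases; newer releases name this token desc.

```groovy
// Sketch: the single reviewer with the highest mean star rating.
// 'decr' is the older TinkerPop token for descending order ('desc' in newer releases).
g.V().hasLabel('person').
  where(out('reviewed')).
  as('reviewer', 'starCount').
  select('reviewer', 'starCount').
    by('name').
    by(outE('reviewed').values('stars').mean()).
  order().by(select('starCount'), decr).
  limit(1)
```

Dropping limit(1) returns every reviewer, from the highest rater to the lowest.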
Suppose we want a list of all people, whether or not they have reviewed a recipe? A tricky
traversal step, coalesce(), is used to allow zero values by setting any
missing stars values to a constant of
zero:
Note
how the recipe name is traversed with out() and named with the step
modulator as('recipe') after the reviewer and rating are labeled from the
reviewer vertices with as('reviewer','rating'). Also, look at the
versatility of the select()...by() combination to get values from both
vertex and edge properties.
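Returning to the coalesce() step introduced earlier, the following sketch shows one way to list every person, including those with no reviews. It assumes a person vertex label and reuses the reviewer and starCount labels from the earlier description.

```groovy
// Sketch: mean star rating per person, with non-reviewers reported as 0.
// coalesce() emits the person's stars values when any exist; otherwise it
// falls back to constant(0), so mean() always has a value to work with.
g.V().hasLabel('person').as('reviewer', 'starCount').
  select('reviewer', 'starCount').
    by('name').
    by(coalesce(outE('reviewed').values('stars'), constant(0)).mean())
```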
Perhaps the most interesting question is how many people rated a particular
recipe, and what the mean rating is for that recipe. Most people looking to
discover recipes they want to make are looking for popular, well-rated recipes. The graph
traversal starts from a recipe vertex this time, and retrieves the recipe
name, the number of reviews by counting the incoming edges with
inE('reviewed').count(), and the mean value of the incoming edges with
inE('reviewed').values('stars').mean().
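A sketch of such a traversal is shown here. The display labels recipeName, numberOfReviews, and meanRating are hypothetical names chosen for illustration, and the where() filter that skips unreviewed recipes is an assumption.

```groovy
// Sketch: review count and mean rating for each recipe that has reviews.
// The three display labels are illustrative, not taken from the original.
g.V().hasLabel('recipe').
  where(inE('reviewed')).
  as('recipeName', 'numberOfReviews', 'meanRating').
  select('recipeName', 'numberOfReviews', 'meanRating').
    by('name').
    by(inE('reviewed').count()).
    by(inE('reviewed').values('stars').mean())
```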
Looking at the results, we see that Carrot Soup has the highest mean rating, but
only one review. Beef Bourguignon, on the other hand, has a pretty high
rating and a larger number of reviews. Note that we could modify this query to find all
recipes, even if the number of reviews is zero, by including the coalesce()
step used in an earlier query on this page.
Searching recipes
A common query for recipes is finding recipes that contain a certain
ingredient:
This query uses a match() step to find a match for the ingredients used to make
Beef Bourguignon. The traversal starts by filtering all vertices to find
the ingredients, then traverses to the recipe vertices along the includes
edges using in('includes'). This query also uses the double underscore, __,
which starts an anonymous traversal inside the match() step. The fold() step is
used to put the whole shopping list of ingredients into a single list.
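The match() pattern described above can be sketched as follows. The ingredient label name inside as() is illustrative, and the sketch assumes that includes edges run from recipe vertices to ingredient vertices, as the use of in('includes') implies.

```groovy
// Sketch: collect the ingredients of Beef Bourguignon into one list.
// Both patterns start from the same 'ingredient' label, so match() keeps
// only ingredient vertices that are included by the named recipe.
g.V().match(
    __.as('ingredient').hasLabel('ingredient'),
    __.as('ingredient').in('includes').has('name', 'Beef Bourguignon')).
  select('ingredient').by('name').
  fold()
```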
Although inside() is most commonly used for geospatial searches, the method can be used to find anything that falls within a
particular range of values. An example is finding books that have a publishing date between
1960 and 1970 (represented by
integers):
==>{publish_year=[1961], name=[The Art of French Cooking, Vol. 1], book_id=[1001], category=[[French, cooking, general]]}
==>{publish_year=[1968], isbn=[0-394-40135-2], name=[The French Chef Cookbook], book_id=[1003], category=[[French, cooking]], book_discount=[10%]}
This
is useful to discover any records within a range of values.
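A range query of this kind might be written as in the sketch below. The book label and publish_year property appear in the output above; the use of valueMap() to print every property is an assumption.

```groovy
// Sketch: books published strictly between 1960 and 1970.
// inside() excludes both bounds, so 1960 and 1970 themselves do not match.
g.V().hasLabel('book').
  has('publish_year', inside(1960, 1970)).
  valueMap()
```

If the lower bound should be included, the between() predicate is an alternative, since it includes its first argument and excludes its second.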
Grouping output
Group output from a graph traversal using the group() traversal step. For
example, display all the vertices by name, grouped by vertex
label:
g.V().group().by(label).by('name')
results in:
The property does not exist as the key has no associated value for the provided element:
v[dseg:/fridge_sensor/45/300/66665/1]:name
Wait - why did this query get an error? Looking at the error message, there is at least one
vertex label that doesn't use the property name. How can this query be
modified? It is possible to exclude the vertex labels that do not include name
using the without() predicate:
==>meal_item=[taco, iced tea, burrito]
==>ingredient=[celery, carrots, butter, mashed garlic, egg noodles, fennel, thyme, ground beef, onion, tomato paste, eggplant, beef, pork loin, olive oil, yellow onion, tuna, mushrooms, oyster, tomato, Pernod, green beans, shallots, red wine, green bell pepper, pork sausage, parsley, hard-boiled egg, chervil, bacon, zucchini, chicken broth]
==>person=[James BEARD, Fritz STREIFF, John Smith, Kelsie KERR, Jane DOE, Sharon SMITH, Alice WATERS, Simone BECK, Julia CHILD, Louisette BERTHOLIE, Emeril LAGASSE, Betsy JONES, Patricia CURTAN, Patricia SIMON, John DOE]
==>book=[The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution, The Art of French Cooking, Vol. 1, The French Chef Cookbook, Simca's Cuisine: 100 Classic French Recipes for Every Occasion]
==>recipe=[Salade Nicoise, Spicy Meatloaf, Oysters Rockefeller, Carrot Soup, Beef Bourguignon, Rataouille, Wild Mushroom Stroganoff, Roast Pork Loin]
==>location=[Dublin, London, Jane's house, New York, Los Angeles, New Orleans, Zippy Mart, Tokyo, Paris, Mamma's Grocery, test location, Quik Station, Aachen, John Smith's place, Mary's house, Chicago]
==>store=[Zippy Mart, Quik Station, Mamma's Grocery]
==>home=[Jane's house, John Smith's place, Mary's house]
While two of the vertex labels are not included, the rest are displayed nicely, with the
different vertex types grouped together. We could get all the vertex label information,
including the fridge sensors and meals, by deleting the by('name') step. However, the
resulting data will display the vertex ids, a less readable result. Try it to see what you get!
Similarly, you can group by edge label, replacing V() with
E(), but edges rarely have names, and the resulting output can be
useful but messy to read.
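The filtered grouping discussed in this section can be sketched as follows. The fridge_sensor label comes from the error message above; meal is an assumed name for the second excluded label.

```groovy
// Sketch: group vertex names by label, skipping labels that lack 'name'.
// 'meal' is an assumed label name; 'fridge_sensor' appears in the error text.
g.V().hasLabel(without('fridge_sensor', 'meal')).
  group().
    by(label).
    by('name')
```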
Another example groups all books by publishing year and displays a listing of each year
books were published followed by the book
titles:
==>1968=[The French Chef Cookbook]
==>1972=[Simca's Cuisine: 100 Classic French Recipes for Every Occasion]
==>2007=[The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution]
==>1961=[The Art of French Cooking, Vol. 1]
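A grouping consistent with the output above might look like this sketch, which uses the book label and the publish_year and name properties that appear in the earlier book output.

```groovy
// Sketch: book titles grouped under the year in which they were published.
// The first by() picks the group key, the second picks the listed value.
g.V().hasLabel('book').
  group().
    by('publish_year').
    by('name')
```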
Grouping for processing using local()
Oftentimes, it is critical to do local processing for a particular step in the graph
traversal. The next two examples use the limit() command to show how
local() can change the processing from the whole stream entering the
query to a portion of the query. First, find just two authors and the year that they have
published
books:
Note
that up to two books are displayed for each author. The traversal step
local() has many applications for processing a subsection of a graph
within a graph traversal to return results before moving on to further processing.
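The contrast between the two forms can be sketched as follows. The authored edge label and the author and book display labels are assumptions made for illustration; person, name, and publish_year appear elsewhere on this page.

```groovy
// Without local(): limit(2) applies to the whole stream, so at most two
// author/year pairs are returned in total.
g.V().hasLabel('person').as('author').
  out('authored').limit(2).as('book').
  select('author', 'book').
    by('name').
    by('publish_year')

// With local(): limit(2) applies separately for each person, so up to two
// books are returned per author.
g.V().hasLabel('person').as('author').
  local(out('authored').limit(2)).as('book').
  select('author', 'book').
    by('name').
    by('publish_year')
```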
User-defined type (UDT) retrieval without search indexes
UDT fields can be retrieved with queries, but not modified. Using
unfold() allows the UDTs to be traversed but not mutated. Given a schema
with two UDTs plus a vertex label that uses
them:
==>{address1:'757 Jay St',address2:NULL,city_code:'Arbuckle',state_code:'CA',zip_code:'95691'}
==>{address1:'213 F St',address2:NULL,city_code:'Winston',state_code:'CA',zip_code:'93001'}
==>{address1:'1000 A St',address2:NULL,city_code:'Winston',state_code:'CA',zip_code:'93001'}
==>{address1:'500 C St',address2:NULL,city_code:'Winston',state_code:'CA',zip_code:'93001'}
Using values() along with unfold() in the query will instead
retrieve only the city_code
values: