DSE Graph QuickStart using DataStax Studio or Gremlin console.
QuickStart Introduction
Graph databases are useful for discovering simple and complex relationships between objects.
Relationships are fundamental to how objects interact with one another and their environment.
Graph databases are designed to represent the relationships between objects directly.
Graph databases consist of three elements:
vertex
A vertex is an object, such as a person, location, automobile, recipe, or anything
else you can think of as nouns.
edge
An edge defines the relationship between two vertices. A person can create software,
or an author can write a book. Typically an edge is equivalent to a verb.
property
A key-value pair that describes some attribute of either a vertex or an edge. A
property key is used to describe the key in the key-value pair. All properties are
global in DSE Graph, meaning that a property can be used for any vertices. For example,
"name" can be used for all vertices in a graph.
Vertices, edges, and properties can have properties; for this reason, DSE Graph is
classified as a property graph. Properties are an important part of
storing and querying information in a property graph.
Property graphs are typically quite large, although the nature of querying the graph varies
depending on whether the graph has large numbers of vertices, edges, or both vertices and
edges. To get started with graph database concepts, a toy graph is used for simplicity.
The example used here explores the world of food.
Figure 1. Recipe Toy Graph
Elements are labeled to distinguish the type of vertices and edges in a graph database using
vertex labels and edge labels. A vertex labeled person holds
information about an author or reviewer or someone who ate a meal. An edge between a
person and a book is labeled authored. Specifying appropriate labels is
an important step in graph data modeling.
Vertices and edges generally have properties. For instance, a person vertex can have
properties name and gender. Edges can also have properties. A created
edge can have a createDate property that identifies when the adjoining recipe
vertex was created.
Information in a graph database is retrieved using graph traversals. Graph
traversals walk a graph with a single traversal step or a series of traversal steps,
starting from a defined point and filtering at each step until a result is returned.
To retrieve information using graph traversals, you must first insert data. The steps listed
in this section allow you to gain a rudimentary understanding of DSE Graph with a minimum
amount of configuration and schema creation.
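As a rough sketch of what such a traversal looks like, the following uses standard Gremlin steps against the recipe data that is loaded later in this QuickStart. It assumes a graph traversal g is already configured and that a person vertex named Julia Child with outgoing authored edges exists; without that data the traversal simply returns nothing.

```groovy
// Sketch only: requires a Studio cell or a connected Gremlin console with g configured.
// Start at all vertices, filter to the person named Julia Child,
// walk the outgoing 'authored' edges to the adjacent book vertices,
// and return the name property of each book reached.
g.V().has('person', 'name', 'Julia Child').out('authored').values('name')
```

Each step narrows or moves the set of elements produced by the previous step, which is what "filter each step" means in practice.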
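If you are using the Gremlin console, start it with the dse gremlin-console command. The startup output resembles:
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.tinkergraph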
plugin activated: tinkerpop.server
plugin activated: tinkerpop.utilities
==>Connected - localhost/127.0.0.1:8182-[4edf75f9-ed27-4add-a350-172abe37f701]
==>Set remote timeout to 2147483647ms
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[4edf75f9-ed27-4add-a350-172abe37f701] - type ':remote console' to return to local mode
gremlin>
The Gremlin console sends all commands typed at the prompt to the Gremlin
Server, which processes them. DSE Graph runs a Gremlin
Server (tinkerpop.server) on each DSE
node, and the Gremlin console connects to it automatically.
By default, the Gremlin console runs in remote mode and opens a session,
so that commands are processed on the Gremlin Server. The
Gremlin console can be switched to run commands locally
using:
:remote console
Once this command is run, commands are evaluated locally, and any command
meant for the server must be submitted remotely with an explicit prefix
(:> in the Gremlin console). Using the command
again switches the context back to the Gremlin Server.
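The mode switch can be sketched as the following console session. It assumes a console that is already connected to the Gremlin Server; :> is the standard Gremlin console shorthand for :submit, and system.graphs() is used only because it needs no graph alias.

```groovy
// Sketch only: typed at the gremlin> prompt of a connected console session.
// Switch to local mode; commands are now evaluated inside the console itself.
:remote console
// While in local mode, prefix a command with :> to submit it to the Gremlin Server.
:> system.graphs()
// Switch back; every command is sent to the Gremlin Server again.
:remote console
```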
QuickStart Configuration
Configure DSE Graph to run QuickStart.
Procedure
Create a Studio notebook and configure a graph for the QuickStart. If you are
using the Gremlin console, skip to the step that creates a graph in the Gremlin console.
This tutorial exists as a Studio notebook, DSE Graph QuickStart,
so that you do not have to create a notebook. However, in Studio, creating a notebook is simple.
If running Studio on a DSE node, the default connection of localhost
works; otherwise, create a
connection for the desired DSE cluster. Each notebook is
connected to a particular graph. Multiple notebooks can be connected to
the same graph, or multiple notebooks can be created to connect to
different graphs.
A connection in Studio defines the graph and assigns a graph traversal
g for that graph. A graph traversal is the mechanism for visiting
each vertex in a graph, based on the filters defined in the graph traversal.
To query DSE Graph, the graph traversal g must be assigned to a
particular graph; Studio manages this assignment with connections.
A blank notebook opens with a single cell. DSE Graph runs a Gremlin
Server (tinkerpop.server) on each DataStax Enterprise
node. Studio automatically connects to the Gremlin Server and, if the
graph does not exist, creates it using the connection information.
The graph is stored as one graph instance per DSE database
keyspace. Once a graph exists, a graph traversal source g is
configured that allows graph traversals to be executed to query the
graph. A graph traversal is bound to a specific traversal source,
which by default is the standard OLTP traversal engine. The
graph commands can add vertices and edges to the
database, or get other graph information. The g commands can
query or add vertices and edges.
Set the schema mode to Development and allow full scans.
CAUTION: Development is a more lenient mode that allows
schema to be created automatically when adding data, and also allows
full scans that can inspect the data with broad graph traversals.
Full scans over large graphs will have high read latency, and are
not appropriate for production applications. For production, the
schema mode should be set to Production to require schema
prior to inserting data and disallow full scans.
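A minimal sketch of this step, using the DSE Graph schema configuration options graph.schema_mode and graph.allow_scan. It assumes a Studio cell bound to the QuickStart graph; the same two statements apply in the Gremlin console once the alias g is configured.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph.
// Allow schema to be created implicitly when data is added.
schema.config().option('graph.schema_mode').set('Development')
// Allow traversals that scan the whole graph, such as g.V().count().
schema.config().option('graph.allow_scan').set('true')
```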
Create a graph in Gremlin Console and configure the graph for the
QuickStart.
Create a graph to hold the data. The system commands are used to
run commands that affect graphs in DSE Graph.
system.graph('test').create()
==>null
Once a graph exists, a graph traversal g is configured that
will allow graph traversals to be executed. Graph traversals are
used to query the graph data and return results. A graph traversal
is bound to a specific traversal source which is the standard OLTP
traversal engine.
Configure a graph traversal g to use the default graph
traversal setting, which is test.g. This step will also
create an implicit graph object.
:remote config alias g test.g
==>g=test.g
The graph commands allow graphs to be written to file, add
vertices, properties, or edges to the database, and set or get other
graph configuration. The g commands create queries to obtain
results, and can also add vertices, properties, or edges to the
database.
Set the schema mode to Development and allow full scans.
CAUTION: Development is a more lenient mode that allows
schema to be created automatically when adding data, and also allows
full scans that can inspect the data with broad graph traversals.
Full scans over large graphs will have high read latency, and are
not appropriate for production applications. For production, the
schema mode should be set to Production to require schema
prior to inserting data and disallow full scans.
When creating a new graph, to check what graphs already exist,
use:
system.graphs()
==>test
==>anotherTest
QuickStart Vertex and edge counting
Methods for counting vertices and edges in DSE Graph.
There are different methods for accomplishing vertex and edge counts in DSE Graph.
Examples here will show how to use the Gremlin count() command either as a
transactional or analytical query, and Spark SQL for analytical queries.
A transactional Gremlin query can be used to check the number of vertices that exist
in the graph, and is useful for exploring small graphs. However, such a query scans
the full graph, traversing every vertex, and should not be run on large
graphs. If multiple DSE nodes are configured, this traversal step intensively walks
all partitions on all nodes in the cluster that have graph data. This method is not
appropriate for Production operations.
An analytical Gremlin query can be used to check the number of vertices that exist in
any graph, large or small, and is much safer for Production operations. The queries
are written like transactional Gremlin queries, but are executed with the analytic
Spark engine.
Spark SQL provides another analytical method for counting vertices.
If the AlwaysOn SQL service is turned on, Studio uses the
JDBC interface to pass queries to DSE Analytics. Two tables,
graphName_vertices and graphName_edges, are automatically
generated in the Spark database dse_graph for each graph, where
graphName is replaced with the graph used for the Studio connection
assigned to a Studio notebook. These tables can be queried with common Spark SQL
commands directly in Studio, or can be explored with the dse spark-sql shell. To learn more
about using Spark SQL to query, see the Using Spark
SQL to query data documentation.
Procedure
Transactional Gremlin count()
Use the traversal step count(); the current count will be
zero, because no data exists yet. A graph traversal g is chained with
V() to retrieve all vertices and count() to compute the number
of vertices. Chaining executes sequential traversal steps in the most efficient
order.
g.V().count()
Analytical Gremlin count()
To use Gremlin console, configure the traversal to run an analytical
query:
:remote config alias g test.a
where test.a denotes that the graph will be used for analytic
purposes.
To use Studio, configure the Run option to "Execute using analytic engine
(Spark)" before running the query.
Use the traversal step count(); the current count will be
zero, because no data exists yet. A graph traversal g is chained with
V() to retrieve all vertices and count() to compute the number
of vertices. Chaining executes sequential traversal steps in the most efficient
order.
g.V().count()
Spark SQL count
Enable AlwaysOn SQL or start a Spark SQL Thrift server instance.
To use Spark SQL in Studio, enable the AlwaysOn SQL service in the dse.yaml file
by setting the enabled option to true, and then restart
DSE:
# AlwaysOn SQL options
alwayson_sql_options:
    # If it's true, the node is enabled for AlwaysOn SQL. Only Analytics node
    # can be enabled as a AlwaysOn SQL node
    enabled: true
In a Studio cell, select Spark SQL in the language menu and
set the database to dse_graph.
To use the Spark SQL shell, start the shell:
dse spark-sql
and navigate to the correct database:
USE dse_graph;
Then, in either Studio or the Spark SQL shell, execute the Spark SQL query for
finding the vertex count:
SELECT count(*) FROM DSE_GRAPH_QUICKSTART_vertices;
Edge counts
To do an edge count with Gremlin, replace V() with E():
g.E().count()
To do an edge count with Spark SQL, replace the word vertices in the
table name with edges:
SELECT count(*) FROM DSE_GRAPH_QUICKSTART_edges;
QuickStart Simple example
Simple DSE Graph example.
Let's start with a simple example from the recipe data model. The data is composed of
two vertices, one person who is an author (Julia Child) and one book (The
Art of French Cooking, Vol. 1) with an edge between them to identify that
Julia Child authored that book. Although we could make this graph without schema,
and DSE Graph would make a best guess about the data types, we'll supply schema
before inserting the graph data.
Next, graph.addVertex() is used to add data for a single vertex. Note the use of
label to designate the vertex label. A g.addV() statement could also be used,
as shown in the alternate method.
Run the command and look at the results using the buttons to display the Raw JSON,
Table, and Graph views.
Procedure
Schema is defined for properties personId, name, and
gender. Properties should be created first, before vertex labels. A
vertex label person uses a user-defined vertex id with a single
partitionKey, personId, which is an integer
for simplicity in this example. The schema to add the partitionKey and
properties is executed with two statements, but could be executed as a single
chained statement.
The user-defined vertex id is used to partition the graph data among the
cluster's nodes.
User-defined vertex (UDV) ids are strongly recommended. Auto-generated
vertex ids are still available but are deprecated in DSE 6.0,
and warnings are logged when they are used.
As you will see in the schema for a book vertex label, a property key
can be reused for different types of information. While properties are
“global” in the sense that they can be used with multiple vertex labels, it
is important to understand that when specifying a property in a graph
traversal, it is always used in conjunction with a vertex label.
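The schema described in this step can be sketched as follows with the DSE Graph schema API. The data types (Int for personId, Text for name and gender) and single cardinality follow the description above; treat the statements as an illustration rather than the only valid form.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph.
// Property keys first: one data type and cardinality per key.
schema.propertyKey('personId').Int().single().create()
schema.propertyKey('name').Text().single().create()
schema.propertyKey('gender').Text().single().create()
// Vertex label with a user-defined vertex id made of a single partition key.
schema.vertexLabel('person').partitionKey('personId').create()
// Associate the remaining properties with the vertex label.
schema.vertexLabel('person').properties('name', 'gender').add()
```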
First, insert a vertex for Julia Child using a
graph.addVertex() command. The vertex label is
person, and property key-value pairs are created for personId, name,
and gender. Note that label is the key of the key-value pair
that sets the vertex label.
Performance tests show that graph.addVertex() is faster, but
g.addV() can be used in applications using DSE Drivers.
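A sketch of the insert for this step, assuming the person schema described above and the values used elsewhere in this QuickStart (personId 1, gender F). The traversal form is shown commented out so that the vertex is not written twice.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph.
// graph API: returns the new vertex, which is kept in a variable for later steps.
juliaChild = graph.addVertex(label, 'person', 'personId', 1, 'name', 'Julia Child', 'gender', 'F')

// Alternate method, traversal API; next() returns the created vertex.
// juliaChild = g.addV('person').property('personId', 1).property('name', 'Julia Child').property('gender', 'F').next()
```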
The Studio result:
Tip: In Studio, the result can be displayed using different views:
Raw JSON, Table, or Graph. Explore the options.
The Gremlin console result:
==>v[{~label=person, personId=1}]
Create the schema for a vertex label book that has a user-defined
vertex id with a single partitionKey bookId and includes the properties
name, publishYear, and ISBN.
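The book schema can be sketched in the same way as the person schema. The data types here are assumptions chosen to match the values inserted in this QuickStart (integer ids and years, text ISBN); the name property key defined for person is reused.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph.
// New property keys for the book vertex label (types are assumptions).
schema.propertyKey('bookId').Int().single().create()
schema.propertyKey('publishYear').Int().single().create()
schema.propertyKey('ISBN').Text().single().create()
// Vertex label with a user-defined vertex id made of a single partition key.
schema.vertexLabel('book').partitionKey('bookId').create()
// 'name' already exists as a property key and is reused for book.
schema.vertexLabel('book').properties('name', 'publishYear', 'ISBN').add()
```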
artOfFrenchCookingVolOne = graph.addVertex(label, 'book', 'bookId', 1001, 'name', 'The Art of French Cooking, Vol. 1', 'publishYear', 1961)
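or, optionally, the traversal query, where next() returns the created vertex so that the variable holds a vertex rather than a traversal:
artOfFrenchCookingVolOne = g.addV('book').property('bookId', 1001).property('name','The Art of French Cooking, Vol. 1').property('publishYear', 1961).next()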
The Studio result:
As with the author vertex, you can see all the information about the book
vertex created. In Graph view, use the
Settings button (the gear) to change the display
label for author by entering Chef {{name}}. Change the book
display label with {{label}}:{{name}}. Alternatively, change the book
display label with {{{name}}}. To set graph display names
more generally, look for “Configure Graph Display Names” under the three
bars in the upper left-hand corner of Studio.
The first query uses a variable juliaChild to hold the
person vertex information, while the second query uses the variable
artOfFrenchCookingVolOne to hold the book vertex
information. The third query uses a graph traversal
g.V(firstVertex).addE(edgeLabel).to(secondVertex) to
create the edge between the author and book vertices.
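A sketch of the third query, together with the edge label schema it relies on. It assumes the variables juliaChild and artOfFrenchCookingVolOne each hold a vertex from the two earlier inserts, and that the person and book vertex labels already exist.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph.
// Edge label schema: 'authored' connects an outgoing person to an incoming book.
schema.edgeLabel('authored').multiple().create()
schema.edgeLabel('authored').connection('person', 'book').add()
// Create the edge from the author vertex to the book vertex.
g.V(juliaChild).addE('authored').to(artOfFrenchCookingVolOne)
```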
Ensure that the data inserted for the author is correct by checking with a
has() step using the vertex label person
and the property name = Julia Child. This graph traversal is a
basic starting point for more complex traversals, because it narrows the search
of the graph with specific information.
g.V().has('person', 'name', 'Julia Child')
In Studio, use the Table view to look at the
results, as it is much more readable than the Raw JSON
view.
The vertex information is displayed for the person vertex for
Julia Child. Note the id consists of the label and the
user-defined vertex id personId.
The Gremlin console result:
==>v[{~label=person, personId=1}]
Another useful traversal is valueMap(), which prints the
key-value listing of each property value for specified vertices.
g.V().hasLabel('person').valueMap()
CAUTION: Using valueMap()
without specifying properties can result in slow query latencies, if a large
number of property keys exist for the queried vertex or edge. Specific
properties can be specified, such as
valueMap('name').
Although Spark SQL is used more for analytical queries, simple queries similar
to Gremlin can be made, such as querying information about vertices. A query can
look for specific columns for a specific vertex label, in this case, a person
with the name Julia Child. Notice the use of backticks around the column names
~label and name; the backticks escape the tilde in ~label.
SELECT personid,name,gender FROM DSE_GRAPH_QUICKSTART_vertices WHERE `~label` = 'person' AND `name` = 'Julia Child';
QuickStart Key features
Key features of DSE Graph.
A vertex label person specifies the type of vertex, personId provides a user-defined
vertex id to manage cluster storage of the vertex, and the property keys name and
gender display the properties for a person. Creating vertex labels
explains the id components.
Procedure
A useful traversal is valueMap() which prints the key-value listing of
each property value for specified vertices.
g.V().hasLabel('person').valueMap()
CAUTION: Using valueMap()
without specifying properties can result in slow query latencies, if a
large number of property keys exist for the queried vertex or edge.
Specific properties can be specified, such as
valueMap('name').
If only the value of a particular property key is desired, the values()
traversal step can be used. To get the name of all vertices, use:
g.V().values('name')
Edge information may also be retrieved. The next command filters all edges to
find those with an edge label authored.
g.E().hasLabel('authored')
The Raw JSON view of the edge information displays details about the incoming
and outgoing vertices as well as edge parameters id, label,
and type.
Spark SQL can also be used to find information about edges. Notice that the
Spark-generated tables display different information than the Gremlin graph
query. The traversal step count() is useful for counting both vertices
and edges; to count edges, use E() rather than
V(). You should have one edge. The same cautions apply to
real-time transactional counts in Production: a Spark SQL count or OLAP execution,
both analytical actions, is a better choice.
SELECT * FROM DSE_GRAPH_QUICKSTART_edges;
QuickStart Graph schema
Set graph schema.
Before adding more data to the graph, let's stop and talk about schema. Schema
defines the possible properties and their data types for the graph. These properties
are then used in the definitions of vertex labels and edge labels. The last critical
step in schema creation is index creation. Indexes play an important role in making
graph traversals efficient and fast. See creating
schema and creating indexes for more
information.
First, let's create schema for the property keys. In the next two cells, the first
command clears the schema for the previously created vertices and edge. After the
schema creation is completed, the next step is to enter data for those elements
again in a longer script.
Note: DSE Graph has two schema modes, Production and Development. In Production mode,
all schema must be identified before data is entered. In Development mode, schema
can be created or modified after data is entered.
Procedure
Clear the schema:
schema.drop()
To keep the Spark SQL data synchronized with the graph, drop the Spark SQL
tables. The tables will be automatically rebuilt, so that the data will align
with the graph schema and data entered later.
DROP TABLE DSE_GRAPH_QUICKSTART_vertices;
DROP TABLE DSE_GRAPH_QUICKSTART_edges;
Each property must be defined with a data
type. DSE Graph data types are aligned with the DSE database data
types. By default, properties have single cardinality, but can be defined
with multiple cardinality. Multiple
cardinality allows more than one value to be assigned to a property.
In addition, properties can have their own properties, or
meta-properties. Meta-properties can only be nested one deep, and
are useful for keying information to an individual property. Notice that
property keys can be created with an additional method,
ifNotExists(). This method prevents overwriting a
definition that may already exist.
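The options described above can be sketched with the DSE Graph propertyKey schema API. The nickname, country, and livedIn keys are hypothetical names used only to illustrate multiple cardinality and meta-properties; they are not required by the rest of this QuickStart.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph.
// ifNotExists() leaves an existing definition untouched.
schema.propertyKey('name').Text().single().ifNotExists().create()
// Multiple cardinality: more than one value may be stored (hypothetical key).
schema.propertyKey('nickname').Text().multiple().ifNotExists().create()
// Meta-property: livedIn is attached to each value of country (hypothetical keys).
schema.propertyKey('livedIn').Text().single().ifNotExists().create()
schema.propertyKey('country').Text().multiple().properties('livedIn').create()
```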
Vertex label schema
After property keys are created, vertex labels can be defined.
The schema for vertex labels defines the label type, and optionally
defines the properties associated with the vertex label. There are two
different methods for defining the association of the properties with vertex
labels, either during creation, or by adding them after vertex label
addition. The ifNotExists() method can be used for any schema
creation.
Vertex ids should be user-defined (UDV)
ids, as auto-generated vertex ids are deprecated in DSE 6.0. UDV ids are
explained in further detail in the documentation, but note that partition
keys and clustering keys may be defined.
DSE Graph limits the number of vertex labels to 200 per
graph.
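The two ways of associating properties with a vertex label can be sketched as follows. This assumes the property keys named here already exist; chaining properties() before create() is shown as one plausible form of the "during creation" method.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph with property keys defined.
// Method 1: create the vertex label, then add properties afterwards.
schema.vertexLabel('person').partitionKey('personId').ifNotExists().create()
schema.vertexLabel('person').properties('name', 'gender').add()
// Method 2: associate properties while creating the vertex label.
schema.vertexLabel('book').partitionKey('bookId').properties('name', 'publishYear', 'ISBN').ifNotExists().create()
```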
Edge label schema
After property keys are created, edge labels can be defined.
The schema for edge labels defines the label type, and defines the two
vertex labels that are connected by the edge label with
connection(). The reviewed edge label
defines edges between adjacent vertices with the outgoing vertex label
person and the incoming vertex label
recipe. By default, edges have multiple
cardinality, but can be defined with single cardinality. Multiple
cardinality allows more than one edge with differing property values but the
same edge label to be assigned.
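A sketch of the reviewed edge label described above. It assumes the person and recipe vertex labels already exist; the stars property is a hypothetical edge property added only to show how properties attach to an edge label.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph.
// Hypothetical edge property used for illustration.
schema.propertyKey('stars').Int().single().ifNotExists().create()
// Multiple cardinality edge label from person (outgoing) to recipe (incoming).
schema.edgeLabel('reviewed').multiple().properties('stars').connection('person', 'recipe').ifNotExists().create()
```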
QuickStart Indexing
Index graph schema.
Indexing is a complex and highly important subject.
Here, several types of indexes are created. Briefly, secondary and materialized
indexes are two types of indexes that use the DSE database built-in indexing. Search
indexes use DSE Search which is Solr-based. Only one search index per vertex label
is allowed, but multiple properties can be included. Property indexes allow
meta-properties to be indexed. Edge indexes allow properties on edges to be indexed.
Note that indexes are added with add() to previously created vertex
labels.
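The index types listed above can be sketched as follows. Index names such as byName and search are arbitrary, the country, livedIn, and stars properties are hypothetical, and the search index assumes DSE Search is enabled on the cluster.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph with the named labels and properties.
// Secondary index on a vertex property.
schema.vertexLabel('person').index('byName').secondary().by('name').add()
// Materialized view index on a vertex property.
schema.vertexLabel('meal').index('byType').materialized().by('type').add()
// Search index (one per vertex label); asText() requests full-text indexing.
schema.vertexLabel('recipe').index('search').search().by('instructions').asText().add()
// Property index: indexes the meta-property livedIn of the property country.
schema.vertexLabel('person').index('byLocation').property('country').by('livedIn').add()
// Edge index: indexes the stars property on outgoing reviewed edges.
schema.vertexLabel('person').index('ratedByStars').outE('reviewed').by('stars').add()
```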
The schema.describe() query displays schema you can use to recreate
the schema entered. If you enter data without creating schema, you can use this
command to verify the data types set for each property.
Procedure
Examine the schema:
schema.describe()
In Studio, a portion of the output:
While entering data without schema creation is handy while developing and
learning, it is strongly discouraged for actual applications. As a
reminder, Production mode disallows schema creation once data is loaded.
Some Groovy steps are useful in the Gremlin query to find specific schema
descriptions. For instance, to find only the schema for vertex labels and their
indexes, use the following command:
Additional steps can split the output per newline and grep for a string as
shown for index. The Gremlin variant used here is based on
Apache Groovy, so any Groovy commands can be used to manipulate
graph traversals. Apache Groovy is a language that smoothly integrates with
Java to provide scripting capabilities.
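The filtering described here can be sketched with plain Groovy string methods applied to the text returned by schema.describe(). The regular expressions are illustrative; grep() keeps only the lines that match the whole pattern, which is why each pattern begins and ends with .* here.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph.
// Keep only the schema lines that define vertex labels and their indexes.
schema.describe().split('\n').grep(~/.*vertexLabel.*/)
// Keep only the schema lines that mention an index.
schema.describe().split('\n').grep(~/.*index.*/)
```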
QuickStart Modifying schema
Modify graph schema.
Schema can be modified after creation, using schema add() to add additional
properties, vertex labels, edge labels, or indexes, as shown in the schema creation
above. The drop() step can also be used to remove any element; see propertyKey, vertexLabel, and edgeLabel. The data
type of a property, however, cannot be changed without removing and recreating the
property. While entering data without schema creation is useful when developing and
learning, it is strongly discouraged for actual applications. As a reminder,
Production mode disallows schema creation once data is loaded.
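A small sketch of a schema modification, assuming the person vertex label exists. The nationality and obsoleteKey property keys are hypothetical, and the drop() call is shown as an assumption based on the description above; check the propertyKey reference for its exact behavior before relying on it.

```groovy
// Sketch only: requires a session bound to a DSE Graph graph.
// Create a new property key (hypothetical) and add it to an existing vertex label.
schema.propertyKey('nationality').Text().single().ifNotExists().create()
schema.vertexLabel('person').properties('nationality').add()
// Remove a property key that is no longer needed (hypothetical key; assumed drop() usage).
schema.propertyKey('obsoleteKey').drop()
```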
Now that schema is created, add more vertices and edges using the following script.
To explore more connections in the recipe data model, more vertices and edges are
input into the graph. A script, generateRecipe.groovy, is entered and then
executed by the remote Gremlin server. Note the first command,
g.V().drop().iterate(); this command can be used to drop all
vertex and edge data from the graph before reading in new data. In Studio, be sure
to select the Graph view after running the script.
Procedure
Adding more data
Run generateRecipe.groovy in either Studio or the Gremlin console:
If running in Gremlin console, use the following command to
load:
:load /tmp/generateRecipe.groovy
replacing "/tmp" with the directory where you write the script. In Studio,
run the script within a cell.
// Generates all Recipe Toy Graph vertices and edges except Reviews
// Add all vertices and edges for Recipe
g.V().drop().iterate()
// author vertices
juliaChild = graph.addVertex(label, 'person', 'personId', 1, 'name','Julia Child', 'gender', 'F')
simoneBeck = graph.addVertex(label, 'person', 'personId', 2, 'name', 'Simone Beck', 'gender', 'F')
louisetteBertholie = graph.addVertex(label, 'person', 'personId', 3, 'name', 'Louisette Bertholie', 'gender', 'F')
patriciaSimon = graph.addVertex(label, 'person', 'personId', 4, 'name', 'Patricia Simon', 'gender', 'F')
aliceWaters = graph.addVertex(label, 'person', 'personId', 5, 'name', 'Alice Waters', 'gender', 'F')
patriciaCurtan = graph.addVertex(label, 'person', 'personId', 6, 'name', 'Patricia Curtan', 'gender', 'F')
kelsieKerr = graph.addVertex(label, 'person', 'personId', 7, 'name', 'Kelsie Kerr', 'gender', 'F')
fritzStreiff = graph.addVertex(label, 'person', 'personId', 8, 'name', 'Fritz Streiff', 'gender', 'M')
emerilLagasse = graph.addVertex(label, 'person', 'personId', 9, 'name', 'Emeril Lagasse', 'gender', 'M')
jamesBeard = graph.addVertex(label, 'person', 'personId', 10, 'name', 'James Beard', 'gender', 'M')
// book vertices
artOfFrenchCookingVolOne = graph.addVertex(label, 'book', 'bookId', 1001, 'name', 'The Art of French Cooking, Vol. 1', 'publishYear', 1961)
simcasCuisine = graph.addVertex(label, 'book', 'bookId', 1002, 'name', "Simca's Cuisine: 100 Classic French Recipes for Every Occasion", 'publishYear', 1972, 'ISBN', '0-394-40152-2')
frenchChefCookbook = graph.addVertex(label, 'book', 'bookId', 1003, 'name','The French Chef Cookbook', 'publishYear', 1968, 'ISBN', '0-394-40135-2')
artOfSimpleFood = graph.addVertex(label, 'book', 'bookId', 1004, 'name', 'The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution', 'publishYear', 2007, 'ISBN', '0-307-33679-4')
// recipe vertices
beefBourguignon = graph.addVertex(label, 'recipe', 'recipeId', 2001, 'name', 'Beef Bourguignon', 'instructions', 'Braise the beef. Saute the onions and carrots. Add wine and cook in a dutch oven at 425 degrees for 1 hour.', 'notes', 'Takes a long time to make.')
ratatouille = graph.addVertex(label, 'recipe', 'recipeId', 2002, 'name', 'Rataouille', 'instructions', 'Peel and cut the eggplant. Make sure you cut eggplant into lengthwise slices that are about 1-inch wide, 3-inches long, and 3/8-inch thick', 'notes', "I've made this 13 times.")
saladeNicoise = graph.addVertex(label, 'recipe', 'recipeId', 2003, 'name', 'Salade Nicoise', 'instructions', 'Take a salad bowl or platter and line it with lettuce leaves, shortly before serving. Drizzle some olive oil on the leaves and dust them with salt.', 'notes', '')
wildMushroomStroganoff = graph.addVertex(label, 'recipe', 'recipeId', 2004, 'name', 'Wild Mushroom Stroganoff', 'instructions', 'Cook the egg noodles according to the package directions and keep warm. Heat 1 1/2 tablespoons of the olive oil in a large saute pan over medium-high heat.', 'notes', 'Good for Jan and Bill.')
spicyMeatloaf = graph.addVertex(label, 'recipe', 'recipeId', 2005, 'name', 'Spicy Meatloaf', 'instructions', 'Preheat the oven to 375 degrees F. Cook bacon in a large skillet over medium heat until very crisp and fat has rendered, 8-10 minutes.', 'notes', ' ')
oystersRockefeller = graph.addVertex(label, 'recipe', 'recipeId', 2006, 'name', 'Oysters Rockefeller', 'instructions', 'Saute the shallots, celery, herbs, and seasonings in 3 tablespoons of the butter for 3 minutes. Add the watercress and let it wilt.', 'notes', ' ')
carrotSoup = graph.addVertex(label, 'recipe', 'recipeId', 2007, 'name', 'Carrot Soup', 'instructions', 'In a heavy-bottomed pot, melt the butter. When it starts to foam, add the onions and thyme and cook over medium-low heat until tender, about 10 minutes.', 'notes', 'Quick and easy.')
roastPorkLoin = graph.addVertex(label, 'recipe', 'recipeId', 2008, 'name', 'Roast Pork Loin', 'instructions', 'The day before, separate the meat from the ribs, stopping about 1 inch before the end of the bones. Season the pork liberally inside and out with salt and pepper and refrigerate overnight.', 'notes', 'Love this one!')
// ingredients vertices
beef = graph.addVertex(label, 'ingredient', 'ingredId', 3001, 'name', 'beef')
onion = graph.addVertex(label, 'ingredient', 'ingredId', 3002, 'name', 'onion')
mashedGarlic = graph.addVertex(label, 'ingredient', 'ingredId', 3003, 'name', 'mashed garlic')
butter = graph.addVertex(label, 'ingredient', 'ingredId', 3004, 'name', 'butter')
tomatoPaste = graph.addVertex(label, 'ingredient', 'ingredId', 3005, 'name', 'tomato paste')
eggplant = graph.addVertex(label, 'ingredient', 'ingredId', 3006, 'name', 'eggplant')
zucchini = graph.addVertex(label, 'ingredient', 'ingredId', 3007, 'name', 'zucchini')
oliveOil = graph.addVertex(label, 'ingredient', 'ingredId', 3008, 'name', 'olive oil')
yellowOnion = graph.addVertex(label, 'ingredient', 'ingredId', 3009, 'name', 'yellow onion')
greenBean = graph.addVertex(label, 'ingredient', 'ingredId', 3010, 'name', 'green beans')
tuna = graph.addVertex(label, 'ingredient', 'ingredId', 3011, 'name', 'tuna')
tomato = graph.addVertex(label, 'ingredient', 'ingredId', 3012, 'name', 'tomato')
hardBoiledEgg = graph.addVertex(label, 'ingredient', 'ingredId', 3013, 'name', 'hard-boiled egg')
eggNoodles = graph.addVertex(label, 'ingredient', 'ingredId', 3014, 'name', 'egg noodles')
mushroom = graph.addVertex(label, 'ingredient', 'ingredId', 3015, 'name', 'mushrooms')
bacon = graph.addVertex(label, 'ingredient', 'ingredId', 3016, 'name', 'bacon')
celery = graph.addVertex(label, 'ingredient', 'ingredId', 3017, 'name', 'celery')
greenBellPepper = graph.addVertex(label, 'ingredient', 'ingredId', 3018, 'name', 'green bell pepper')
groundBeef = graph.addVertex(label, 'ingredient', 'ingredId', 3019, 'name', 'ground beef')
porkSausage = graph.addVertex(label, 'ingredient', 'ingredId', 3020, 'name', 'pork sausage')
shallot = graph.addVertex(label, 'ingredient', 'ingredId', 3021, 'name', 'shallots')
chervil = graph.addVertex(label, 'ingredient', 'ingredId', 3022, 'name', 'chervil')
fennel = graph.addVertex(label, 'ingredient', 'ingredId', 3023, 'name', 'fennel')
parsley = graph.addVertex(label, 'ingredient', 'ingredId', 3024, 'name', 'parsley')
oyster = graph.addVertex(label, 'ingredient', 'ingredId', 3025, 'name', 'oyster')
pernod = graph.addVertex(label, 'ingredient', 'ingredId', 3026, 'name', 'Pernod')
thyme = graph.addVertex(label, 'ingredient', 'ingredId', 3027, 'name', 'thyme')
carrot = graph.addVertex(label, 'ingredient', 'ingredId', 3028, 'name', 'carrots')
chickenBroth = graph.addVertex(label, 'ingredient', 'ingredId', 3029, 'name', 'chicken broth')
porkLoin = graph.addVertex(label, 'ingredient', 'ingredId', 3030, 'name', 'pork loin')
redWine = graph.addVertex(label, 'ingredient', 'ingredId', 3031, 'name', 'red wine')
// meal vertices
meal1 = graph.addVertex(label, 'meal', 'mealId', 4001, 'type', 'lunch')
meal2 = graph.addVertex(label, 'meal', 'mealId', 4002, 'type', 'lunch')
meal3 = graph.addVertex(label, 'meal', 'mealId', 4003, 'type', 'lunch')
meal4 = graph.addVertex(label, 'meal', 'mealId', 4004, 'type', 'lunch')
meal5 = graph.addVertex(label, 'meal', 'mealId', 4005, 'type', 'breakfast')
meal6 = graph.addVertex(label, 'meal', 'mealId', 4006, 'type', 'snack')
meal7 = graph.addVertex(label, 'meal', 'mealId', 4007, 'type', 'dinner')
meal8 = graph.addVertex(label, 'meal', 'mealId', 4008, 'type', 'dinner')
// author-book edges
juliaChild.addEdge('authored', artOfFrenchCookingVolOne)
simoneBeck.addEdge('authored', artOfFrenchCookingVolOne)
louisetteBertholie.addEdge('authored', artOfFrenchCookingVolOne)
simoneBeck.addEdge('authored', simcasCuisine)
patriciaSimon.addEdge('authored', simcasCuisine)
juliaChild.addEdge('authored', frenchChefCookbook)
aliceWaters.addEdge('authored', artOfSimpleFood)
patriciaCurtan.addEdge('authored', artOfSimpleFood)
kelsieKerr.addEdge('authored', artOfSimpleFood)
fritzStreiff.addEdge('authored', artOfSimpleFood)
// author - recipe edges
juliaChild.addEdge('created', beefBourguignon, 'createDate', '1961-01-01')
juliaChild.addEdge('created', ratatouille, 'createDate', '1965-02-02')
juliaChild.addEdge('created', saladeNicoise, 'createDate', '1962-03-03')
emerilLagasse.addEdge('created', wildMushroomStroganoff, 'createDate', '2003-04-04')
emerilLagasse.addEdge('created', spicyMeatloaf, 'createDate', '2000-05-05')
aliceWaters.addEdge('created', carrotSoup, 'createDate', '1995-06-06')
aliceWaters.addEdge('created', roastPorkLoin, 'createDate', '1996-07-07')
jamesBeard.addEdge('created', oystersRockefeller, 'createDate', '1970-01-01')
// recipe - ingredient edges
beefBourguignon.addEdge('includedIn', beef, 'amount', '2 lbs')
beefBourguignon.addEdge('includedIn', onion, 'amount', '1 sliced')
beefBourguignon.addEdge('includedIn', mashedGarlic, 'amount', '2 cloves')
beefBourguignon.addEdge('includedIn', butter, 'amount', '3.5 Tbsp')
beefBourguignon.addEdge('includedIn', tomatoPaste, 'amount', '1 Tbsp')
ratatouille.addEdge('includedIn', eggplant, 'amount', '1 lb')
ratatouille.addEdge('includedIn', zucchini, 'amount', '1 lb')
ratatouille.addEdge('includedIn', mashedGarlic, 'amount', '2 cloves')
ratatouille.addEdge('includedIn', oliveOil, 'amount', '4-6 Tbsp')
ratatouille.addEdge('includedIn', yellowOnion, 'amount', '1 1/2 cups or 1/2 lb thinly sliced')
saladeNicoise.addEdge('includedIn', oliveOil, 'amount', '2-3 Tbsp')
saladeNicoise.addEdge('includedIn', greenBean, 'amount', '1 1/2 lbs blanched, trimmed')
saladeNicoise.addEdge('includedIn', tuna, 'amount', '8-10 ozs oil-packed, drained and flaked')
saladeNicoise.addEdge('includedIn', tomato, 'amount', '3 or 4 red, peeled, quartered, cored, and seasoned')
saladeNicoise.addEdge('includedIn', hardBoiledEgg, 'amount', '8 halved lengthwise')
wildMushroomStroganoff.addEdge('includedIn', eggNoodles, 'amount', '16 ozs wide')
wildMushroomStroganoff.addEdge('includedIn', mushroom, 'amount', '2 lbs wild or exotic, cleaned, stemmed, and sliced')
wildMushroomStroganoff.addEdge('includedIn', yellowOnion, 'amount', '1 cup thinly sliced')
spicyMeatloaf.addEdge('includedIn', bacon, 'amount', '3 ozs diced')
spicyMeatloaf.addEdge('includedIn', onion, 'amount', '2 cups finely chopped')
spicyMeatloaf.addEdge('includedIn', celery, 'amount', '2 cups finely chopped')
spicyMeatloaf.addEdge('includedIn', greenBellPepper, 'amount', '1/4 cup finely chopped')
spicyMeatloaf.addEdge('includedIn', porkSausage, 'amount', '3/4 lbs hot')
spicyMeatloaf.addEdge('includedIn', groundBeef, 'amount', '1 1/2 lbs chuck')
oystersRockefeller.addEdge('includedIn', shallot, 'amount', '1/4 cup chopped')
oystersRockefeller.addEdge('includedIn', celery, 'amount', '1/4 cup chopped')
oystersRockefeller.addEdge('includedIn', chervil, 'amount', '1 tsp')
oystersRockefeller.addEdge('includedIn', fennel, 'amount', '1/3 cup chopped')
oystersRockefeller.addEdge('includedIn', parsley, 'amount', '1/3 cup chopped')
oystersRockefeller.addEdge('includedIn', oyster, 'amount', '2 dozen on the half shell')
oystersRockefeller.addEdge('includedIn', pernod, 'amount', '1/3 cup')
carrotSoup.addEdge('includedIn', butter, 'amount', '4 Tbsp')
carrotSoup.addEdge('includedIn', onion, 'amount', '2 medium sliced')
carrotSoup.addEdge('includedIn', thyme, 'amount', '1 sprig')
carrotSoup.addEdge('includedIn', carrot, 'amount', '2 1/2 lbs, peeled and sliced')
carrotSoup.addEdge('includedIn', chickenBroth, 'amount', '6 cups')
roastPorkLoin.addEdge('includedIn', porkLoin, 'amount', '1 bone-in, 4-rib')
roastPorkLoin.addEdge('includedIn', redWine, 'amount', '1/2 cup')
roastPorkLoin.addEdge('includedIn', chickenBroth, 'amount', '1 cup')
// book - recipe edges
beefBourguignon.addEdge('includedIn', artOfFrenchCookingVolOne)
saladeNicoise.addEdge('includedIn', artOfFrenchCookingVolOne)
carrotSoup.addEdge('includedIn', artOfSimpleFood)
// meal - recipe edges
beefBourguignon.addEdge('includedIn', meal1)
saladeNicoise.addEdge('includedIn', meal1)
carrotSoup.addEdge('includedIn', meal4)
roastPorkLoin.addEdge('includedIn', meal4)
// meal - book edges
meal7.addEdge('includedIn', artOfFrenchCookingVolOne)
meal8.addEdge('includedIn', artOfSimpleFood)
meal5.addEdge('includedIn', frenchChefCookbook)
g.V()
In Studio:
Figure 2. Data for the Recipe Toy Graph
The g.V() command at the end of the script displays all the vertices created.
In Gremlin console:
// A series of returns for vertices and edges will mark the successful completion of the script
// Sample vertex
==>v[{~label=meal, type="dinner", mealId=4008}]
// Sample edge
==>e[{~label=includedIn, ~out_vertex={~label=meal, type="dinner", mealId=4008},
~in_vertex={~label=book, bookId=1004},
~local_id=5dec6ef7-0562-11e8-a4a1-4b3271ac7767}]
[{~label=meal, type="dinner", mealId=4008}-includedIn->{~label=book, bookId=1004}]
Whether the vertex count is run as a transactional query or an analytical query,
it now returns a higher count of 61 vertices. Run the vertex count again:
g.V().count()
The DSE Graph Loader is the recommended method for scripting data loading.
Using graph.addVertex or g.addV() is only practical for small toy graphs like
the recipe example.
Similarly, run the edge count to see the higher edge count of 67:
g.E().count()
QuickStart Exploring traversals
Explore graph data with query traversals.
Exploring the graph with graph traversals can lead to interesting conclusions. Here we'll
explore a number of traversals, to show off the power of Gremlin in creating simple queries.
Procedure
All queries can be profiled to see what the query path is and how the query performs.
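As a minimal sketch, profile() can be appended to the Julia Child lookup used later in this section. The traversal below assumes the recipe toy graph has been loaded and that g is the graph's traversal source in Studio or Gremlin console; the metrics returned will vary from run to run.

```groovy
// Look up the person vertex named Julia Child and report
// per-step metrics for the traversal instead of the vertex itself
g.V().has('person', 'name', 'Julia Child').profile()
```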
In Studio: Clicking on the bars in the graph will show more detail about underlying
processes in the database.
In Gremlin console:
==>Traversal Metrics
Step Count Traversers Time (ms) % Dur
=============================================================================================================
DsegGraphStep(vertex,[],(label = person & name ... 1 1 10.097 65.69
query-optimizer 1.848
\_condition=((label = person & name = Julia Child) & (true))
query-setup 0.065
\_isFitted=true
\_isSorted=false
\_isScan=false
index-query 1.645
\_indexType=Materialized
\_usesCache=false
\_statement=SELECT "personId" FROM "DSE_GRAPH_QUICKSTART"."person_p_byName" WHERE "name" = ? LIMIT ?; with params (java.lang.String) Julia Child, (java.lang.Integer) 50000
\_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Optional.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, async=true}
DsegPropertyLoadStep 1 1 5.274 34.31
>TOTAL - - 15.372 -
To investigate what happens in any of the following queries, and why some queries are more
efficient than others, append .profile() to the query. It shows information
similar to the output above.
With several person vertices in the graph, a specific name must be given to
find a particular vertex. This traversal gets the stored vertex information for the vertex that
has the name of Julia Child. Note that the constraint that the vertex is
a person is also included in the has() clause. Graph queries have
lower latency when the query is more specific, and the has() step is a useful
tool for narrowing the search.
g.V().has('person', 'name', 'Julia Child')
Running the query in Studio will display the vertex id, label and all property values. In
Gremlin console, this query will only display the vertex id, and the
valueMap() step must be appended to get the property values.
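A minimal sketch of the Gremlin console form, assuming the recipe toy graph is loaded and g is its traversal source:

```groovy
// Append valueMap() so the console prints the property keys and values
// of the matching vertex rather than only its id
g.V().has('person', 'name', 'Julia Child').valueMap()
```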
In this next traversal, has() filters vertex properties by name =
Julia Child as seen above. The traversal step outE() discovers the
outgoing edges from that vertex with the authored label.
g.V().has('name','Julia Child').outE('authored')
In Studio, the result can be viewed either as a listing of edge information in the Raw JSON
view or as a graph visualization in the Graph view, where scrolling over a
vertex provides additional information.
Spark SQL can also be used to discover information for a set of vertices or edges that
match particular conditions. Here, all the edges with a createdate greater
than May 1, 1975 are returned. Note the lack of camel case column names in Spark SQL.
SELECT * FROM DSE_GRAPH_QUICKSTART_edges WHERE createdate > '1975-05-01';
In Studio:
The data presented in Spark SQL is different from the data stored in the database tables
for graph. In Spark SQL tables, the source and destination vertices are listed for an edge,
along with the edge label and properties.
If instead, you want to query for the books that all people have written, the query must be
modified. The previous example retrieved edges, but not the adjacent book vertices. Add a
traversal step inV() to find all the vertices that connect to the outgoing
edges, then print the book titles of those vertices. Notice how the chained traversal steps go
from the vertices along outgoing edges to the adjacent vertices with
V().outE().inV(). The outgoing edges are given a particular filter value,
authored.
g.V().outE('authored').inV().values('name')
Studio displays the resulting list of book titles, and Gremlin console returns a similar listing.
Notice that the book titles are duplicated in the resulting list, because a listing is
returned for each author. If a book has three authors, three listings are returned. The
traversal step dedup() can eliminate the duplication.
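As a sketch, appending dedup() to the previous traversal removes the repeated titles (assuming the same loaded toy graph and traversal source g):

```groovy
// Follow authored edges to book vertices, read each book's name,
// then drop duplicate names from the result
g.V().outE('authored').inV().values('name').dedup()
```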
Studio displays the deduplicated list, and Gremlin console returns a similar listing.
The previous example and this example accomplish the same result. However, the number of
traversal steps and the type of traversal steps can affect performance. The traversal step
outE() should be only used if the edges are explicitly required. In this
example, the edges are traversed to get information about connected vertices, but the edge
information is not important to the query.
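A sketch of the shorter form, which replaces the outE('authored').inV() pair with a single out('authored') step and should return the same book titles on the toy graph:

```groovy
// out('authored') moves directly to the adjacent book vertices
// without materializing the authored edges
g.V().out('authored').values('name').dedup()
```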
Studio displays the same list of titles, and Gremlin console returns a similar listing.
The traversal step out() retrieves the connected book vertices based on the
edge label authored without retrieving the edge information. In a larger
graph traversal, this subtle difference in the traversal can become a latency issue.
Additional traversal steps continue to fine-tune the results. Adding another chained
has traversal step finds only books authored by Julia Child published after
1967. This example also displays the use of the gt, or greater than
function.
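A sketch of such a traversal follows. It assumes that book vertices carry a numeric property named publishYear; that property name is an assumption here, since the book vertex definitions are not shown in this part of the QuickStart, so substitute the key your schema uses.

```groovy
// Start at Julia Child, move to the books she authored, and keep only
// books whose publishYear (assumed property key) is greater than 1967
g.V().has('name', 'Julia Child').out('authored').has('publishYear', gt(1967)).values('name')
```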
Studio displays the filtered list, and Gremlin console returns a similar listing.
When developing or testing, checking the number of vertices with each vertex
label can confirm that data was loaded. To find the number of vertices by vertex label, use the
traversal step label() followed by the traversal step
groupCount(). The step groupCount() is useful for
aggregating results from a previous step. Although this query can be run in real time, it is an
excellent example of a query that should be run in analytic (OLAP) mode. In Studio, under the
run arrow, select Execute using analytic engine (Spark) before running.
g.V().label().groupCount()
An alternative method for getting the group count with Spark SQL uses:
SELECT `~label` AS label, COUNT(*) AS label_count FROM DSE_GRAPH_QUICKSTART_vertices GROUP BY label;
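The Gremlin count by vertex label can also be written with a by() modulator. This is a minimal sketch that should give the same result as label().groupCount() on the toy graph; it relies on the label token that Gremlin console and Studio import by default.

```groovy
// Group all vertices by their label token and count each group
g.V().groupCount().by(label)
```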
QuickStart Writing and reading data
Writing and reading graph data.
Writing data from DSE Graph to a file is most easily accomplished with the
graph.io() command. The DSE
Graph Loader is the most appropriate tool for reading in data from files
or other sources.
Procedure
Write your data to an output file to save or exchange information. A Gryo file
is a binary format file that can reload data to DSE Graph. In this next command,
graph I/O writes the entire graph to a file. Other file formats can be written
by substituting gryo() with graphml() or
graphson().
graph.io(gryo()).writeGraph("/tmp/recipe.gryo")
Note: graph.io() is disabled in sandbox mode.
In Studio:
In Gremlin console:
==>null
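As a sketch of the substitution described above, the same command can write GraphML or GraphSON output. The file names below are hypothetical; any writable path can be used.

```groovy
// Write the entire graph as GraphML (XML); the path is illustrative
graph.io(graphml()).writeGraph("/tmp/recipe.xml")

// Write the entire graph as GraphSON (JSON); the path is illustrative
graph.io(graphson()).writeGraph("/tmp/recipe.json")
```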
To load a Gryo file, use the graphloader, after creating a
mapping script: