Run OLAP queries against the Northwind demo graph data.
The Northwind demo included with the DSE demos has a script for creating a graph of
the data for a fictional trading company.
In this task, you'll use the Gremlin console to create the Northwind graph, snapshot
part of the graph, and run a count operation on the subgraph using the
SparkGraphComputer.
Procedure
-
Start the Gremlin console using the dse gremlin-console command:
$ dse gremlin-console
-
Set the schema mode to Development.
To modify the schema of the connected graph database, you must set the mode to
Development in each session. The default schema mode for DSE Graph is Production,
which doesn't allow you to modify the graph's schema.
gremlin> schema.config().option('graph.schema_mode').set('Development')
-
Create the Northwind graph:
gremlin> system.graph('northwind').create()
-
Alias the traversal source g to the Northwind graph using the default OLTP
traversal source:
gremlin> :remote config alias g northwind.g
-
Load the Northwind demo script using the :load command:
gremlin> :load /usr/share/dse/demos/graph/northwind/northwind.groovy
-
Run the script:
gremlin> loadNorthwind(graph)
-
Look at the schema of the northwind graph:
gremlin> schema.describe()
-
Alias the traversal to the Northwind analytics OLAP traversal source, a. Alias g
to the OLAP traversal source for one-off analytic queries:
gremlin> :remote config alias g northwind.a
==>g=northwind.a
-
Count the number of vertices using the OLAP traversal source:
gremlin> g.V().count()
==>3209
When you alias g to the OLAP traversal source database_name.a (here, northwind.a),
DSE Analytics is the workload back end.
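As a hedged illustration of another one-off analytic query, the following sketch counts the vertices per label through the same OLAP alias. It uses only the standard TinkerPop steps groupCount() and by(label). The result depends on the loaded data, so no output is shown.
gremlin> g.V().groupCount().by(label)  // illustrative sketch; assumes g is still aliased to northwind.a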
-
Store subgraphs into snapshots using graph.snapshot().
When you need to run multiple OLAP queries on a graph in one session, use
snapshots of the graph as the traversal source.
gremlin> employees = graph.snapshot().vertices('employee').create()
==>graphtraversalsource[hadoopgraph[persistedinputrdd->persistedoutputrdd], sparkgraphcomputer]
gremlin> categories = graph.snapshot().vertices('category').create()
==>graphtraversalsource[hadoopgraph[persistedinputrdd->persistedoutputrdd], sparkgraphcomputer]
The snapshot() method returns an OLAP traversal source that uses the SparkGraphComputer.
-
Run an operation on the snapshot graphs.
Count the number of employee vertices in the snapshot graph:
gremlin> employees.V().count()
==>9
Count the number of category vertices in the snapshot graph:
gremlin> categories.V().count()
==>8
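When the analytic queries are finished, g can be pointed back at the default OLTP traversal source. The following sketch reuses the alias command from earlier in this procedure and assumes the same console session is still connected to the northwind graph.
gremlin> :remote config alias g northwind.g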