DSE Graph QuickStart
DSE Graph QuickStart using DataStax Studio or Gremlin console.
QuickStart Introduction
Graph databases are useful for discovering simple and complex relationships between objects. Relationships are fundamental to how objects interact with one another and their environment, and graph databases are designed to represent those relationships directly. A property graph is built from three kinds of elements:
- vertex: A vertex is an object, such as a person, location, automobile, recipe, or anything else you can think of as a noun.
- edge: An edge defines the relationship between two vertices. A person can create software, or an author can write a book. Typically an edge is equivalent to a verb.
- property: A property is a key-value pair that describes some attribute of either a vertex or an edge. A property key is the key in the key-value pair. All properties are global in DSE Graph, meaning that a property can be used on any vertex. For example, "name" can be used for all vertices in a graph.
Property graphs are typically quite large, although the nature of querying the graph varies depending on whether the graph has large numbers of vertices, edges, or both vertices and edges. To get started with graph database concepts, a toy graph is used for simplicity. The example used here explores the world of food.

Vertex labels and edge labels distinguish the types of vertices and edges in a graph database. A vertex labeled person holds information about an author, a reviewer, or someone who ate a meal. An edge between a person and a book is labeled authored. Specifying appropriate labels is an important step in graph data modeling.
Vertices and edges generally have properties. For instance, a person vertex can have properties name and gender. Edges can also have properties. A created edge can have a createDate property that identifies when the adjoining recipe vertex was created.
Information in a graph database is retrieved using graph traversals. A graph traversal walks the graph from a defined starting point, using a single traversal step or a series of traversal steps, and filters at each step until it returns a result.
To retrieve information using graph traversals, you must first insert data. The steps listed in this section allow you to gain a rudimentary understanding of DSE Graph with a minimum amount of configuration and schema creation.
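Once data is loaded, a traversal of the kind described above might look like the following minimal sketch. It assumes the person and book vertex labels, the authored edge label, and the name property from this QuickStart's food example; the specific vertex is illustrative and must already exist in the graph.

```groovy
// Start at the person vertex named 'Julia Child' (assumed to exist),
// walk the outgoing 'authored' edges to the adjacent book vertices,
// and return the name of each book.
g.V().has('person', 'name', 'Julia Child').
  out('authored').
  values('name')
```

Each step works on the elements produced by the previous step: has() filters the starting vertices, out() moves across edges, and values() extracts a property.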
QuickStart Installation
Install DataStax Enterprise and DataStax Studio.
Procedure
- Install DataStax Enterprise.
- Start DataStax Enterprise with DSE Graph enabled.
- Start either DataStax Studio or Gremlin console.
QuickStart Configuration
Configure DSE Graph to run QuickStart.
Procedure
- Create a Studio notebook and configure a graph for the QuickStart. If you are using Gremlin console, skip to the next step.
- Create a graph in Gremlin console and configure the graph for the QuickStart.
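For the Gremlin console path, the configuration could be sketched as follows. The graph name food is an assumption made for this sketch, and the two options shown are the settings typically relaxed for exploratory work; adjust them to your environment.

```groovy
// Create a graph; the name 'food' is an assumption for this sketch.
system.graph('food').create()

// Point the traversal source g at the new graph.
:remote config alias g food.g

// Development mode allows data to be entered without predefined schema.
schema.config().option('graph.schema_mode').set('Development')

// Allow full-graph scans, which exploratory queries such as g.V() rely on.
schema.config().option('graph.allow_scan').set('true')
```

Both options are convenient while learning; they are usually tightened again before a graph is used in Production.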
QuickStart Vertex and edge counting
Methods for counting vertices and edges in DSE Graph.
There are different methods for accomplishing vertex and edge counts in DSE Graph. Examples here will show how to use the Gremlin count() command either as a transactional or analytical query, and Spark SQL for analytical queries.
A transactional Gremlin query can be used to check the number of vertices that exist in the graph, and is useful for exploring small graphs. However, such a query scans the full graph, traversing every vertex, and should not be run on large graphs. If multiple DSE nodes are configured, this traversal step intensively walks all partitions on all nodes in the cluster that have graph data. This method is not appropriate for Production operations.
An analytical Gremlin query can be used to check the number of vertices that exist in any graph, large or small, and is much safer for Production operations. The queries are written like transactional Gremlin queries, but are executed with the analytic Spark engine.
Spark SQL provides another method for counting vertices and edges with analytical queries. If the AlwaysOnSQL service is turned on, Studio uses the JDBC interface to pass queries to DSE Analytics. Two tables, graphName_vertices and graphName_edges, are automatically generated in the Spark database dse_graph for each graph, where graphName is replaced with the name of the graph used for the Studio connection assigned to a Studio notebook. These tables can be queried with common Spark SQL commands directly in Studio, or can be explored with the dse spark-sql shell. To learn more about using Spark SQL to query, see the Using Spark SQL to query data documentation.
Procedure
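The Gremlin counting methods described above can be sketched as follows. The graph name food in the alias commands is an assumption; food.g refers to the transactional traversal source and food.a to the analytic one.

```groovy
// Transactional counts: each query scans the whole graph,
// so reserve these for small graphs.
:remote config alias g food.g
g.V().count()
g.E().count()

// Analytical counts: alias g to the analytic traversal source,
// then run the same traversals on the Spark engine.
:remote config alias g food.a
g.V().count()
g.E().count()
```

The traversals are identical in both cases; only the traversal source behind g changes.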
QuickStart Simple example
Simple DSE Graph example.
Let's start with a simple example from the recipe data model. The data is composed of two vertices, one person who is an author (Julia Child) and one book (The Art of French Cooking, Vol. 1) with an edge between them to identify that Julia Child authored that book. Although we could make this graph without schema, and DSE Graph would make a best guess about the data types, we'll supply schema before inserting the graph data.
Next, graph.addVertex is used to add data for a single vertex. Note the use of label to designate the vertex label. A g.addV statement could also be used, as shown in the alternate method.
In Studio, run the command and look at the results using the buttons that display the Raw JSON, Table, and Graph views.
Procedure
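A minimal sketch of the simple example follows. The id property keys (personId, bookId), the id values, the data types, and the second person added with g.addV are assumptions made for illustration; only the two vertices and the authored edge come from the description above.

```groovy
// Property keys used by the two vertex labels (data types are assumptions).
schema.propertyKey('personId').Int().single().create()
schema.propertyKey('bookId').Int().single().create()
schema.propertyKey('name').Text().single().create()
schema.propertyKey('gender').Text().single().create()
schema.propertyKey('publishYear').Int().single().create()

// Vertex labels with user-defined ids, then their properties.
schema.vertexLabel('person').partitionKey('personId').create()
schema.vertexLabel('person').properties('name', 'gender').add()
schema.vertexLabel('book').partitionKey('bookId').create()
schema.vertexLabel('book').properties('name', 'publishYear').add()

// Edge label connecting a person to a book.
schema.edgeLabel('authored').multiple().create()
schema.edgeLabel('authored').connection('person', 'book').add()

// Add the two vertices and the edge between them.
juliaChild = graph.addVertex(label, 'person', 'personId', 1,
    'name', 'Julia Child', 'gender', 'F')
artOfFrenchCookingVolOne = graph.addVertex(label, 'book', 'bookId', 1001,
    'name', 'The Art of French Cooking, Vol. 1', 'publishYear', 1961)
juliaChild.addEdge('authored', artOfFrenchCookingVolOne)

// Alternate method: add a vertex with g.addV (this person is illustrative).
g.addV('person').property('personId', 2).
  property('name', 'Simone Beck').property('gender', 'F')
```

The variables juliaChild and artOfFrenchCookingVolOne are only needed so that addEdge can refer to both endpoints within the same script.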
QuickStart Key features
Key features of DSE Graph.
A vertex label person specifies the type of vertex, personId provides a user-defined vertex id to manage cluster storage of the vertex, and the property keys name and gender identify the properties of a person. See Creating vertex labels for an explanation of the id components.
Procedure
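The features described above can be observed with two short traversals. This sketch assumes a person vertex with personId 1 has already been added.

```groovy
// List every person vertex; each result shows the vertex id,
// which is composed of the vertex label and the personId.
g.V().hasLabel('person')

// Show the property keys and values stored on one person.
g.V().has('person', 'personId', 1).valueMap()
```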
QuickStart Graph schema
Set graph schema.
Before adding more data to the graph, let's stop and talk about schema. Schema defines the possible properties and their data types for the graph. These properties are then used in the definitions of vertex labels and edge labels. The last critical step in schema creation is index creation. Indexes play an important role in making graph traversals efficient and fast. See creating schema and creating indexes for more information.
First, let's create schema for the property keys. In the next two cells, the first command clears the schema for the previously created vertices and edge. After the schema creation is completed, the next step is to enter data for those elements again in a longer script.
Procedure
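A minimal sketch of this kind of schema script follows. The property keys, data types, and the recipe and reviewed definitions are a small illustrative subset chosen to match the index examples in this QuickStart, not the complete schema for the recipe data model.

```groovy
// Clear the schema created for the earlier vertices and edge.
schema.clear()

// Property keys: name, data type, and cardinality.
schema.propertyKey('personId').Int().single().create()
schema.propertyKey('recipeId').Int().single().create()
schema.propertyKey('name').Text().single().create()
schema.propertyKey('gender').Text().single().create()
schema.propertyKey('instructions').Text().single().create()
schema.propertyKey('stars').Int().single().create()

// Vertex labels: the id is declared first, then the properties are added.
schema.vertexLabel('person').partitionKey('personId').create()
schema.vertexLabel('person').properties('name', 'gender').add()
schema.vertexLabel('recipe').partitionKey('recipeId').create()
schema.vertexLabel('recipe').properties('name', 'instructions').add()

// Edge labels: cardinality, edge properties, and the vertex labels they connect.
schema.edgeLabel('created').multiple().create()
schema.edgeLabel('created').connection('person', 'recipe').add()
schema.edgeLabel('reviewed').multiple().create()
schema.edgeLabel('reviewed').properties('stars').add()
schema.edgeLabel('reviewed').connection('person', 'recipe').add()
```

Property keys are created first because vertex labels and edge labels refer to them by name.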
QuickStart Indexing
Index graph schema.
Indexing is a complex and highly important subject. Here, several types of indexes are created. Briefly, secondary and materialized indexes are two types of indexes that use the DSE database built-in indexing. Search indexes use DSE Search, which is Solr-based. Only one search index per vertex label is allowed, but multiple properties can be included. Property indexes allow meta-properties to be indexed. Edge indexes allow properties on edges to be indexed. Note that indexes are added with add() to previously created vertex labels.
Procedure
// ********
// VERTEX INDEX
// ********
// SYNTAX:
// index('index_name').
// [secondary() | materialized() | search()].
// by('propertykey_name').
// [ asText() | asString() ].
// add()
// ********
schema.vertexLabel('person').index('byName').materialized().by('name').add()
schema.vertexLabel('meal_item').index('byName').materialized().by('name').add()
schema.vertexLabel('ingredient').index('byName').materialized().by('name').add()
//schema.vertexLabel('recipe').index('byCuisine').materialized().by('cuisine').add()
//schema.vertexLabel('book').index('byName').materialized().by('name').add()
schema.vertexLabel('meal').index('byType').secondary().by('type').add()
schema.vertexLabel('recipe').index('search').search().
by('instructions').by('name').by('cuisine').add()
schema.vertexLabel('book').index('search').search().
by('name').by('publishYear').add()
schema.vertexLabel('location').index('search').search().
by('geoPoint').withError(0.000009,0.0).add()
schema.vertexLabel('store').index('search').search().by('name').add()
schema.vertexLabel('home').index('search').search().by('name').add()
schema.vertexLabel('fridgeSensor').index('search').search().
by('cityId').by('sensorId').by('name').add()
// ********
// EDGE INDEX
// ********
// SYNTAX:
// index('index_name').
// [outE('edgeLabel') | inE('edgeLabel') | bothE('edgeLabel') ].
// by('propertykey_name').
// add()
// ********
schema.vertexLabel('recipe').index('byStars').inE('reviewed').
by('stars').ifNotExists().add()
schema.vertexLabel('person').index('ratedByStars').outE('reviewed').
by('stars').ifNotExists().add()
schema.vertexLabel('person').index('ratedByDate').outE('reviewed').
by('year').ifNotExists().add()
schema.vertexLabel('person').index('ratedByComments').outE('reviewed').
by('comment').ifNotExists().add()
schema.vertexLabel('recipe').index('byPersonOrRecipe').bothE('created').
by('createDate').ifNotExists().add()
// ********
// PROPERTY INDEX using meta-property 'livedIn'
// ********
// SYNTAX:
// index('index_name').
// property('propertykey_name').
// by('meta-propertykey_name').
// add()
// ********
schema.vertexLabel('person').index('byStartYear').
property('country').by('startYear').add()
schema.vertexLabel('person').index('byEndYear').
property('country').by('endYear').add()
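As a hedged illustration of how these indexes come into play, the traversals below filter on indexed properties. The vertex values (Julia Child, John Doe) are illustrative data, and whether a given index is actually chosen depends on the query planner.

```groovy
// Equality lookup on name; the materialized index byName
// on the person vertex label can serve this filter.
g.V().has('person', 'name', 'Julia Child')

// Filter a person's outgoing 'reviewed' edges by the stars property;
// the edge index ratedByStars covers this property.
g.V().has('person', 'name', 'John Doe').
  outE('reviewed').has('stars', gte(4))
```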
QuickStart Inspecting schema
Inspect graph schema.
The schema.describe() query displays schema that you can use to recreate the schema entered. If you enter data without creating schema, you can use this command to verify the data types set for each property.
Procedure
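A short sketch of schema inspection follows; the vertex label person is assumed to exist in the graph.

```groovy
// Display the schema for the whole graph as statements
// that could be used to recreate it.
schema.describe()

// Narrow the output to a single vertex label.
schema.vertexLabel('person').describe()
```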
QuickStart Modifying schema
Modify graph schema.
Schema can be modified after creation, using schema add() to add additional properties, vertex labels, edge labels, or indexes, as shown in the schema creation above. The drop() step can also be used to remove any element; see propertyKey, vertexLabel, and edgeLabel. The data type of a property, however, cannot be changed without removing and recreating the property. While entering data without schema creation is useful when developing and learning, it is strongly discouraged for actual applications. As a reminder, Production mode disallows schema creation once data is loaded.
Procedure
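Adding to existing schema could be sketched as follows. The property key nationality is a hypothetical name introduced only for this illustration.

```groovy
// Create a new property key (the name and data type are illustrative).
schema.propertyKey('nationality').Text().single().create()

// Attach the new property key to the existing person vertex label.
schema.vertexLabel('person').properties('nationality').add()

// Confirm the change.
schema.vertexLabel('person').describe()
```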
QuickStart Add data
Adding data to a graph.
Now that schema is created, add more vertices and edges using the following script. To explore more connections in the recipe data model, more vertices and edges are input into the graph. A script, generateRecipe.groovy, is entered and then executed by the remote Gremlin server. Note the first command, g.V().drop().iterate(); this command can be used to drop all vertex and edge data from the graph before reading in new data. In Studio, be sure to select the Graph view after running the script.
Procedure
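The general shape of such a script is sketched below. This is not the contents of generateRecipe.groovy; the ids, names, and property values are illustrative, and the sketch assumes schema for the person and recipe vertex labels and the created and reviewed edge labels already exists.

```groovy
// Drop all existing vertex and edge data before loading new data.
g.V().drop().iterate()

// Vertices (ids and property values are illustrative).
juliaChild = graph.addVertex(label, 'person', 'personId', 1,
    'name', 'Julia Child', 'gender', 'F')
johnDoe = graph.addVertex(label, 'person', 'personId', 2,
    'name', 'John Doe', 'gender', 'M')
beefBourguignon = graph.addVertex(label, 'recipe', 'recipeId', 2001,
    'name', 'Beef Bourguignon', 'instructions', 'Braise the beef in red wine.')

// Edges; the second one carries an edge property.
juliaChild.addEdge('created', beefBourguignon)
johnDoe.addEdge('reviewed', beefBourguignon, 'stars', 5)

// Return all vertices so that the Graph view has something to display.
g.V()
```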
QuickStart Exploring traversals
Explore graph data with query traversals.
Exploring the graph with graph traversals can lead to interesting conclusions. Here we'll explore a number of traversals, to show off the power of Gremlin in creating simple queries.
Procedure
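A few traversals of the kind explored here are sketched below. They assume the labels and property keys used elsewhere in this QuickStart (person, recipe, reviewed, stars, name); results depend on the data loaded. The first traversal scans the whole graph, so it is only suitable for a small graph with scans allowed.

```groovy
// How many vertices of each label are in the graph?
g.V().groupCount().by(label)

// Which edges leave the Julia Child vertex, and what are their labels?
g.V().has('person', 'name', 'Julia Child').outE().label().dedup()

// Who gave a recipe a review of at least 4 stars?
g.V().hasLabel('recipe').
  inE('reviewed').has('stars', gte(4)).
  outV().values('name').dedup()
```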
QuickStart Writing and reading data
Writing and reading graph data.
Writing data from DSE Graph to a file is most easily accomplished with the graph.io() command. The DSE Graph Loader is the most appropriate tool for reading in data from files or other sources.
Procedure
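Writing the graph to a file with graph.io() could be sketched as follows, using the TinkerPop IoCore builders. The file paths are hypothetical, and the files are written on the node where the command runs.

```groovy
// Write the graph in the binary Gryo format.
graph.io(IoCore.gryo()).writeGraph('/tmp/recipe.gryo')

// Write the graph as GraphSON (JSON).
graph.io(IoCore.graphson()).writeGraph('/tmp/recipe.json')
```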
QuickStart Listing graphs
How to list graphs.
Procedure
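A minimal sketch of listing graphs with the system commands follows; the graph name food is an assumption.

```groovy
// List the names of all graphs that exist in the cluster.
system.graphs()

// Check whether one particular graph exists.
system.graph('food').exists()
```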
Increase your knowledge
Further increase your knowledge of DSE Graph.
Further adventures in traversing can be found in Creating queries using traversals. If you want to explore various loading options, check out DSE Graph Loader.
DataStax also hosts a DSE Graph self-paced course on DataStax Academy; register for a free account to access the course.