DSE Graph QuickStart

DSE Graph QuickStart using DataStax Studio or Gremlin console.

QuickStart Introduction

Graph databases are useful for discovering simple and complex relationships between objects. Relationships are fundamental to how objects interact with one another and their environment, and graph databases are designed to represent those relationships directly.

Graph databases consist of three elements:
Vertex: A vertex is an object, such as a person, location, automobile, recipe, or anything else you can think of as a noun.
Edge: An edge defines the relationship between two vertices. A person can create software, or an author can write a book. Typically an edge is equivalent to a verb.
Property: A property is a key-value pair that describes some attribute of either a vertex or an edge. A property key is used to describe the key in the key-value pair. All properties are global in DSE Graph, meaning that a property can be used for any vertex. For example, "name" can be used for all vertices in a graph.
Vertices, edges, and properties can have properties; for this reason, DSE Graph is classified as a property graph. Properties are central to storing and querying information in a property graph.

Property graphs are typically quite large, although the nature of querying the graph varies depending on whether the graph has large numbers of vertices, edges, or both vertices and edges. To get started with graph database concepts, a toy graph is used for simplicity. The example used here explores the world of food.

Figure 1. Recipe Toy Graph

Elements are labeled to distinguish the type of vertices and edges in a graph database using vertex labels and edge labels. A vertex labeled person holds information about an author or reviewer or someone who ate a meal. An edge between a person and a book is labeled authored. Specifying appropriate labels is an important step in graph data modeling.

Vertices and edges generally have properties. For instance, a person vertex can have properties name and gender. Edges can also have properties. A created edge can have a createDate property that identifies when the adjoining recipe vertex was created.

Information in a graph database is retrieved using graph traversals. A graph traversal walks a graph from a defined starting point with a single traversal step or a series of traversal steps, filtering at each step until it returns a result.
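As a rough illustration of how traversal steps chain together, the following sketch starts from a person vertex, walks an outgoing edge, and returns a property value. It assumes that the person vertex, book vertex, and authored edge created later in this QuickStart already exist; it returns nothing on an empty graph.

```groovy
// Assumes the Julia Child person vertex, a book vertex, and an
// 'authored' edge between them have already been inserted.
g.V().has('person', 'name', 'Julia Child').   // starting point: filter by vertex label and property
  out('authored').                            // follow outgoing 'authored' edges to adjacent vertices
  values('name')                              // return the name property of each vertex reached
```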

To retrieve information using graph traversals, you must first insert data. The steps listed in this section allow you to gain a rudimentary understanding of DSE Graph with a minimum amount of configuration and schema creation.

QuickStart Installation

Install DataStax Enterprise and DataStax Studio.


  1. Install DataStax Enterprise.
  2. Start DataStax Enterprise with DSE Graph enabled.
  3. Start either DataStax Studio or Gremlin console:
    1. Install DataStax Studio and start Studio.
    2. Start the Gremlin Console.
      bin/dse gremlin-console
               (o o)
      -----oOOo-(3)-oOOo-----
      plugin activated: tinkerpop.tinkergraph
      plugin activated: tinkerpop.server
      plugin activated: tinkerpop.utilities
      ==>Connected - localhost/[4edf75f9-ed27-4add-a350-172abe37f701]
      ==>Set remote timeout to 2147483647ms
      ==>All scripts will now be sent to Gremlin Server - [localhost/]-[4edf75f9-ed27-4add-a350-172abe37f701] - type ':remote console' to return to local mode

      Gremlin console sends all commands typed at the prompt to the Gremlin Server for processing. DSE Graph runs a Gremlin Server (tinkerpop.server) on each DSE node, and Gremlin console connects to it automatically.

      The Gremlin console runs in remote mode automatically, processing commands on the Gremlin server. The Gremlin console by default opens a session to run commands on the remote server. The Gremlin console can be switched to run commands locally using:
      :remote console
      Once this command is run, commands are evaluated locally, and any command intended for the Gremlin server must be submitted remotely (for example, with the :> prefix). Using the command again switches the context back to the Gremlin server.

QuickStart Configuration

Configure DSE Graph to run QuickStart.


  1. Create a Studio notebook and configure a graph for the QuickStart. If you are using Gremlin console, skip to step 2.
    1. This tutorial exists as a Studio notebook, DSE Graph QuickStart, so that you do not have to create a notebook. However, in Studio, creating a notebook is simple. If running Studio on a DSE node, the default connection of localhost works; otherwise, create a connection for the desired DSE cluster. Each notebook is connected to a particular graph. Multiple notebooks can be connected to the same graph, or multiple notebooks can be created to connect to different graphs.

      A connection in Studio defines the graph and assigns a graph traversal g for that graph. A graph traversal is the mechanism for visiting each vertex in a graph, based on the filters defined in the graph traversal. To query DSE Graph, the graph traversal g must be assigned to a particular graph; Studio manages this assignment with connections.

      A blank notebook opens with a single cell. DSE Graph runs a Gremlin Server tinkerpop.server on each DataStax Enterprise node. Studio automatically connects to the Gremlin Server and, if the graph does not already exist, creates it using the connection information. The graph is stored as one graph instance per DSE database keyspace. Once a graph exists, a graph traversal source g is configured that allows graph traversals to be executed to query the graph. A graph traversal is bound to a specific traversal source, which by default is the standard OLTP traversal engine. The graph commands can add vertices and edges to the database, or get other graph information. The g commands can query or add vertices and edges.

    2. Set the schema mode to Development and allow full scans.
      CAUTION: Development is a more lenient mode that allows schema to be created automatically when adding data, and also allows full scans that can inspect the data with broad graph traversals. Full scans over large graphs will have high read latency, and are not appropriate for production applications. For production, the schema mode should be set to Production to require schema prior to inserting data and disallow full scans.
  2. Create a graph in Gremlin Console and configure the graph for the QuickStart.
    1. Create a graph to hold the data. The system commands are used to run commands that affect graphs in DSE Graph.

      Once a graph exists, a graph traversal g is configured that will allow graph traversals to be executed. Graph traversals are used to query the graph data and return results. A graph traversal is bound to a specific traversal source which is the standard OLTP traversal engine.

    2. Configure a graph traversal g to use the default graph traversal setting, which is test.g. This step will also create an implicit graph object.
      :remote config alias g test.g

      The graph commands allow graphs to be written to file, add vertices, properties, or edges to the database, and set or get other graph configuration. The g commands create queries to obtain results, and can also add vertices, properties, or edges to the database.

    3. Set the schema mode to Development and allow full scans.
      CAUTION: Development is a more lenient mode that allows schema to be created automatically when adding data, and also allows full scans that can inspect the data with broad graph traversals. Full scans over large graphs will have high read latency, and are not appropriate for production applications. For production, the schema mode should be set to Production to require schema prior to inserting data and disallow full scans.
    4. When creating a new graph, to check what graphs already exist, use:
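The Gremlin console steps above can be sketched roughly as follows. This is a sketch based on the DSE Graph system and schema configuration API; the graph name test matches the alias shown in step 2, and the option names should be checked against the documentation for your DSE version.

```groovy
// Step 1: create the graph, skipping creation if it already exists
system.graph('test').ifNotExists().create()

// Step 2 is the console command shown above:
//   :remote config alias g test.g

// Step 3: use the lenient Development schema mode and allow full scans
schema.config().option('graph.schema_mode').set('Development')
schema.config().option('graph.allow_scan').set('true')

// Step 4: list the graphs that already exist
system.graphs()
```

The schema configuration calls act on the aliased graph, so they only work after the alias in step 2 has been set.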

QuickStart Vertex and edge counting

Methods for counting vertices and edges in DSE Graph.

There are different methods for accomplishing vertex and edge counts in DSE Graph. Examples here will show how to use the Gremlin count() command either as a transactional or analytical query, and Spark SQL for analytical queries.

A transactional Gremlin query can be used to check the number of vertices that exist in the graph, and is useful for exploring small graphs. However, such a query scans the full graph, traversing every vertex, and should not be run on large graphs. If multiple DSE nodes are configured, this traversal step intensively walks all partitions on all nodes in the cluster that have graph data. This method is not appropriate for Production operations.

An analytical Gremlin query can be used to check the number of vertices that exist in any graph, large or small, and is much safer for Production operations. The queries are written like transactional Gremlin queries, but executed with the analytic Spark engine.

Spark SQL provides another method for counting vertices. If the AlwaysOn SQL service is turned on, Studio uses the JDBC interface to pass queries to DSE Analytics. Two tables, graphName_vertices and graphName_edges, are automatically generated in the Spark database dse_graph for each graph, where graphName is replaced with the graph used for the Studio connection assigned to a Studio notebook. These tables can be queried with common Spark SQL commands directly in Studio, or can be explored with the dse spark-sql shell. To learn more about using Spark SQL to query, see the Using Spark SQL to query data documentation.


Transactional Gremlin count()
  • Use the traversal step count(); the current count will be zero, because no data exists yet. A graph traversal g is chained with V() to retrieve all vertices and count() to compute the number of vertices. Chaining executes sequential traversal steps in the most efficient order.
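A minimal form of the vertex count described above, for either Studio or the Gremlin console:

```groovy
// Full scan: visits every vertex, so use only on small graphs
// with full scans allowed (Development mode).
g.V().count()
```

On the empty graph at this point in the QuickStart, the count is zero. The same traversal text is used for the analytical count; only the traversal source or the Studio Run option changes.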
Analytical Gremlin count()
  • To use Gremlin console, configure the traversal to run an analytical query:
    :remote config alias g test.a
    where test.a denotes that the graph will be used for analytic purposes.
  • To use Studio, configure the Run option to "Execute using analytic engine (Spark)" before running the query.
  • Use the traversal step count(); the current count will be zero, because no data exists yet. A graph traversal g is chained with V() to retrieve all vertices and count() to compute the number of vertices. Chaining executes sequential traversal steps in the most efficient order.
Spark SQL count
  • Enable AlwaysOn SQL or start a Spark SQL Thrift server instance.
    To use Spark SQL in Studio, enable AlwaysOn SQL service in the dse.yaml file by setting the enabled option to true and restart DSE:
    # AlwaysOn SQL options
    alwayson_sql_options:
        # If it's true, the node is enabled for AlwaysOn SQL. Only Analytics node
        # can be enabled as a AlwaysOn SQL node
        enabled: true
    In a Studio cell, select Spark SQL in the language menu in a cell and set the database to dse_graph.
    To use the Spark SQL shell, start the shell:
    dse spark-sql
    and navigate to the correct database:
    USE dse_graph;
  • Then, in either Studio or the Spark SQL shell, execute the Spark SQL query for finding the vertex count:
Edge counts
  • To do an edge count with Gremlin, replace V() with E():
  • To do an edge count with Spark SQL, replace the word vertices in the table name with edges:
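For the Gremlin variant of the edge count, the traversal is the vertex count with E() substituted for V():

```groovy
// Full scan over all edges; the same cautions as for g.V().count() apply
g.E().count()
```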

QuickStart Simple example

Simple DSE Graph example.

Let's start with a simple example from the recipe data model. The data is composed of two vertices, one person who is an author (Julia Child) and one book (The Art of French Cooking, Vol. 1) with an edge between them to identify that Julia Child authored that book. Although we could make this graph without schema, and DSE Graph would make a best guess about the data types, we'll supply schema before inserting the graph data.

Next, graph.addVertex is used to add data for a single vertex. Note the use of label to designate the vertex label. A g.addV statement could also be used, as shown in the alternate method.

Run the command and look at the results using the buttons to display the Raw JSON, Table, and Graph views.


  1. Schema is defined for properties personId, name, and gender. Properties should be created first, before vertex labels. A vertex label person identifies a partitionKey personId using a user-defined vertex id with a single partitionKey; personId is an integer for simplicity in this example. The schema to add the partitionKey and properties is executed with two statements, but could be executed as a single chained statement.

    The user-defined vertex id is used to partition the graph data amongst the cluster's nodes. User-defined vertex (UDV) ids are strongly recommended. Auto-generated vertex ids are also available but are deprecated in DSE 6.0, and warnings are logged when they are used.

    As you will see in the schema for a book vertex label, a property key can be reused for different types of information. While properties are “global” in the sense that they can be used with multiple vertex labels, it is important to understand that when specifying a property in a graph traversal, it is always used in conjunction with a vertex label.

  2. First, insert a vertex for Julia Child using a graph.addVertex() command. The vertex label is person, personId supplies the user-defined vertex id, and two property key-value pairs are created for name and gender. Note that label designates the key for the key-value pair that sets the vertex label.
    juliaChild = graph.addVertex(label,'person', 'personId', 1, 'name','Julia Child', 'gender','F')
    Note that there is an alternative method of inserting the vertex with a graph traversal g.addV:
    juliaChild = g.addV('person').property('personId', 1).property('name', 'Julia Child').property('gender', 'F').next()
    Performance tests show that the graph.addVertex() is faster, but the g.addV can be used in applications using DSE Drivers.
    The Studio result:
    Tip: In Studio, the result can be displayed using different views: Raw JSON, Table, or Graph. Explore the options.
    The Gremlin console result:
    ==>v[{~label=person, personId=1}]
  3. Create the schema for a vertex label book that has a user-defined vertex id with a single partitionKey bookId and includes the properties name, publishYear, and ISBN.
  4. Insert a book into the graph:
    artOfFrenchCookingVolOne = graph.addVertex(label, 'book', 'bookId', 1001, 'name', 'The Art of French Cooking, Vol. 1', 'publishYear', 1961)
    or optionally, the traversal query:
    artOfFrenchCookingVolOne = g.addV('book').property('bookId', 1001).property('name','The Art of French Cooking, Vol. 1').property('publishYear', 1961).next()
    The Studio result:

    As with the author vertex, you can see all the information about the book vertex created. In Graph view, use the Settings button (the gear) to change the display label for author by entering Chef {{name}}. Change the book display label with {{label}}:{{name}} or, alternatively, with {{{name}}}. To set graph display names more generally, look for “Configure Graph Display Names” under the three bars in the upper left-hand corner of Studio.

    The Gremlin console result:
    ==>v[{~label=book, bookId=1001}]
  5. Add schema for the edge between the two vertices:
    schema.edgeLabel('authored').connection('person', 'book').add()
  6. The first query uses a variable juliaChild to hold the person vertex information, while the second query uses the variable artOfFrenchCookingVolOne to hold the book vertex information. The third query uses a graph traversal g.V(firstVertex).addE(edgeLabel).to(secondVertex) to create the edge between the author and book vertices.
    juliaChild = g.V().has('person', 'personId', 1).next()
    artOfFrenchCookingVolOne = g.V().has('book', 'bookId', 1001).next()
    g.V(juliaChild).addE('authored').to(artOfFrenchCookingVolOne)
    or the graph alternative:
    juliaChild.addEdge('authored', artOfFrenchCookingVolOne)
    Use Graph view in Studio to see the relationship. Scroll over elements to display additional information.
    The Gremlin console result:
    ==>e[{~label=authored, ~out_vertex={~label=person, personId=1}, ~in_vertex={~label=book, bookId=1001}, 
    ~local_id=5deac140-0562-11e8-a4a1-4b3271ac7767}][{~label=person, personId=1}-authored->{~label=book, bookId=1001}]
  7. Ensure that the data inserted for the author is correct by checking with a has() step using the vertex label person and the property name = Julia Child. This graph traversal is a basic starting point for more complex traversals, because it narrows the search of the graph with specific information.
    g.V().has('person', 'name', 'Julia Child')
    In Studio, use the Table view to look at the results, as it is much more readable than the Raw JSON view.

    The vertex information is displayed for the person vertex for Julia Child. Note the id consists of the label and the user-defined vertex id personId.

    The Gremlin console result:
    ==>v[{~label=person, personId=1}]
  8. Another useful traversal is valueMap(), which prints the key-value listing of each property value for specified vertices.
    CAUTION: Using valueMap() without specifying properties can result in slow query latencies, if a large number of property keys exist for the queried vertex or edge. Specific properties can be specified, such as valueMap('name').
  9. Although Spark SQL is used more for analytical queries, simple queries similar to Gremlin can be made, such as querying information about vertices. A query can look for specific columns for a specific vertex label, in this case, a person with the name Julia Child. Notice the backticks around the column names ~label and name; they are needed to escape the tilde in ~label.
    SELECT personid,name,gender FROM DSE_GRAPH_QUICKSTART_vertices WHERE `~label` = 'person' AND `name` = 'Julia Child';
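The schema and valueMap() steps in the list above (steps 1, 3, and 8) can be sketched as follows. The data types are assumptions: the text states only that personId is an integer, so Text() for name, gender, and ISBN and Int() for bookId and publishYear are inferred from the inserted values.

```groovy
// Step 1: property keys first, then the person vertex label (two statements)
schema.propertyKey('personId').Int().single().create()
schema.propertyKey('name').Text().single().create()
schema.propertyKey('gender').Text().single().create()
schema.vertexLabel('person').partitionKey('personId').create()
schema.vertexLabel('person').properties('name', 'gender').add()

// Step 3: the book vertex label reuses the global 'name' property key
schema.propertyKey('bookId').Int().single().create()
schema.propertyKey('publishYear').Int().single().create()
schema.propertyKey('ISBN').Text().single().create()
schema.vertexLabel('book').partitionKey('bookId').create()
schema.vertexLabel('book').properties('name', 'publishYear', 'ISBN').add()

// Step 8: restrict valueMap() to the properties of interest
g.V().has('person', 'name', 'Julia Child').valueMap('name', 'gender')
```

The schema statements must run before the corresponding addVertex calls in steps 2 and 4.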

QuickStart Key features

Key features of DSE Graph.

A vertex label person specifies the type of vertex, personId provides a user-defined vertex id to manage cluster storage of the vertex, and the property keys name and gender display the properties for a person. Creating vertex labels explains the id components.


  1. A useful traversal is valueMap() which prints the key-value listing of each property value for specified vertices.
    CAUTION: Using valueMap() without specifying properties can result in slow query latencies, if a large number of property keys exist for the queried vertex or edge. Specific properties can be specified, such as valueMap('name').
  2. If only the value of a particular property key is desired, the values() traversal step can be used. To get the name of all vertices, use:
  3. Edge information may also be retrieved. The next command filters all edges to find those with an edge label authored.

    The Raw JSON view of the edge information displays details about the incoming and outgoing vertices as well as edge parameters id, label, and type.

    In Gremlin console:
    ==>e[{~label=authored, ~out_vertex={~label=person, personId=1}, ~in_vertex={~label=book, bookId=1001}, 
    ~local_id=5deac140-0562-11e8-a4a1-4b3271ac7767}][{~label=person, personId=1}-authored->{~label=book, bookId=1001}]
  4. Spark SQL can also be used to find information about edges. Notice that the Spark-generated tables display different information than the Gremlin graph query. The traversal step count() is useful for counting both the number of vertices and the number of edges. To count edges, use E() rather than V(). You should have one edge. The same cautions apply about real-time transactional uses in Production: a Spark SQL count or OLAP execution, both analytical actions, is a better choice.
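The Gremlin traversals referred to in steps 1 through 3 above might look like the following. Filtering on the person label in the first traversal is an illustrative choice, not something the text prescribes.

```groovy
// Step 1: key-value listing of all properties for each person vertex
g.V().hasLabel('person').valueMap()

// Step 2: only the value of the name property, for all vertices (full scan)
g.V().values('name')

// Step 3: all edges with the edge label 'authored'
g.E().hasLabel('authored')
```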

QuickStart Graph schema

Set graph schema.

Before adding more data to the graph, let's stop and talk about schema. Schema defines the possible properties and their data types for the graph. These properties are then used in the definitions of vertex labels and edge labels. The last critical step in schema creation is index creation. Indexes play an important role in making graph traversals efficient and fast. See creating schema and creating indexes for more information.

First, let's create schema for the property keys. In the next two cells, the first command clears the schema for the previously created vertices and edge. After the schema creation is completed, the next step is to enter data for those elements again in a longer script.

Note: DSE Graph has two schema modes, Production and Development. In Production mode, all schema must be identified before data is entered. In Development mode, schema can be created or modified after data is entered.


  1. Clear the schema:
  2. To keep the Spark SQL data synchronized with the graph, drop the Spark SQL tables. The tables will be automatically rebuilt, so that the data will align with the graph schema and data entered later.
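Step 1 above can be run as a single schema command. This is a sketch for the QuickStart graph only; run it on a graph whose contents you are prepared to re-enter, since the data is loaded again by a later script.

```groovy
// Remove the schema defined so far for the aliased graph
schema.clear()
```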
Property Key schema
  1. Create the new property key schema:
    // ********
    // ********
    // SYNTAX:
    // propertyKey('name').
    //    type().
    //    [ single() | multiple() ].
    //    [ ttl ].
    //    [ properties(metadata_property) ].
    //    [ ifNotExists() ].
    //    [ create() | add() | describe() | exists() ]
    // ********
    schema.propertyKey('since').Int().single().create() // meta-property
    schema.propertyKey('startYear').Int().multiple().create()   // meta-property
    schema.propertyKey('endYear').Int().multiple().create()   // meta-property

    Each property must be defined with a data type. DSE Graph data types are aligned with the DSE database data types. By default, properties have single cardinality, but can be defined with multiple cardinality. Multiple cardinality allows more than one value to be assigned to a property.

    In addition, properties can have their own properties, or meta-properties. Meta-properties can only be nested one deep, and are useful for keying information to an individual property. Notice that property keys can be created with an additional method ifNotExists(). This method prevents overwriting a definition that may already exist.
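A short sketch of the cardinality, meta-property, and ifNotExists() points described above. The property keys nickname and country and their Text() types are assumptions for illustration; startYear and endYear are the meta-property keys created in the statements above.

```groovy
// Single cardinality is the default; ifNotExists() skips creation if the key is already defined
schema.propertyKey('name').Text().single().ifNotExists().create()

// Multiple cardinality: a person can have more than one nickname
schema.propertyKey('nickname').Text().multiple().ifNotExists().create()

// Attach the meta-properties startYear and endYear to the country property
schema.propertyKey('country').Text().multiple().properties('startYear', 'endYear').ifNotExists().create()
```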

Vertex label schema
  1. After property keys are created, vertex labels can be defined.
    // ********
    // ********
    // SYNTAX:
    // schema.vertexLabel('vertexLabel').
    //    [ partitionKey(propertyKey, [ partitionKey(propertyKey) ]) ].
    //    [ clusteringKey(propertyKey) ].
    //    [ ttl ].
    //    [ properties(property, property) ].
    //    [ index ].
    //    [ partition() ].
    //    [ cache() ].
    //    [ ifNotExists() ].
    //    [ create() | add() | describe() | exists() ]
    // ********
    schema.vertexLabel('meal_item').properties('name','servAmt', 'macro', 'calories').add()
    schema.vertexLabel('location').properties('name', 'geoPoint').add()
    schema.vertexLabel('recipe').properties('name','cuisine', 'instructions','notes').add()
    schema.vertexLabel('meal').partitionKey('type', 'mealId').create()
    schema.vertexLabel('fridgeSensor').partitionKey('stateId', 'cityId').clusteringKey('sensorId').create()

    The schema for vertex labels defines the label type, and optionally defines the properties associated with the vertex label. There are two different methods for defining the association of the properties with vertex labels, either during creation, or by adding them after vertex label addition. The ifNotExists() method can be used for any schema creation.

    Vertex ids should be user-defined (UDV) ids, as auto-generated vertex ids are deprecated in DSE 6.0. UDV ids are explained in further detail in the documentation, but note that partition keys and clustering keys may be defined.

    DSE Graph limits the number of vertex labels to 200 per graph.
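The statements above associate properties with add() or define keys with create(). As a sketch of the other method, defining the partition key and properties in one chained statement at creation time could look like this, assuming the recipeId, name, cuisine, instructions, and notes property keys already exist:

```groovy
// Partition key and properties defined together at creation time
schema.vertexLabel('recipe').
  partitionKey('recipeId').
  properties('name', 'cuisine', 'instructions', 'notes').
  ifNotExists().
  create()
```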

Edge label schema
  1. After property keys are created, edge labels can be defined.
    // ********
    // ********
    // SYNTAX:
    // schema.edgeLabel('edgeLabel').
    //    [ single() | multiple() ].
    //    [ connection( outVertex, inVertex) ].
    //    [ ttl ].
    //    [ properties(property[, property]) ].
    //    [ ifNotExists() ].
    //    [ create() | add() | describe() | exists() ]
    // ********
    schema.edgeLabel('ate').connection('person', 'meal').add()
    schema.edgeLabel('created').connection('person', 'recipe').add()
    schema.edgeLabel('authored').connection('person', 'book').add()

    The schema for edge labels defines the label type, and defines the two vertex labels that are connected by the edge label with connection(). For example, the created edge label defines edges between adjacent vertices with the outgoing vertex label person and the incoming vertex label recipe. By default, edges have multiple cardinality, but can be defined with single cardinality. Multiple cardinality allows more than one edge with differing property values but the same edge label to be assigned.
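To illustrate cardinality and edge properties, the following sketch defines an edge label with an explicit cardinality and a property, then adds its connection. The edge label reviewed and the stars property key are assumed names for this example.

```groovy
// Hypothetical property key for the edge property
schema.propertyKey('stars').Int().single().ifNotExists().create()

// Multiple cardinality (the default) stated explicitly, with an edge property
schema.edgeLabel('reviewed').multiple().properties('stars').ifNotExists().create()

// Outgoing vertex label person, incoming vertex label recipe
schema.edgeLabel('reviewed').connection('person', 'recipe').add()
```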

QuickStart Indexing

Index graph schema.

Indexing is a complex and highly important subject. Here, several types of indexes are created. Briefly, secondary and materialized indexes are two types of indexes that use the DSE database built-in indexing. Search indexes use DSE Search which is Solr-based. Only one search index per vertex label is allowed, but multiple properties can be included. Property indexes allow meta-properties to be indexed. Edge indexes allow properties on edges to be indexed. Note that indexes are added with add() to previously created vertex labels.


Create the index schema:
// ********
// ********
// index('index_name').
//    [secondary() | materialized() | search()].
//    by('propertykey_name').
//    [ asText() | asString() ].
//    add()
// ********




// ********
// ********
// index('index_name').
//    [outE('edgeLabel') | inE('edgeLabel') ].
//    by('propertykey_name').
//    add()
// ********


// ********
// PROPERTY INDEX using meta-property 'livedIn'
// ********
// index('index_name').
//    property('propertykey_name').
//    by('meta-propertykey_name').
//    add()
// ********
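Concrete statements following the three syntax patterns above might look like the sketch below. The index names, the reviewed edge label, and the stars property key are assumptions for illustration; the vertex labels and property keys must already exist, and country is assumed to carry the meta-property livedIn.

```groovy
// Vertex indexes: materialized, secondary, and search
schema.vertexLabel('person').index('byName').materialized().by('name').add()
schema.vertexLabel('recipe').index('byCuisine').secondary().by('cuisine').add()
schema.vertexLabel('recipe').index('search').search().by('instructions').asText().add()

// Edge index: index the stars property on outgoing reviewed edges of person
schema.vertexLabel('person').index('ratedByStars').outE('reviewed').by('stars').add()

// Property index: index the meta-property livedIn of the country property
schema.vertexLabel('person').index('byLocation').property('country').by('livedIn').add()
```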


QuickStart Inspecting schema

Inspect graph schema.

The schema.describe() query displays schema you can use to recreate the schema entered. If you enter data without creating schema, you can use this command to verify the data types set for each property.


  1. Examine the schema:
    In Studio, a portion of the output:

    While entering data without schema creation is handy while developing and learning, it is strongly discouraged for actual applications. As a reminder, Production mode disallows schema creation once data is loaded.

  2. Some Groovy steps are useful in the Gremlin query to find specific schema descriptions. For instance, to find only the schema for vertex labels and their indexes, use the following command:
    In Studio:
    In Gremlin console:
    ==>schema.vertexLabel("recipe").partitionKey("recipeId").properties("name", "cuisine", "instructions", "notes").create()
    ==>schema.vertexLabel("store").partitionKey("storeId").properties("name", "address").create()
    ==>schema.vertexLabel("meal_item").partitionKey("itemId").properties("name", "servAmt", "macro", "calories").create()
    ==>schema.vertexLabel("fridgeSensor").partitionKey("stateId", "cityId").clusteringKey("sensorId").properties("name").create()
    ==>schema.vertexLabel("home").partitionKey("homeId").properties("name", "address").create()
    ==>schema.vertexLabel("person").partitionKey("personId").properties("name", "nickname", "gender", "calGoal", "macroGoal", "country").create()
    ==>schema.vertexLabel("book").partitionKey("bookId").properties("name", "publishYear", "ISBN", "bookDiscount").create()
    ==>schema.vertexLabel("location").partitionKey("locId").properties("name", "geoPoint").create()
    ==>schema.vertexLabel("location").index("search").search().by("geoPoint").withError(9.0E-6, 0.0).add()
    ==>schema.vertexLabel("meal").partitionKey("type", "mealId").create()

    Additional steps can split the output per newline and grep for a string as shown for index. The Gremlin variant used here is based on Apache Groovy, so any Groovy commands can be used to manipulate graph traversals. Apache Groovy is a language that smoothly integrates with Java to provide scripting capabilities.
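The two inspection steps above can be sketched as follows, assuming that schema.describe() returns its output as a single newline-separated string:

```groovy
// Step 1: the full schema
schema.describe()

// Step 2: split per newline and keep only lines that mention vertex labels
// (this also keeps index definitions, which hang off a vertex label)
schema.describe().split('\n').grep(~/.*vertexLabel.*/)

// Variation: keep only the index definitions
schema.describe().split('\n').grep(~/.*index.*/)
```

Groovy's grep() with a regular expression matches against the whole line, which is why each pattern begins and ends with .* here.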

QuickStart Modifying schema

Modify graph schema.

Schema can be modified after creation, using schema add() to add additional properties, vertex labels, edge labels, or indexes, as shown in the schema creation above. The drop() step can also be used to remove any element; see propertyKey, vertexLabel, and edgeLabel. The data type of a property, however, cannot be changed without removing and recreating the property. While entering data without schema creation is useful when developing and learning, it is strongly discouraged for actual applications. As a reminder, Production mode disallows schema creation once data is loaded.


  1. Create a property to drop in the next step:
    In Studio:
  2. Drop the property:
    In Studio:
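The create and drop steps above could be sketched as follows. The property key name tempProp is hypothetical and chosen only so that nothing used elsewhere in the QuickStart is removed.

```groovy
// Step 1: create a throwaway property key
schema.propertyKey('tempProp').Text().single().create()

// Step 2: drop it again
schema.propertyKey('tempProp').drop()
```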

QuickStart Add data

Adding data to a graph.

Now that schema is created, add more vertices and edges using the following script. To explore more connections in the recipe data model, more vertices and edges are input into the graph. A script, generateRecipe.groovy, is entered and then executed by the remote Gremlin server. Note the first command, g.V().drop().iterate(); this command can be used to drop all vertex and edge data from the graph before reading in new data. In Studio, be sure to select the Graph view after running the script.


Adding more data
  1. Run generateRecipe.groovy in either Studio or the Gremlin console:
    If running in Gremlin console, use the following command to load:
    :load /tmp/generateRecipe.groovy
    replacing "/tmp" with the directory where you write the script. In Studio, run the script within a cell.
    // Generates all Recipe Toy Graph vertices and edges except Reviews
    // Add all vertices and edges for Recipe
    // author vertices
    juliaChild = graph.addVertex(label, 'person', 'personId', 1, 'name','Julia Child', 'gender', 'F')
    simoneBeck = graph.addVertex(label, 'person', 'personId', 2, 'name', 'Simone Beck', 'gender', 'F')
    louisetteBertholie = graph.addVertex(label, 'person', 'personId', 3, 'name', 'Louisette Bertholie', 'gender', 'F')
    patriciaSimon = graph.addVertex(label, 'person', 'personId', 4, 'name', 'Patricia Simon', 'gender', 'F')
    aliceWaters = graph.addVertex(label, 'person', 'personId', 5, 'name', 'Alice Waters', 'gender', 'F')
    patriciaCurtan = graph.addVertex(label, 'person', 'personId', 6, 'name', 'Patricia Curtan', 'gender', 'F')
    kelsieKerr = graph.addVertex(label, 'person', 'personId', 7, 'name', 'Kelsie Kerr', 'gender', 'F')
    fritzStreiff = graph.addVertex(label, 'person', 'personId', 8, 'name', 'Fritz Streiff', 'gender', 'M')
    emerilLagasse = graph.addVertex(label, 'person', 'personId', 9, 'name', 'Emeril Lagasse', 'gender', 'M')
    jamesBeard = graph.addVertex(label, 'person', 'personId', 10, 'name', 'James Beard', 'gender', 'M')
    // book vertices
    artOfFrenchCookingVolOne = graph.addVertex(label, 'book', 'bookId', 1001, 'name', 'The Art of French Cooking, Vol. 1', 'publishYear', 1961)
    simcasCuisine = graph.addVertex(label, 'book', 'bookId', 1002, 'name', "Simca's Cuisine: 100 Classic French Recipes for Every Occasion", 'publishYear', 1972, 'ISBN', '0-394-40152-2')
    frenchChefCookbook = graph.addVertex(label, 'book', 'bookId', 1003, 'name','The French Chef Cookbook', 'publishYear', 1968, 'ISBN', '0-394-40135-2')
    artOfSimpleFood = graph.addVertex(label, 'book', 'bookId', 1004, 'name', 'The Art of Simple Food: Notes, Lessons, and Recipes from a Delicious Revolution', 'publishYear', 2007, 'ISBN', '0-307-33679-4')
    // recipe vertices
    beefBourguignon = graph.addVertex(label, 'recipe', 'recipeId', 2001, 'name', 'Beef Bourguignon', 'instructions', 'Braise the beef. Saute the onions and carrots. Add wine and cook in a dutch oven at 425 degrees for 1 hour.', 'notes', 'Takes a long time to make.')
    ratatouille = graph.addVertex(label, 'recipe', 'recipeId', 2002, 'name', 'Ratatouille', 'instructions', 'Peel and cut the eggplant. Make sure you cut eggplant into lengthwise slices that are about 1-inch wide, 3-inches long, and 3/8-inch thick', 'notes', "I've made this 13 times.")
    saladeNicoise = graph.addVertex(label, 'recipe', 'recipeId', 2003, 'name', 'Salade Nicoise', 'instructions', 'Take a salad bowl or platter and line it with lettuce leaves, shortly before serving. Drizzle some olive oil on the leaves and dust them with salt.', 'notes', '')
    wildMushroomStroganoff = graph.addVertex(label, 'recipe', 'recipeId', 2004, 'name', 'Wild Mushroom Stroganoff', 'instructions', 'Cook the egg noodles according to the package directions and keep warm. Heat 1 1/2 tablespoons of the olive oil in a large saute pan over medium-high heat.', 'notes', 'Good for Jan and Bill.')
    spicyMeatloaf = graph.addVertex(label, 'recipe', 'recipeId', 2005, 'name', 'Spicy Meatloaf', 'instructions', 'Preheat the oven to 375 degrees F. Cook bacon in a large skillet over medium heat until very crisp and fat has rendered, 8-10 minutes.', 'notes', ' ')
    oystersRockefeller = graph.addVertex(label, 'recipe', 'recipeId', 2006, 'name', 'Oysters Rockefeller', 'instructions', 'Saute the shallots, celery, herbs, and seasonings in 3 tablespoons of the butter for 3 minutes. Add the watercress and let it wilt.', 'notes', ' ')
    carrotSoup = graph.addVertex(label, 'recipe', 'recipeId', 2007, 'name', 'Carrot Soup', 'instructions', 'In a heavy-bottomed pot, melt the butter. When it starts to foam, add the onions and thyme and cook over medium-low heat until tender, about 10 minutes.', 'notes', 'Quick and easy.')
    roastPorkLoin = graph.addVertex(label, 'recipe', 'recipeId', 2008, 'name', 'Roast Pork Loin', 'instructions', 'The day before, separate the meat from the ribs, stopping about 1 inch before the end of the bones. Season the pork liberally inside and out with salt and pepper and refrigerate overnight.', 'notes', 'Love this one!')
    // ingredients vertices
    beef = graph.addVertex(label, 'ingredient', 'ingredId', 3001, 'name', 'beef')
    onion = graph.addVertex(label, 'ingredient', 'ingredId', 3002, 'name', 'onion')
    mashedGarlic = graph.addVertex(label, 'ingredient', 'ingredId', 3003, 'name', 'mashed garlic')
    butter = graph.addVertex(label, 'ingredient', 'ingredId', 3004, 'name', 'butter')
    tomatoPaste = graph.addVertex(label, 'ingredient', 'ingredId', 3005, 'name', 'tomato paste')
    eggplant = graph.addVertex(label, 'ingredient', 'ingredId', 3006, 'name', 'eggplant')
    zucchini = graph.addVertex(label, 'ingredient', 'ingredId', 3007, 'name', 'zucchini')
    oliveOil = graph.addVertex(label, 'ingredient', 'ingredId', 3008, 'name', 'olive oil')
    yellowOnion = graph.addVertex(label, 'ingredient', 'ingredId', 3009, 'name', 'yellow onion')
    greenBean = graph.addVertex(label, 'ingredient', 'ingredId', 3010, 'name', 'green beans')
    tuna = graph.addVertex(label, 'ingredient', 'ingredId', 3011, 'name', 'tuna')
    tomato = graph.addVertex(label, 'ingredient', 'ingredId', 3012, 'name', 'tomato')
    hardBoiledEgg = graph.addVertex(label, 'ingredient', 'ingredId', 3013, 'name', 'hard-boiled egg')
    eggNoodles = graph.addVertex(label, 'ingredient', 'ingredId', 3014, 'name', 'egg noodles')
    mushroom = graph.addVertex(label, 'ingredient', 'ingredId', 3015, 'name', 'mushrooms')
    bacon = graph.addVertex(label, 'ingredient', 'ingredId', 3016, 'name', 'bacon')
    celery = graph.addVertex(label, 'ingredient', 'ingredId', 3017, 'name', 'celery')
    greenBellPepper = graph.addVertex(label, 'ingredient', 'ingredId', 3018, 'name', 'green bell pepper')
    groundBeef = graph.addVertex(label, 'ingredient', 'ingredId', 3019, 'name', 'ground beef')
    porkSausage = graph.addVertex(label, 'ingredient', 'ingredId', 3020, 'name', 'pork sausage')
    shallot = graph.addVertex(label, 'ingredient', 'ingredId', 3021, 'name', 'shallots')
    chervil = graph.addVertex(label, 'ingredient', 'ingredId', 3022, 'name', 'chervil')
    fennel = graph.addVertex(label, 'ingredient', 'ingredId', 3023, 'name', 'fennel')
    parsley = graph.addVertex(label, 'ingredient', 'ingredId', 3024, 'name', 'parsley')
    oyster = graph.addVertex(label, 'ingredient', 'ingredId', 3025, 'name', 'oyster')
    pernod = graph.addVertex(label, 'ingredient', 'ingredId', 3026, 'name', 'Pernod')
    thyme = graph.addVertex(label, 'ingredient', 'ingredId', 3027, 'name', 'thyme')
    carrot = graph.addVertex(label, 'ingredient', 'ingredId', 3028, 'name', 'carrots')
    chickenBroth = graph.addVertex(label, 'ingredient', 'ingredId', 3029, 'name', 'chicken broth')
    porkLoin = graph.addVertex(label, 'ingredient', 'ingredId', 3030, 'name', 'pork loin')
    redWine = graph.addVertex(label, 'ingredient', 'ingredId', 3031, 'name', 'red wine')
    // meal vertices
    meal1 = graph.addVertex(label, 'meal', 'mealId', 4001, 'type', 'lunch')
    meal2 = graph.addVertex(label, 'meal', 'mealId', 4002, 'type', 'lunch')
    meal3 = graph.addVertex(label, 'meal', 'mealId', 4003, 'type', 'lunch')
    meal4 = graph.addVertex(label, 'meal', 'mealId', 4004, 'type', 'lunch')
    meal5 = graph.addVertex(label, 'meal', 'mealId', 4005, 'type', 'breakfast')
    meal6 = graph.addVertex(label, 'meal', 'mealId', 4006, 'type', 'snack')
    meal7 = graph.addVertex(label, 'meal', 'mealId', 4007, 'type', 'dinner')
    meal8 = graph.addVertex(label, 'meal', 'mealId', 4008, 'type', 'dinner')
    // author-book edges
    juliaChild.addEdge('authored', artOfFrenchCookingVolOne)
    simoneBeck.addEdge('authored', artOfFrenchCookingVolOne)
    louisetteBertholie.addEdge('authored', artOfFrenchCookingVolOne)
    simoneBeck.addEdge('authored', simcasCuisine)
    patriciaSimon.addEdge('authored', simcasCuisine)
    juliaChild.addEdge('authored', frenchChefCookbook)
    aliceWaters.addEdge('authored', artOfSimpleFood)
    patriciaCurtan.addEdge('authored', artOfSimpleFood)
    kelsieKerr.addEdge('authored', artOfSimpleFood)
    fritzStreiff.addEdge('authored', artOfSimpleFood)
    // author - recipe edges
    juliaChild.addEdge('created', beefBourguignon, 'createDate', '1961-01-01')
    juliaChild.addEdge('created', ratatouille, 'createDate', '1965-02-02')
    juliaChild.addEdge('created', saladeNicoise, 'createDate', '1962-03-03')
    emerilLagasse.addEdge('created', wildMushroomStroganoff, 'createDate', '2003-04-04')
    emerilLagasse.addEdge('created', spicyMeatloaf, 'createDate', '2000-05-05')
    aliceWaters.addEdge('created', carrotSoup, 'createDate', '1995-06-06')
    aliceWaters.addEdge('created', roastPorkLoin, 'createDate', '1996-07-07')
    jamesBeard.addEdge('created', oystersRockefeller, 'createDate', '1970-01-01')
    // recipe - ingredient edges
    beefBourguignon.addEdge('includedIn', beef, 'amount', '2 lbs')
    beefBourguignon.addEdge('includedIn', onion, 'amount', '1 sliced')
    beefBourguignon.addEdge('includedIn', mashedGarlic, 'amount', '2 cloves')
    beefBourguignon.addEdge('includedIn', butter, 'amount', '3.5 Tbsp')
    beefBourguignon.addEdge('includedIn', tomatoPaste, 'amount', '1 Tbsp')
    ratatouille.addEdge('includedIn', eggplant, 'amount', '1 lb')
    ratatouille.addEdge('includedIn', zucchini, 'amount', '1 lb')
    ratatouille.addEdge('includedIn', mashedGarlic, 'amount', '2 cloves')
    ratatouille.addEdge('includedIn', oliveOil, 'amount', '4-6 Tbsp')
    ratatouille.addEdge('includedIn', yellowOnion, 'amount', '1 1/2 cups or 1/2 lb thinly sliced')
    saladeNicoise.addEdge('includedIn', oliveOil, 'amount', '2-3 Tbsp')
    saladeNicoise.addEdge('includedIn', greenBean, 'amount', '1 1/2 lbs blanched, trimmed')
    saladeNicoise.addEdge('includedIn', tuna, 'amount', '8-10 ozs oil-packed, drained and flaked')
    saladeNicoise.addEdge('includedIn', tomato, 'amount', '3 or 4 red, peeled, quartered, cored, and seasoned')
    saladeNicoise.addEdge('includedIn', hardBoiledEgg, 'amount', '8 halved lengthwise')
    wildMushroomStroganoff.addEdge('includedIn', eggNoodles, 'amount', '16 ozs wide')
    wildMushroomStroganoff.addEdge('includedIn', mushroom, 'amount', '2 lbs wild or exotic, cleaned, stemmed, and sliced')
    wildMushroomStroganoff.addEdge('includedIn', yellowOnion, 'amount', '1 cup thinly sliced')
    spicyMeatloaf.addEdge('includedIn', bacon, 'amount', '3 ozs diced')
    spicyMeatloaf.addEdge('includedIn', onion, 'amount', '2 cups finely chopped')
    spicyMeatloaf.addEdge('includedIn', celery, 'amount', '2 cups finely chopped')
    spicyMeatloaf.addEdge('includedIn', greenBellPepper, 'amount', '1/4 cup finely chopped')
    spicyMeatloaf.addEdge('includedIn', porkSausage, 'amount', '3/4 lbs hot')
    spicyMeatloaf.addEdge('includedIn', groundBeef, 'amount', '1 1/2 lbs chuck')
    oystersRockefeller.addEdge('includedIn', shallot, 'amount', '1/4 cup chopped')
    oystersRockefeller.addEdge('includedIn', celery, 'amount', '1/4 cup chopped')
    oystersRockefeller.addEdge('includedIn', chervil, 'amount', '1 tsp')
    oystersRockefeller.addEdge('includedIn', fennel, 'amount', '1/3 cup chopped')
    oystersRockefeller.addEdge('includedIn', parsley, 'amount', '1/3 cup chopped')
    oystersRockefeller.addEdge('includedIn', oyster, 'amount', '2 dozen on the half shell')
    oystersRockefeller.addEdge('includedIn', pernod, 'amount', '1/3 cup')
    carrotSoup.addEdge('includedIn', butter, 'amount', '4 Tbsp')
    carrotSoup.addEdge('includedIn', onion, 'amount', '2 medium sliced')
    carrotSoup.addEdge('includedIn', thyme, 'amount', '1 sprig')
    carrotSoup.addEdge('includedIn', carrot, 'amount', '2 1/2 lbs, peeled and sliced')
    carrotSoup.addEdge('includedIn', chickenBroth, 'amount', '6 cups')
    roastPorkLoin.addEdge('includedIn', porkLoin, 'amount', '1 bone-in, 4-rib')
    roastPorkLoin.addEdge('includedIn', redWine, 'amount', '1/2 cup')
    roastPorkLoin.addEdge('includedIn', chickenBroth, 'amount', '1 cup')
    // book - recipe edges
    beefBourguignon.addEdge('includedIn', artOfFrenchCookingVolOne)
    saladeNicoise.addEdge('includedIn', artOfFrenchCookingVolOne)
    carrotSoup.addEdge('includedIn', artOfSimpleFood)
    // meal - recipe edges
    beefBourguignon.addEdge('includedIn', meal1)
    saladeNicoise.addEdge('includedIn', meal1)
    carrotSoup.addEdge('includedIn', meal4)
    roastPorkLoin.addEdge('includedIn', meal4)
    // meal - book edges
    meal7.addEdge('includedIn', artOfFrenchCookingVolOne)
    meal8.addEdge('includedIn', artOfSimpleFood)
    meal5.addEdge('includedIn', frenchChefCookbook)
    // display all the vertices created
    g.V()
    In Studio:
    Figure 2. Data for the Recipe Toy Graph
    The g.V() command at the end of the script displays all the vertices created.
    In Gremlin console:
    // A series of returns for vertices and edges marks the successful completion of the script
    // Sample vertex
    ==>v[{~label=meal, type="dinner", mealId=4008}]
    // Sample edge
    ==>e[{~label=includedIn, ~out_vertex={~label=meal, type="dinner", mealId=4008}, 
       ~in_vertex={~label=book, bookId=1004}, 
       [{~label=meal, type="dinner", mealId=4008}-includedIn->{~label=book, bookId=1004}]
  2. Run the vertex count again. Whether it runs as a transactional query or an analytical query, the count is now higher, at 61 vertices.
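    A minimal sketch of the count, assuming the traversal source g is bound to the QuickStart graph and that full graph scans are permitted (for example, in development mode):

    ```groovy
    // Count every vertex in the graph; the step above reports 61 after the load script runs
    g.V().count()
    ```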

    The DSE Graph Loader is the recommended method for scripting data loading. Using graph.addVertex() or g.addV() is practical only for small toy graphs like the recipe example.

  3. Similarly, run the edge count to see the higher edge count of 67.
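    The edge count can be sketched the same way, again assuming g is bound to the QuickStart graph and full scans are permitted:

    ```groovy
    // Count every edge in the graph; the step above reports 67 after the load script runs
    g.E().count()
    ```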

QuickStart Exploring traversals

Explore graph data with query traversals.

Exploring the graph with graph traversals can lead to interesting conclusions. The traversals below show how Gremlin expresses simple queries.


  1. All queries can be profiled to see what the query path is and how the query performs.
    g.V().has('person', 'name', 'Julia Child').profile()
    In Studio:
    Clicking on the bars in the graph in Studio will show more detail about underlying processes in the database.
    In Gremlin console:
    ==>Traversal Metrics
    Step                                                               Count  Traversers       Time (ms)    % Dur
    DsegGraphStep(vertex,[],(label = person & name ...                     1           1          10.097    65.69
      query-optimizer                                                                              1.848
        \_condition=((label = person & name = Julia Child) & (true))
      query-setup                                                                                  0.065
      index-query                                                                                  1.645
        \_statement=SELECT "personId" FROM "DSE_GRAPH_QUICKSTART"."person_p_byName" WHERE "name" = ? LIMIT ?; wit
                    h params (java.lang.String) Julia Child, (java.lang.Integer) 50000
        \_options=Options{consistency=Optional[ONE], serialConsistency=Optional.empty, fallbackConsistency=Option
                  al.empty, pagingState=null, pageSize=-1, user=Optional.empty, waitForSchemaAgreement=true, asyn
    DsegPropertyLoadStep                                                   1           1           5.274    34.31
                                                >TOTAL                     -           -          15.372        -

    To investigate what happens in any of the following queries, and why some queries are more efficient than others, append .profile() to the query. The result shows information similar to the output above.

  2. With several person vertices in the graph, a specific name must be given to find a particular vertex. This traversal gets the stored vertex information for the vertex that has the name of Julia Child. Note that the constraint that the vertex is a person is also included in the has() step. Graph queries have lower latency when the query is more specific, and the has() step is an important tool for narrowing the search.
    g.V().has('person', 'name', 'Julia Child')

    Running the query in Studio will display the vertex id, label and all property values. In Gremlin console, this query will only display the vertex id, and the valueMap() step must be appended to get the property values.
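    As a sketch of the console variant described above, appending valueMap() returns the property keys and values rather than only the vertex id:

    ```groovy
    // Same lookup as above, with valueMap() appended to show the stored properties
    g.V().has('person', 'name', 'Julia Child').valueMap()
    ```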

  3. In this next traversal, has() filters vertex properties by name = Julia Child as seen above. The traversal step outE() discovers the outgoing edges from that vertex with the authored label.
    g.V().has('name','Julia Child').outE('authored')
    In Studio, view the result either as the edge information listed in the Raw JSON view or as the graph visualization in the Graph view, where scrolling over a vertex provides additional information.
    In Gremlin console:
    ==>e[{~label=authored, ~out_vertex={~label=person, personId=1}, 
       ~in_vertex={~label=book, bookId=1001}, 
       ~local_id=5deac140-0562-11e8-a4a1-4b3271ac7767}][{~label=person, personId=1}-authored->{~label=book, bookId=1001}]
    ==>e[{~label=authored, ~out_vertex={~label=person, personId=1}, 
       ~in_vertex={~label=book, bookId=1003}, 
       ~local_id=5deac145-0562-11e8-a4a1-4b3271ac7767}][{~label=person, personId=1}-authored->{~label=book, bookId=1003}]
  4. Spark SQL can also be used to discover information for a set of vertices or edges that match particular conditions. Here, all the edges with a createdate later than May 1, 1975 are returned. Note that column names in Spark SQL are not camel case.
    SELECT * FROM DSE_GRAPH_QUICKSTART_edges WHERE createdate > '1975-05-01';
    In Studio:
    The data presented in Spark SQL differs from the data stored in the database tables for graph. In Spark SQL tables, the source and destination vertices are listed for an edge, along with the edge label and properties.
  5. If instead, you want to query for the books that all people have written, the query must be modified. The previous example retrieved edges, but not the adjacent book vertices. Add a traversal step inV() to find all the vertices that connect to the outgoing edges, then print the book titles of those vertices. Notice how the chained traversal steps go from the vertices along outgoing edges to the adjacent vertices with V().outE().inV(). The outgoing edges are given a particular filter value, authored.
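    Based on the steps named above, the traversal might look like the following sketch; the exact query is an assumption reconstructed from the description:

    ```groovy
    // From every vertex, follow outgoing 'authored' edges to the adjacent
    // book vertices, then print each book's name
    g.V().outE('authored').inV().values('name')
    ```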

    The results display in Studio, with a similar listing in Gremlin console.

  6. Notice that the book titles are duplicated in the resulting list, because a listing is returned for each author. If a book has three authors, three listings are returned. The traversal step dedup() can eliminate the duplication.
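    A sketch of the deduplicated form, assuming the same traversal shape as the previous step with dedup() appended:

    ```groovy
    // dedup() removes repeated titles from the stream of book names
    g.V().outE('authored').inV().values('name').dedup()
    ```

    In the load script above, ten authored edges point at four books, so the ten titles should reduce to four.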

    The results display in Studio, with a similar listing in Gremlin console.

  7. Refine the traversal by reinserting the has() step for a particular author. Find all the books authored by Julia Child.
    g.V().has('name','Julia Child').outE('authored').inV().values('name')

    The results display in Studio, with a similar listing in Gremlin console.

  8. The previous example and this example accomplish the same result. However, the number and type of traversal steps can affect performance. The traversal step outE() should be used only if the edges are explicitly required. In this example, the edges are traversed to get information about connected vertices, but the edge information is not important to the query.
    g.V().has('name','Julia Child').out('authored').values('name')

    The results display in Studio, with a similar listing in Gremlin console.

    The traversal step out() retrieves the connected book vertices based on the edge label authored without retrieving the edge information. In a larger graph traversal, this subtle difference in the traversal can become a latency issue.
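    To see the difference on your own system, one option is to profile both forms. This sketch only appends the .profile() step introduced earlier; the reported timings will vary by environment:

    ```groovy
    // Form that walks the edges explicitly before moving to the book vertices
    g.V().has('name','Julia Child').outE('authored').inV().values('name').profile()
    // Form that moves directly to the adjacent book vertices
    g.V().has('name','Julia Child').out('authored').values('name').profile()
    ```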

  9. Additional traversal steps continue to fine-tune the results. Adding another chained has() traversal step finds only books authored by Julia Child that were published after 1967. This example also shows the use of the gt() (greater than) predicate.
    g.V().has('name','Julia Child').out('authored').has('publishYear', gt(1967)).values('name')
    The results display in Studio, with a similar listing in Gremlin console.
  10. When developing or testing, checking the number of vertices with each vertex label can confirm that data was read in. To find the number of vertices by vertex label, use the traversal step label() followed by the traversal step groupCount(). The step groupCount() is useful for aggregating results from a previous step. Although this query can be run in real time, it is an excellent example of a query that should be run in analytic (OLAP) mode. In Studio, under the run arrow, select Execute using analytic engine (Spark) before running.
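    The two steps named above can be combined as in this sketch, which assumes g is bound to the QuickStart graph:

    ```groovy
    // label() maps each vertex to its vertex label; groupCount() tallies
    // how many times each label occurs
    g.V().label().groupCount()
    ```

    Counting the addVertex() calls in the load script above suggests a tally of 10 person, 4 book, 8 recipe, 31 ingredient, and 8 meal vertices, which sums to the 61 vertices reported earlier.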
  11. An alternative method for getting the group count with Spark SQL uses:
    SELECT `~label` AS label, COUNT(*) AS label_count FROM DSE_GRAPH_QUICKSTART_vertices GROUP BY label;

QuickStart Writing and reading data

Writing and reading graph data.

Writing data from DSE Graph to a file is most easily accomplished with the graph.io() command. The DSE Graph Loader is the most appropriate tool for reading in data from files or other sources.


  1. Write your data to an output file to save or exchange information. A Gryo file is a binary format file that can reload data to DSE Graph. In this next command, graph I/O writes the entire graph to a file. Other file formats can be written by substituting gryo() with graphml() or graphson().
    Note: graph.io() is disabled in sandbox mode.
    In Studio:
    In Gremlin console:
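    A sketch of the write command for the Gryo format described above; the output path /tmp/recipe.gryo is a hypothetical placeholder, not a path taken from this guide:

    ```groovy
    // Write the entire graph to a Gryo file (the path is illustrative only)
    graph.io(gryo()).writeGraph('/tmp/recipe.gryo')
    ```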
  2. To load a Gryo file, create a mapping script and then run graphloader:
    graphloader mappingGRYO.groovy -graph recipe -address localhost
    Details about loading Gryo data are found in Loading Gryo Data, in Using DSE Graph Loader.

QuickStart Listing graphs

How to list graphs.


  1. To discover all graphs that exist, use a system command:
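    In Studio or Gremlin console, the system command is typically invoked as in this sketch:

    ```groovy
    // List the names of all graphs that exist
    system.graphs()
    ```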
  2. To display all the tables within Spark SQL:

Increase your knowledge

Further increase your knowledge of DSE Graph.

Further adventures in traversing can be found in Creating queries using traversals. If you want to explore various loading options, check out DSE Graph Loader.

DataStax also hosts a DSE Graph self-paced course on DataStax Academy; register for a free account to access the course.