QuickStart Simple example

Simple DataStax Graph example.

About this task

Let’s start with a simple example from the recipe data model. The data is composed of two vertices, one person who is an author (Julia Child) and one book (The Art of French Cooking, Vol. 1) with an edge between them to identify that Julia Child authored that book. We’ll supply schema, insert data using graph traversals with g.addV() and g.addE(), then examine the data and run queries.

Execute all code samples using either Studio or Gremlin console by copy/pasting the codeblocks below.

As with all queries in Graph, if you are using Gremlin console, alias the graph traversal g to a graph with :remote config alias g food_qs.g before running any commands.

Procedure

  1. Create the schema for a vertex label person, along with the vertex properties and their data types:

    schema.vertexLabel('person').
      ifNotExists().
      partitionBy('person_id', Uuid).
      property('name', Text).
      property('gender', Text).
      property('nickname', setOf(Text)).
      property('cal_goal', Int).
      property('macro_goal', listOf(Int)).
      property('country', listOf(tupleOf(Text, Date, Date))).
      property('badge', mapOf(Text, Date)).
      create()

    The vertex label defines the Cassandra Query Language (CQL) table name that will store person vertices. Each property consists of a property key and data type that define CQL columns in the table. Both the vertex label and the property definitions must adhere to CQL naming syntax. Like CQL tables, a single or multiple partition key must be defined with at least one property using partitionBy property\_name. Clustering columns may also be defined using a similar clusteringBy property\_name. The partition key defines where in the cluster the data will reside, while the clustering columns define the sort order of the data within a partition.

    A new feature of DataStax Graph 6.8 is the use of collections as data types for properties. The DSE 6.7 Graph and earlier concepts of meta-properties and multi-properties are replaced with collections or nested collections. For instance, country is now defined as a list of tuples, to store multiple records of a country,start_date and end_date in which a person has lived.

    A successful schema command will return:

    ==> OK
  2. Insert a vertex for Julia Child using a g.addV() command.

    g.addV('person').
       property('person_id', 'e7cd5752-bc0d-4157-a80f-7523add8dbcd' as UUID).
       property('name', 'Julia CHILD').
       property('gender','F').
       property('nickname', ['Jay', 'Julia'] as Set).
       property('country', [['USA', '1912-08-12' as LocalDate, '1944-01-01' as LocalDate] as Tuple, ['Ceylon', '1944-01-01' as LocalDate, '1945-06-01' as LocalDate] as Tuple, ['France', '1948-01-01' as LocalDate, '1960-01-01' as LocalDate] as Tuple, ['USA', '1960-01-01' as LocalDate, '2004-08-13' as LocalDate] as Tuple])

    The vertex label person identifies the type of vertex to add along with the property key-value pairs created. Note that some properties include additional information to define the data type conversion from a string to the required type.

    Using a set, the property nickname is defined with multiple values, a replacement for previously supported multi-properties.

    The Studio result:

    GSStudioAddJulia

    In Studio, the result can be displayed using different views: Raw JSON, Table, or Graph. Explore the options.

    The Gremlin console result:

    ==>v[dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd]
  3. Create the schema for a vertex label book:

    schema.vertexLabel('book').
      ifNotExists().
      partitionBy('book_id', Int).
      property('name', Text).
      property('publish_year', Int).
      property('isbn', Text).
      property('category', setOf(Text)).
      create()

    This command and the next one will return results similar to the actions above for creating a person.

  4. Insert a book into the graph:

    g.addV('book').
      property('book_id',1001).
      property('name',"The Art of French Cooking, Vol. 1").
      property('publish_year', 1961).
      property('category', ['French', 'American'] as Set)

    The Studio result:

    GSStudioAddBook

    As with the person vertex, you can see all the information about the book vertex created. In Graph view, use the Settings button (the gear) to change the display label for person by entering Chef {{name}}. Change the book display label with book:{{{[name]}}}. Change the icon for books to a book icon as shown here with the Style-Vertex Shape menu. To set graph display names more generally, look for “Manage Global Defaults” under the Display Names

    The Gremlin console result:

    ==>v[dseg:/book/1001]
  5. Add schema for the edge between the two vertices for a person authoring a book:

    schema.edgeLabel('authored').
      ifNotExists().
      from('person').to('book').
      create()

    The edge label authored is defined, along with the outgoing vertex label person and the incoming vertex label book.

  6. Insert an edge between Julia Child and one of her cookbooks:

    g.V('dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd').as('a').
      V('dseg:/book/1001').as('b').
      addE('authored').from('a').to('b')

    In this query, each vertex is defined and given a temporary label, person as a and book as b, and the temporary labels used to add the edge with addE().from().to(). The partition key and value for each vertex must be included in the query.

    Use Graph view in Studio to see the relationship. Scroll over elements to display additional information.

    The Studio result:

    GSStudio2V1E

    The Gremlin console result:

    ==>e[dseg:/person-authored-book/e7cd5752-bc0d-4157-a80f-7523add8dbcd/1001][dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd-authored->dseg:/book/1001]
  7. In Studio, the results are easy to visualize and check. In the Gremlin console, ensure that the data inserted for the vertices is correct by checking with a query that gets all vertices. All of the commands can be executed in Studio as well as Gremlin console. In production, DSG prevents expensives queries from processing. In development, include the with("label-warning", false) so that a query can run without specifying vertex labels.

    g.with("label-warning", false).V()

    .

    The Gremlin console result:

    ==>v[dseg:/book/1001]
    ==>v[dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd]
  8. Similarly, the edge data inserted can be checked with a query that gets all edges:

    g.with("label-warning", false).E()

    The Gremlin console result:

    ==>e[dseg:/person-authored-book/e7cd5752-bc0d-4157-a80f-7523add8dbcd/1001][dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd-authored->dseg:/book/1001]
  9. A much more useful query would check the data for a vertex using a simple bit of information, like a person’s name. However, without adding an index for name, this query will fail, because the value for the primary key person_id is not supplied. For example:

    g.V().has('person', 'name', 'Julia CHILD')

    Two alternatives exist, a development mode for running queries and a modifier mode with('allow-filtering'). The dev mode is intended for early exploration, before appropriate indexes have been settled upon:

    dev.V().hasLabel('person').has('name', 'Julia CHILD')

    An alternative in development is to use the with('allow-filtering') step which will do a full scan of all partitions:

    g.with('allow-filtering').V().has('person', 'name', 'Julia CHILD')

    Both commands will return the same information, the vertex id for the vertex found with the query. In Studio:

    GSStudioDevVJulia

    and in Gremlin console:

    ==>v[dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd]
  10. You’ll notice when you tried the command g.V().has('person', 'name', 'Julia CHILD'), you got an error message that provides the index required to do the query in a production-safe manner:

    schema.vertexLabel('person').
       materializedView('person_by_name').
       ifNotExists().
       partitionBy('name').
       clusterBy('person_id', Asc).
       create()
    Note that while the index has been created successfully, it may not yet be finished building.
    Alternatively, use '.waitForIndex(<optionalTimeout>).create()' during index creation to wait for the index to be built.
    
    OK

    The index is created as a materialized view table, with a partition key of the column to index and a clustering column of the original table’s partition key. Once the index exists, the query will run without doing a full scan. Indexing is a large topic that is worth reading about, as efficient queries depend on indexes.

  11. Notice that the original query about Julia Child will now run without warnings, after the index is created, and returns the same information as the development queries:

    g.V().has('person', 'name', 'Julia CHILD')
  12. We now have data! Let’s see what kind of graph queries can be executed. First, check the data using the unique partition key:

    g.V().has('person', 'person_id', UUID.fromString('e7cd5752-bc0d-4157-a80f-7523add8dbcd'))

    In Studio:

    GSStudioPersonByID

    In Gremlin console:

    ==>v[dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd]

    While the partition key will be useful in some queries, generally queries use more user-friendly data, like name or category, to find vertices.

  13. Two other useful traversals are elementMap() and valueMap() which print the key-value listing of each property value for specified vertices.

    g.V().hasLabel('person').elementMap()

    In Studio:

    GSStudioElementMap

    Using elementMap() or valueMap() without specifying properties can result in slow query latencies, if a large number of property keys exist for the queried vertex or edge. Specific properties can be specified, such as elementMap('name') or``valueMap('name').

    In Gremlin console:

    gremlin> g.V().hasLabel('person').elementMap()
    ==>{id=dseg:/person/4ce9caf1-25b8-468e-a983-69bad20c017a, label=person, gender=M, name=James BEARD, nickname=[Jim, Jimmy], person_id=4ce9caf1-25b8-468e-a983-69bad20c017a}
    ==>{id=dseg:/person/888ad970-0efc-4e2c-b234-b6a71c30efb5, label=person, gender=M, name=Fritz STREIFF, person_id=888ad970-0efc-4e2c-b234-b6a71c30efb5}
    ==>{id=dseg:/person/4954d71d-f78c-4a6d-9c4a-f40903edbf3c, label=person, cal_goal=1800, gender=M, macro_goal=[30, 20, 50], name=John Smith, nickname=[Johnie], person_id=4954d71d-f78c-4a6d-9c4a-f40903edbf3c}
    ==>{id=dseg:/person/01e22ca6-da10-4cf7-8903-9b7e30c25805, label=person, gender=F, name=Kelsie KERR, person_id=01e22ca6-da10-4cf7-8903-9b7e30c25805}
    ==>{id=dseg:/person/6c09f656-5aef-46df-97f9-e7f984c9a3d9, label=person, cal_goal=1500, gender=F, macro_goal=[50, 15, 35], name=Jane DOE, nickname=[Janie], person_id=6c09f656-5aef-46df-97f9-e7f984c9a3d9}
    ==>{id=dseg:/person/e7cd5752-bc0d-4157-a80f-7523add8dbcd, label=person, country=[('USA','1912-08-12','1944-01-01'), ('Ceylon','1944-01-01','1945-06-01'), ('France','1948-01-01','1950-01-01'), ('USA','1960-01-01','2004-08-13')], gender=F, name=Julia CHILD, nickname=[Jay, Julia], person_id=e7cd5752-bc0d-4157-a80f-7523add8dbcd}

    Using valueMap() returns similar information in a slightly different container, but doesn’t include the element id and label. Try out the command and compare! In DSG 6.8.0, valueMap() is deprecated, so use elementMap().

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com