Creating graph schema using Studio

Creating a graph database schema using DataStax Studio and Groovy.

Creating a data model for a graph database is the critical first step towards creating a schema. Once the data model is designed and a graph is created, defining the schema for the vertices and edges and their properties is the next step in creating a graph database. Use Gremlin-Groovy to enter scripts into the cells of DataStax Studio.

Procedure

  1. Optional: If you are reusing a graph that you previously created, drop the graph schema and data.
  2. Optional: If running large scripts, set the timeout value to max to prevent client-side timeouts. Use this setting to ensure that script processing will complete. This step cannot be completed in Studio.
    gremlin> :remote config timeout max
  3. Optional: If running large scripts, increase the evaluation_timeout value to prevent server-side timeouts. This example sets it to ten minutes (PT10M). Use this setting to ensure that script processing will complete.
    graph.schema().config().option("graph.traversal_sources.g.evaluation_timeout").set("PT10M")
    Important: Setting a timeout value greater than 1095 days (maximum integer) can exceed the limit of a graph session. Starting a new session and setting the timeout to a lower value can recover access to a hung session. This caution applies to all timeouts: evaluation_timeout, system_evaluation_timeout, analytic_evaluation_timeout, and realtime_evaluation_timeout.
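    The value "PT10M" used above is ISO-8601 duration notation for ten minutes. As a small sketch that is separate from the schema script, the following plain Groovy (runnable in any Groovy shell, not specific to DSE Graph) uses java.time.Duration to check what a duration string means before setting it. It assumes the server reads the string the same way, and "PT1H30M" is only an illustrative second value.

    ```groovy
    import java.time.Duration

    // Parse the value used in the step above and report it in minutes.
    def timeout = Duration.parse('PT10M')
    println "PT10M is ${timeout.toMinutes()} minutes"

    // Illustrative value only: one hour and thirty minutes.
    def longer = Duration.parse('PT1H30M')
    println "PT1H30M is ${longer.toMinutes()} minutes"
    ```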
  4. Copy and paste the Recipe Schema listed in the Example below in a single cell in DataStax Studio. Once the entire script is entered, run the cell. Studio submits the commands to the Gremlin server.
    Note: Each cell is submitted to the Gremlin server separately, so from cell to cell the server is not aware of any variables set in a previous cell. If any of the lines in the Recipe Schema are entered in separate cells, an error will occur on the edge creation commands.
  5. The following steps show the details of the full script broken down into sections.
  6. Define the properties for the vertices and the edges. The data type of the property is specified in addition to a key name. All properties created in this example are Text, Integers, or Timestamps. Other data types are available. Properties will be used to retrieve selective subsets of the graph and to retrieve stored values. Properties are global in nature, and the pairing of a vertex label and a property will uniquely identify a property for use in traversals. Edge properties are expensive to update, because updating one deletes and recreates the whole edge with all its properties. Use edge properties only in situations that warrant their use.
    // Property Keys 
    // Check for previous creation of property key with ifNotExists() 
    schema.propertyKey('name').Text().ifNotExists().create() 
    schema.propertyKey('gender').Text().create()
    schema.propertyKey('instructions').Text().create()
    schema.propertyKey('category').Text().create()
    schema.propertyKey('year').Int().create()
    schema.propertyKey('timestamp').Timestamp().create()
    schema.propertyKey('ISBN').Text().create()
    schema.propertyKey('calories').Int().create()
    schema.propertyKey('amount').Text().create()
    schema.propertyKey('stars').Int().create()
    schema.propertyKey('comment').Text().single().create() // single() is optional - default
    // Example of multiple property
    // schema.propertyKey('nickname').Text().multiple().create();
    // Example meta-property added to property: 
    // schema.propertyKey('livedIn').Text().create()
    // schema.propertyKey('country').Text().multiple().properties('livedIn').create()

    Property keys can be checked for prior existence with ifNotExists(). Property keys can be created with either single or multiple cardinality with single() or multiple(). The default is single cardinality which does not have to be specified, but it can be explicitly stated as in the example.

    Meta-properties, or properties of properties, can be created using propertyKey() followed by properties(). The property key must exist prior to the creation of a meta-property. Meta-properties cannot be nested, i.e., a meta-property cannot have a meta-property. In this example, country is the property that has a meta-property livedIn. This property and meta-property are used to represent the countries that an author has lived in at various times in their life.
    {
      "name":"Julia Child",
      "gender":"F",
      "country": [ {"country": "United States", "livedIn": "1929-1949" }, {"country": "France", "livedIn": "1949-1952" } ],
      "authored":[
        {
          "book":{
            "label":"book",
            "bookTitle":"Art of French Cooking Volume One",
            "publishDate":1968
          }
        },
        {
          "book":{
            "label":"book",
            "bookTitle":"The French Chef Cookbook",
            "publishDate":1968,
            "ISBN": "0-394-40135-2"
          }
        }
      ],
      "created": [
        {
          "type" : "recipe",
          "recipeTitle" : "Beef Bourguignon",
          "instructions" : "Braise the beef.",
          "createDate":1967
        },
        { 
          "type" : "recipe",
          "recipeTitle" : "Salade Nicoise",
          "instructions" : "Break the lettuce into pieces.",
          "createDate": 1970
        }
      ]
    }
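    As a hedged illustration of how the multiple-cardinality property and its meta-property might be populated, the sketch below adds the author from the sample above. It assumes that the commented-out livedIn and country property keys have been created, that the author vertex label accepts the name, gender, and country properties, and that all lines run in a single Studio cell with the usual graph and g aliases.

    ```groovy
    // Assumption: 'country' was created with multiple() cardinality and the
    // meta-property 'livedIn', as in the commented-out schema lines above.
    julia = graph.addVertex(label, 'author', 'name', 'Julia Child', 'gender', 'F')

    // With multiple() cardinality, each call is expected to add another 'country'
    // value; the trailing key/value pair sets 'livedIn' on that value.
    julia.property('country', 'United States', 'livedIn', '1929-1949')
    julia.property('country', 'France', 'livedIn', '1949-1952')

    // Filter the 'country' values by their meta-property and return the match.
    g.V(julia).properties('country').has('livedIn', '1949-1952').value()
    ```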
  7. Define the vertex labels. The vertex labels identify the type of vertices that can be created.
    // Vertex Labels
    schema.vertexLabel('author').ifNotExists().create()
    schema.vertexLabel('recipe').create()
    // Example of creating vertex label with properties
    // schema.vertexLabel('recipe').properties('name','instructions').create()
    schema.vertexLabel('ingredient').create()
    schema.vertexLabel('book').create()
    schema.vertexLabel('meal').create()
    schema.vertexLabel('reviewer').create()
    // Example of custom vertex id:
    // schema.propertyKey('city_id').Int().create()
    // schema.propertyKey('sensor_id').Uuid().create()
    // schema.vertexLabel('FridgeSensor').partitionKey('city_id').clusteringKey('sensor_id').create()
    Vertex labels can be checked for prior existence using ifNotExists(). Vertex labels can be created along with properties. Vertex labels can be created with custom vertex ids, rather than the standard vertex ids.
    Note: Standard auto-generated ids are deprecated with DSE 6.0. Custom ids will undergo changes, and specifying vertex ids with partitionKey and clusteringKey will likely become the normal method.

    DSE Graph limits the number of vertex labels to 200 per graph.

  8. Define the edge labels. The edge labels identify the type of edges that can be created.
    // Edge Labels
    schema.edgeLabel('authored').ifNotExists().create()
    schema.edgeLabel('created').create()
    schema.edgeLabel('includes').create()
    schema.edgeLabel('includedIn').create()
    schema.edgeLabel('rated').properties('stars').connection('reviewer','recipe').create()

    Edge labels can be checked for prior existence using ifNotExists(). Edge labels can be created with adjacent vertex labels identified using connection(). Edge labels can identify properties that an edge has using properties().
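    To show what the connection() and properties() declarations on the rated edge label mean for data, here is a minimal sketch that adds one such edge. The reviewer name, recipe name, and rating are hypothetical sample values, and the sketch assumes both vertex labels accept the name property. All lines belong in one Studio cell, because the variables are not visible from another cell.

    ```groovy
    // Hypothetical sample data; the names and the rating are illustrative only.
    reviewer = graph.addVertex(label, 'reviewer', 'name', 'John Doe')
    recipe = graph.addVertex(label, 'recipe', 'name', 'Beef Bourguignon')

    // 'rated' was declared with connection('reviewer','recipe') and properties('stars'),
    // so the edge runs from a reviewer vertex to a recipe vertex and carries 'stars'.
    reviewer.addEdge('rated', recipe, 'stars', 5)
    ```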

  9. Define indexes that can speed up query processing. All types of indexes are presented here. See Indexing graph data for more information.
    // Vertex Indexes
    // Secondary
    schema.vertexLabel('author').index('byName').secondary().by('name').add()
    // Materialized	  		
    schema.vertexLabel('recipe').index('byRecipe').materialized().by('name').add()
    schema.vertexLabel('meal').index('byMeal').materialized().by('name').add()
    schema.vertexLabel('ingredient').index('byIngredient').materialized().by('name').add()
    schema.vertexLabel('reviewer').index('byReviewer').materialized().by('name').add()
    // Search
    // schema.vertexLabel('recipe').index('search').search().by('instructions').asText().add()
    // schema.vertexLabel('recipe').index('search').search().by('instructions').asString().add()
    // If more than one property key is search indexed
    // schema.vertexLabel('recipe').index('search').search().by('instructions').asText().by('category').asString().add()
    
    // Edge Index
    schema.vertexLabel('reviewer').index('ratedByStars').outE('rated').by('stars').add()
    
    // Example of property index using meta-property 'livedIn': 
    // schema.vertexLabel('author').index('byLocation').property('country').by('livedIn').add()

    These indexes are included to make the schema for the food example more efficient for data loading.
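    The traversals below sketch the kinds of lookups these indexes are meant to serve. The property values are hypothetical, and whether a given index is actually chosen depends on the query planner, so treat the comments as intent rather than a guarantee. In Studio, run each traversal in its own cell to see its result.

    ```groovy
    // Vertex lookup by name; the secondary index 'byName' on author is meant for this.
    g.V().has('author', 'name', 'Julia Child')

    // Vertex lookup by name; the materialized index 'byRecipe' is meant for this.
    g.V().has('recipe', 'name', 'Beef Bourguignon')

    // The edge index 'ratedByStars' targets a reviewer's outgoing 'rated' edges
    // filtered by 'stars'; this returns the names of recipes rated 4 or higher.
    g.V().has('reviewer', 'name', 'John Doe').outE('rated').has('stars', gte(4)).inV().values('name')
    ```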

    Note: The difference between create() and add() is subtle but important. Use create() to define an entity (property key, vertex label, or edge label) that does not exist yet. Use add() to associate an index or property keys with an entity that already exists. For example, a vertex label and property keys can be created, and then the property keys can be added to the vertex label.
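    The create() then add() sequence described in the note can be sketched as follows, using property keys and a vertex label from this example. The ifNotExists() guards are included so that the fragment can be rerun after the keys and label already exist.

    ```groovy
    // create() defines entities that do not exist yet.
    schema.propertyKey('name').Text().ifNotExists().create()
    schema.propertyKey('gender').Text().ifNotExists().create()
    schema.vertexLabel('author').ifNotExists().create()

    // add() attaches the property keys to the vertex label that already exists.
    schema.vertexLabel('author').properties('name', 'gender').add()
    ```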
  10. After creating the graph schema, examine the schema to verify it.
     schema.describe()
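     When the full description is long, it may be easier to check one element at a time. The sketch below assumes that describe() and exists() can also be called on an individual vertex label. Confirm this against the schema API reference for your DSE version before relying on it.

     ```groovy
     // Assumption: describe() on a single vertex label prints only that label's schema.
     schema.vertexLabel('author').describe()

     // Assumption: exists() returns true if the vertex label has been created.
     schema.vertexLabel('author').exists()
     ```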

Example

// RECIPE SCHEMA

// To run in Studio, copy and paste all lines to a cell and run.

// To run in Gremlin console, use the next two lines:
// script = new File('/tmp/RecipeSchema.groovy').text; []
// :> @script
    	
// Property Keys 
// Check for previous creation of property key with ifNotExists() 
schema.propertyKey('name').Text().ifNotExists().create() 
schema.propertyKey('gender').Text().create()
schema.propertyKey('instructions').Text().create()
schema.propertyKey('category').Text().create()
schema.propertyKey('year').Int().create()
schema.propertyKey('timestamp').Timestamp().create()
schema.propertyKey('ISBN').Text().create()
schema.propertyKey('calories').Int().create()
schema.propertyKey('amount').Text().create()
schema.propertyKey('stars').Int().create()
schema.propertyKey('comment').Text().single().create() // single() is optional - default
// Example of multiple property
// schema.propertyKey('nickname').Text().multiple().create();
// Example meta-property added to property: 
// schema.propertyKey('livedIn').Text().create()
// schema.propertyKey('country').Text().multiple().properties('livedIn').create()
    		
// Vertex Labels
schema.vertexLabel('author').ifNotExists().create()
schema.vertexLabel('recipe').create()
// Example of creating vertex label with properties
// schema.vertexLabel('recipe').properties('name','instructions').create()
schema.vertexLabel('ingredient').create()
schema.vertexLabel('book').create()
schema.vertexLabel('meal').create()
schema.vertexLabel('reviewer').create()
// Example of custom vertex id:
// schema.propertyKey('city_id').Int().create()
// schema.propertyKey('sensor_id').Uuid().create()
// schema.vertexLabel('FridgeSensor').partitionKey('city_id').clusteringKey('sensor_id').create()
                
// Edge Labels
schema.edgeLabel('authored').ifNotExists().create()
schema.edgeLabel('created').create()
schema.edgeLabel('includes').create()
schema.edgeLabel('includedIn').create()
schema.edgeLabel('rated').properties('stars').connection('reviewer','recipe').create()
                
// Vertex Indexes
// Secondary
schema.vertexLabel('author').index('byName').secondary().by('name').add()
// Materialized	  		
schema.vertexLabel('recipe').index('byRecipe').materialized().by('name').add()
schema.vertexLabel('meal').index('byMeal').materialized().by('name').add()
schema.vertexLabel('ingredient').index('byIngredient').materialized().by('name').add()
schema.vertexLabel('reviewer').index('byReviewer').materialized().by('name').add()
// Search
// schema.vertexLabel('recipe').index('search').search().by('instructions').asText().add()
// schema.vertexLabel('recipe').index('search').search().by('instructions').asString().add()
// If more than one property key is search indexed
// schema.vertexLabel('recipe').index('search').search().by('instructions').asText().by('category').asString().add()

// Edge Index
schema.vertexLabel('reviewer').index('ratedByStars').outE('rated').by('stars').add()

// Example of property index using meta-property 'livedIn': 
// schema.vertexLabel('author').index('byLocation').property('country').by('livedIn').add()

// Schema description
// Use to check that the schema is built as desired
schema.describe()