Creating graph schema

Creating graph database schema.

Creating a data model for a graph database is the critical first step towards creating a schema. Once the data model is designed and a graph is created, defining the schema for the vertices and edges and their properties is the next step in creating a graph database. Gremlin-Groovy is the language used to create scripts; Gremlin-Groovy is packaged with the Apache TinkerPop engine, and can be used with either DataStax Studio or the Gremlin console (dse gremlin-console) installed with DataStax Enterprise.

Graph schema can be created with create() or added to existing schema with add().

Prerequisites

Create a graph.

Procedure

  1. Optional. If you are reusing a graph that you previously created, drop the graph schema and data.
  2. Optional. If running large scripts in Gremlin console, set the timeout value to max to prevent client-side time outs. Use this setting to ensure that script processing will complete. This step cannot be completed in Studio.
    gremlin> :remote config timeout max
  3. Optional. If running large scripts, set the evaluation_timeout value to max to prevent server-side timeouts. Use this setting to ensure that script processing will complete.
    graph.schema().config().option("graph.traversal_sources.g.evaluation_timeout").set("PT10M")
    Important: Setting a timeout value greater than 1095 days (maximum integer) can exceed the limit of a graph session. Starting a new session and setting the timeout to a lower value can recover access to a hung session. This caution is applicable for all timeouts: evaluation_timeout, system_evaluation_timeout, analytic_evaluation_timeout, and realtime_evaluation_timeout
  4. Load the example schema listed in the Example below:
    1. In Studio, copy and paste the entire code block into a single cell and execute the cell.
    2. In Gremlin console, copy and paste the example schema to a schema file. Two choices for loading are shown:
      Use the :load command by specifying the location of the schema file:
      gremlin> :load /tmp/RecipeSchema.groovy
      Use a cat command in the machine shell to load the file, specifying the location of the schema file, and pass to the Gremlin console:
      $ cat /tmp/RecipeSchema.groovy | dse gremlin-console
    NOTE: Each command submitted is within a single session, so from cell to cell (Studio) or line to line (Gremlin console), the Gremlin server is not aware of any variables set on the previous line. If any of the lines in the Recipe Schema are entered separately, an error will occur on the edge creation commands.
  5. The following steps show the details of the full script broken down into sections.
  6. Define the properties for the vertices and the edges. The data type of the property is specified in addition to a key name. All properties created in this example are Text, Integers, or Timestamps. Other data types are available. Properties will be used to retrieve selective subsets of the graph and to retrieve stored values. Properties are global in nature, and the pairing of a vertex label and a property will uniquely identify a property for use in traversals. Edge properties are expensive to update, as because the whole edge with all its properties are deleted and recreated to update edge properties. Use edge properties only in situations that warrant their use.
    // Property Keys 
    // Check for previous creation of property key with ifNotExists() 
    schema.propertyKey('name').Text().ifNotExists().create() 
    schema.propertyKey('gender').Text().create()
    schema.propertyKey('instructions').Text().create()
    schema.propertyKey('category').Text().create()
    schema.propertyKey('year').Int().create()
    schema.propertyKey('timestamp').Timestamp().create()
    schema.propertyKey('ISBN').Text().create()
    schema.propertyKey('calories').Int().create()
    schema.propertyKey('amount').Text().create()
    schema.propertyKey('stars').Int().create()
    schema.propertyKey('comment').Text().single().create() // single() is optional - default
    // Example of multiple property
    // schema.propertyKey('nickname').Text().multiple().create();
    // Example meta-property added to property: 
    // schema.propertyKey('livedIn').Text().create()
    // schema.propertyKey('country').Text().multiple().properties('livedIn').create()

    Property keys can be checked for prior existence with ifNotExists(). Property keys can be created with either single or multiple cardinality with single() or multiple(). The default is single cardinality which does not have to be specified, but it can be explicitly stated as in the example.

    Meta-properties, or properties of properties, can be created using propertyKey() followed by properties(). The property key must exist prior to the creation of a meta-property. Meta-properties cannot be nested, i.e., a meta-property cannot have a meta-property. In this example, country is the property that has a meta-property livedIn. This property and meta-property are used to represent the countries that an author has lived in at various times in their life.
    {
      "name":"Julia Child",
      "gender":"F",
      [ {"country": "United States", "livedIn": "1929-1949" }, {"country": "France", "livedIn": "1949-1952" } ], 
      "authored":[{
        "book":{
          "label":"book",
          "bookTitle":"Art of French Cooking Volume One",
          "publishDate":1968
        },
        "book":{
          "label":"book",
          "bookTitle":"The French Chef Cookbook",
          "publishDate":1968,
          "ISBN": "0-394-40135-2"
        }     
      }],
      "created": [{
        
          "type" : "recipe",
          "recipeTitle" : "BeefBourguignon",
          "instructions" : "Braise the beef.",
          "createDate":1967
        },
        { 
          "type" : "recipe",
          "recipeTitle" : "Salade Nicoise",
          "instructions" : "Break the lettuce into pieces.",
          "createDate": 1970
        }
      ]
    }
  7. Define the vertex labels. The vertex labels identify the type of vertices that can be created.
    // Vertex Labels
    schema.vertexLabel('author').ifNotExists().create()
    schema.vertexLabel('recipe').create()
    // Example of creating vertex label with properties
    // schema.vertexLabel('recipe').properties('name','instructions').create()
    schema.vertexLabel('ingredient').create()
    schema.vertexLabel('book').create()
    schema.vertexLabel('meal').create()
    schema.vertexLabel('reviewer').create()
    // Example of custom vertex id:
    // schema.propertyKey('city_id').Int().create()
    // schema.propertyKey('sensor_id').Uuid().create()
    // schema().vertexLabel('FridgeSensor').partitionKey('city_id').clusteringKey('sensor_id').create()
    Vertex labels can be checked for prior existence using ifNotExists(). Vertex labels can be created along with properties. Vertex labels can be created with user-defined vertex ids, rather than the autogenerated vertex ids.
    Note: Auto-generated vertex ids are deprecated with DSE 6.0.

    DSE Graph limits the number of vertex labels to 200 per graph.

  8. Define the edge labels. The edge labels identify the type of edges that can be created.
    // Edge Labels
    schema.edgeLabel('authored').ifNotExists().create()
    schema.edgeLabel('created').create()
    schema.edgeLabel('includes').create()
    schema.edgeLabel('includedIn').create()
    schema.edgeLabel('rated').properties('rating').connection('reviewer','recipe').create()

    Edge labels can be checked for prior existence using ifNotExists(). Edge labels can be created with adjacent vertex labels identified using connection(). Edge labels can identify properties that an edge has using properties().

  9. Define indexes that can speed up the query processing. All types of indexes are presented here. Indexing has more information.
    // Vertex Indexes
    // Secondary
    schema.vertexLabel('author').index('byName').secondary().by('name').add()
    // Materialized	  		
    schema.vertexLabel('recipe').index('byRecipe').materialized().by('name').add()
    schema.vertexLabel('meal').index('byMeal').materialized().by('name').add()
    schema.vertexLabel('ingredient').index('byIngredient').materialized().by('name').add()
    schema.vertexLabel('reviewer').index('byReviewer').materialized().by('name').add()
    // Search
    // schema.vertexLabel('recipe').index('search').search().by('instructions').asText().add()
    // schema.vertexLabel('recipe').index('search').search().by('instructions').asString().add()
    // If more than one property key is search indexed
    // schema.vertexLabel('recipe').index('search').search().by('instructions').asText().by('category').asString().add()
    
    // Edge Index
    schema.vertexLabel('reviewer').index('ratedByStars').outE('rated').by('stars').add()
    
    // Example of property index using meta-property 'livedIn': 
    // schema.vertexLabel('author').index('byLocation').property('country').by('livedIn').add()

    These indexes are included to make the schema for the food example more efficient for data loading.

    Note: The difference between create() and add() is subtle but important. If an entity (vertex label or edge label) has been created and already exists, if an index or property keys are associated with the entity, then an add() command is used. For example, a vertex label and property keys can be created, and then the property keys can be added to the vertex label.
  10. After creating the graph schema, examine the schema to verify. A portion of the output is shown.
     schema.describe()

Example

// RECIPE SCHEMA

// To run in Studio, copy and paste all lines to a cell and run.

// To run in Gremlin console, use the next two lines:
// script = new File('/tmp/RecipeSchema.groovy').text; []
// :> @script
    	
// Property Keys 
// Check for previous creation of property key with ifNotExists() 
schema.propertyKey('name').Text().ifNotExists().create() 
schema.propertyKey('gender').Text().create()
schema.propertyKey('instructions').Text().create()
schema.propertyKey('category').Text().create()
schema.propertyKey('year').Int().create()
schema.propertyKey('timestamp').Timestamp().create()
schema.propertyKey('ISBN').Text().create()
schema.propertyKey('calories').Int().create()
schema.propertyKey('amount').Text().create()
schema.propertyKey('stars').Int().create()
schema.propertyKey('comment').Text().single().create() // single() is optional - default
// Example of multiple property
// schema.propertyKey('nickname').Text().multiple().create();
// Example meta-property added to property: 
// schema.propertyKey('livedIn').Text().create()
// schema.propertyKey('country').Text().properties('livedIn').create()
    		
// Vertex Labels
schema.vertexLabel('author').ifNotExists().create()
schema.vertexLabel('recipe').create()
// Example of creating vertex label with properties
// schema.vertexLabel('recipe').properties('name','instructions').create()
schema.vertexLabel('ingredient').create()
schema.vertexLabel('book').create()
schema.vertexLabel('meal').create()
schema.vertexLabel('reviewer').create()
// Example of custom vertex id:
// schema.propertyKey('city_id').Int().create()
// schema.propertyKey('sensor_id').Uuid().create()
// schema().vertexLabel('FridgeSensor').partitionKey('city_id').clusteringKey('sensor_id').create()
                
// Edge Labels
schema.edgeLabel('authored').ifNotExists().create()
schema.edgeLabel('created').create()
schema.edgeLabel('includes').create()
schema.edgeLabel('includedIn').create()
schema.edgeLabel('rated').properties('stars').connection('reviewer','recipe').create()
                
// Vertex Indexes
// Secondary
schema.vertexLabel('author').index('byName').secondary().by('name').add()
// Materialized	  		
schema.vertexLabel('recipe').index('byRecipe').materialized().by('name').add()
schema.vertexLabel('meal').index('byMeal').materialized().by('name').add()
schema.vertexLabel('ingredient').index('byIngredient').materialized().by('name').add()
schema.vertexLabel('reviewer').index('byReviewer').materialized().by('name').add()
// Search
// schema.vertexLabel('recipe').index('search').search().by('instructions').asText().add()
// schema.vertexLabel('recipe').index('search').search().by('instructions').asString().add()
// If more than one property key is search indexed
// schema.vertexLabel('recipe').index('search').search().by('instructions').asText().by('category').asString().add()

// Edge Index
schema.vertexLabel('reviewer').index('ratedByStars').outE('rated').by('stars').add()

// Example of property index using meta-property 'livedIn': 
// schema().vertexLabel('author').index('byLocation').property('country').by('livedIn').add()

// Schema description
// Use to check that the schema is built as desired
schema.describe()