Creating vertex label schema

Vertex labels, as discussed in the data model, define the vertex id and associated property keys for each type of vertex created. A vertex id uniquely identifies each vertex based on the partition and clustering keys selected for the vertex label. Partition and clustering keys correspond directly to the partition key and clustering columns of a CQL primary key. A vertex label schema supports four operations:
  • create a vertex label with create()
  • describe a vertex label with describe()
  • drop a vertex label with drop()
  • add or drop properties to existing vertex labels with addProperty() or dropProperty()
To create a vertex label only if it does not already exist, add ifNotExists() before create(). A vertex label can be created from an existing CQL table using fromExistingTable('tableName'). A table name that differs from the vertex label name can also be specified with tableName('tableName'). DataStax Graph uses a particular format for vertex ids when vertices are inserted.
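
As a brief sketch of the describe and drop operations listed above, assuming a vertex label named person already exists in the graph. The ifExists() step is assumed here as the counterpart of ifNotExists(), so that dropping a missing label does not raise an error:

```groovy
// Show the schema statement that defines the person vertex label
schema.vertexLabel('person').describe()

// Drop the person vertex label, but only if it exists
schema.vertexLabel('person').ifExists().drop()
```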

Properties are the key to querying the graph. Two features have changed in DSG compared to Classic Graph (DSE Graph 5.1-6.7): meta-properties (properties of properties) and multi-properties have been removed. However, both constructs can still be modeled in DSG using collections, UDTs, or tuples. For example, a badge property can store multiple badge levels, along with the date on which each badge was earned, with: property('badge', mapOf(Text, Date)).
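
To illustrate how collections stand in for multi-properties and meta-properties, the following sketch inserts one vertex. It assumes the person vertex label has been created as shown in the Procedure section, that the `as LocalDate` coercion is available in your Gremlin console or Studio session, and that all values are made-up sample data:

```groovy
// Insert one person vertex; every value below is illustrative sample data.
// nickname holds several values in one set (multi-property replacement).
// badge maps each badge level to the date earned (meta-property replacement).
g.addV('person').
  property('person_id', UUID.randomUUID()).
  property('name', 'Jane DOE').
  property('nickname', ['Janie', 'JD'] as Set).
  property('badge', ['gold': '2015-01-01' as LocalDate,
                     'expert': '2019-06-30' as LocalDate]).
  iterate()
```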

DSG can also create a vertex label using the CQL command CREATE TABLE, or convert a CQL table into a vertex label using ALTER TABLE. See the CQL commands to create a table as a graph or convert a table into a graph.

CAUTION: DataStax Graph limits the number of vertex and edge labels to 200 per graph.

Prerequisites

Create a graph and use either Gremlin console or DataStax Studio to access the graph. See the DSG QuickStart configuration if you need a refresher.

Procedure

partitionBy and clusterBy

The partitionBy and clusterBy steps are important choices: partitionBy identifies where in the cluster the data will be located (the partition), and clusterBy determines how the data will be sorted within the partition (the clustering key, or clustering column). If the vertex data is large, queries against a single large partition can be slow; using multiple partition keys spreads the vertex data over smaller, more dispersed partitions. Often, the partition key is simply a unique UUID value and nothing more is needed. Clustering keys are useful for retrieving slices, or ranges, of vertex data from a partition, because the data is stored sequentially in clustering key order. The steps below show a variety of vertex label definitions that use single or multiple partition keys, with and without clustering keys.

  • Create a vertex label person with a single partitionBy:
    schema.vertexLabel('person').
      ifNotExists().
      partitionBy('person_id', Uuid).
      property('name', Text).
      property('gender', Text).
      property('nickname', setOf(Text)).
      property('cal_goal', Int).
      property('macro_goal', listOf(Int)).
      property('country', listOf(tupleOf(Text, Date, Date))).
      property('badge', mapOf(Text, Date)).
      create()
    
    This vertex label is created if it doesn't already exist, has a single partition key person_id, and several properties of varying data types. Note the use of the data types Uuid, Text, setOf(Text), Int, listOf(tupleOf(Text, Date, Date)), and mapOf(Text, Date). In fact, nested collections, tuples, and user-defined types (UDTs) are all valid.
  • Create a vertex label meal with a composite partition key, using two partitionBy steps:
    schema.vertexLabel('meal').
      ifNotExists().
      partitionBy('type', Text).
      partitionBy('meal_id', Int).
      create()
    Partition key columns can be of any data type except counter or a non-frozen collection, and they cannot be static columns, as is true for a CQL PRIMARY KEY.
  • Create a vertex label fridge_sensor with a compound primary key, using three partitionBy steps and a clusterBy step:
    schema.vertexLabel('fridge_sensor').
      ifNotExists().
      partitionBy('state_id', Int).
      partitionBy('city_id', Int).
      partitionBy('zipcode_id', Int).
      clusterBy('sensor_id', Int).
      property('name', Text).
      create()
    A combined hash is computed for the three partition keys to define the partition location, and the data within each partition will be sorted by sensor_id. The sort order of a clustering key can be defined as either ascending Asc (default) or descending Desc.
  • Create a vertex label shopping_list with a tableName:
    schema.vertexLabel('shopping_list').
      tableName('my_shopping').
      ifNotExists().
      partitionBy('shoplist_id', Int).
      create()
    The CQL table will be named my_shopping, while the vertex label is shopping_list. With tableName, the vertex label name and the underlying table name do not need to match.
  • Convert a CQL table into a vertex label:
    schema.vertexLabel('recipe').
      fromExistingTable('recipe_table').
      create()
    If a CQL table already exists and you wish to explore its data as a graph, this command creates a vertex label backed by that table. The table can then be queried with both CQL and Gremlin commands.
  • Add a property to a vertex label:
    schema.vertexLabel('book').
      addProperty('book_discount', Text).
      alter()
    After a vertex label is created, properties can be added with addProperty() or removed with dropProperty(); either change is applied with alter().
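
The descending clustering order and the dropProperty() step mentioned in the steps above can be sketched as follows. The vertex label name fridge_sensor_desc is hypothetical, the Desc sort order is assumed to be passed as a third argument to clusterBy, and the dropProperty call assumes that book_discount was previously added to the book vertex label:

```groovy
// Hypothetical label: same keys as fridge_sensor, but with sensor_id
// sorted in descending order within each partition
schema.vertexLabel('fridge_sensor_desc').
  ifNotExists().
  partitionBy('state_id', Int).
  partitionBy('city_id', Int).
  partitionBy('zipcode_id', Int).
  clusterBy('sensor_id', Int, Desc).
  property('name', Text).
  create()

// Remove the book_discount property from the book vertex label
schema.vertexLabel('book').
  dropProperty('book_discount').
  alter()
```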

Example

The following vertex labels are used for the DataStax Graph QuickStart example that appears throughout the documentation:
// VERTEX LABELS
// ********
// SYNTAX:
// schema.vertexLabel('vertexLabel')
//    [ .ifNotExists() ]
//    .partitionBy('propertyName', propertyType) [ ... ]
//    [ .clusterBy('propertyName', propertyType) ... ]
//    [ .property('propertyName', propertyType) ]
//    [ .create() | .describe() | .addProperty('propertyName', propertyType).alter() ]

 
// SINGLE PARTITION KEY Vertex Labels

// macro_goal is a list of carbohydrate, protein, fat
// country is a list of tuple of country, start date, end date; replacement for a meta-property in classic graph
// Also, country demonstrates multi-property, being a list of countries and dates lived in
//    country, start_date, end_date
// badge is a replacement for a meta-property in earlier versions
//    level:year, such as gold:2015, expert:2019, or sous-chef:2009 (mainly expect to use for reviewers)

// NEED TO ADD NEW FEATURE DSP_18625
//  .tableName('personTable')

// START-createVL_person
schema.vertexLabel('person').
  ifNotExists().
  partitionBy('person_id', Uuid).
  property('name', Text).
  property('gender', Text).
  property('nickname', setOf(Text)).
  property('cal_goal', Int).
  property('macro_goal', listOf(Int)).
  property('country', listOf(tupleOf(Text, Date, Date))).
  property('badge', mapOf(Text, Date)).
  create()

// END-createVL_person
// book_discount was a property in the old data model that had a ttl; I'm including here to use the same datasets 
  // Add as an added property 
  //property('book_discount', Text).

// START-createVL_book
schema.vertexLabel('book').
  ifNotExists().
  partitionBy('book_id', Int).
  property('name', Text).
  property('publish_year', Int).
  property('isbn', Text).
  property('category', setOf(Text)).
  create()

// END-createVL_book
// Going to create vertexLabel recipe through converting a CQL table to a VL
// Although the notebook shows creating a table for recipe with CQL, then converting,
// this is the Gremlin schema to make the recipe vertex label

// START-createVL_recipe
schema.vertexLabel('recipe').
  ifNotExists().
  partitionBy('recipe_id', Int).
  property('name', Text).
  property('cuisine', setOf(Text)).
  property('instructions', Text).
  property('notes', Text).
  create()
// END-createVL_recipe
 
// START-createVL_item_meal
schema.vertexLabel('meal_item').
  ifNotExists().
  partitionBy('item_id', Int).
  property('name', Text).
  property('serv_amt', Text).
  property('macro', listOf(Int)).
  property('calories', Int).
  create()
// END-createVL_item_meal

// START-createVL_ingredient
schema.vertexLabel('ingredient').
  ifNotExists().
  partitionBy('ingred_id', Int).
  property('name', Text).
  create()
// END-createVL_ingredient

// START-createVL_home
schema.vertexLabel('home').
  ifNotExists().
  partitionBy('home_id', Int).
  property('name', Text).
  create()
// END-createVL_home

// START-createVL_store
schema.vertexLabel('store').
  ifNotExists().
  partitionBy('store_id', Int).
  property('name', Text).
  create()
// END-createVL_store


// MULTIPLE-KEY VERTEX ID

// START-createVL_meal
schema.vertexLabel('meal').
  ifNotExists().
  partitionBy('type', Text).
  partitionBy('meal_id', Int).
  create()
// END-createVL_meal

// COMPOSITE KEY VERTEX ID

// START-createVL_fridge_sensor
schema.vertexLabel('fridge_sensor').
  ifNotExists().
  partitionBy('state_id', Int).
  partitionBy('city_id', Int).
  partitionBy('zipcode_id', Int).
  clusterBy('sensor_id', Int).
  property('name', Text).
  create()
// END-createVL_fridge_sensor

// GEOSPATIAL

// START-createVL_location
schema.vertexLabel('location').
  ifNotExists().
  partitionBy('loc_id', Text).
  property('name', Text).
  property('loc_details', frozen(typeOf('location_details'))).
  property('geo_point', Point).
  create()
// END-createVL_location

// STATIC COLUMN

// START-createVL_flag
schema.vertexLabel('flag').
  ifNotExists().
  partitionBy('country_id', Int).
  clusterBy('country', Text).
  property('flag', Text, Static).
  create()
// END-createVL_flag
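
The location vertex label above references a user-defined type named location_details through typeOf('location_details'), so that type needs to exist before the vertex label is created. A minimal sketch of such a type follows; the field names street and telephone are hypothetical and only illustrate the shape of a UDT definition:

```groovy
// Hypothetical UDT definition; the field names are illustrative only
schema.type('location_details').
  ifNotExists().
  property('street', Text).
  property('telephone', listOf(Text)).
  create()
```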