Creating vertex label schema

Creating vertex label database schema.

Vertex labels, as discussed in the data model, define the vertex id and associated property keys for each type of vertex created. Property keys must be created before they are used in a vertex label. Vertex label schema is created with create(), and property keys can be added to an existing vertex label with add(). A vertex label can be created with a specific time-to-live (TTL) value, and ifNotExists() can be used to check whether the vertex label already exists.

A key component of a vertex label is the vertex id, which determines the data locality of vertices with that label, that is, where they are stored. User-defined vertex ids (UDV ids) are analogous to a primary key in an RDBMS and map directly to the underlying data representation of the graph in DataStax's distribution of Apache Cassandra™. A UDV id names the property keys that make up the partitionKey and clusteringKey of a vertex label. The partitionKey values determine the node in the DSE cluster on which a vertex is stored, and the clusteringKey values determine the order in which data is stored in the associated tables.

A UDV id can be defined using three different arrangements for the vertex id:
Single-key: composed of a single property in a partitionKey
    Maps every instance of a vertex label to its own DSE partition and distributes the data around the DSE cluster according to the cluster's partitioner.
Multiple-key: composed of more than one property in a partitionKey
    Maps each vertex to a distinct DSE partition, like a single-key id, but uses the combination of more than one property key to make the id unique.
Composite key: composed of a partitionKey and a clusteringKey
    Includes both a partitionKey, which determines the DSE partition, and one or more clusteringKeys, which group and order data within that partition.
CAUTION: Keep in mind that UDV ids must be globally unique within the graph.
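The following Python sketch is a toy model, not DSE code, of how a UDV id places vertices. It assumes a stand-in hash (SHA-256 over the partition-key values) and a hypothetical node choice of "token modulo node count"; Cassandra's default partitioner uses Murmur3 and token ranges instead. The helper names partition_token and locate are illustrative only. The model shows that, with a composite key, vertices sharing a cityId land in the same partition and are distinguished by their clustering values.

```python
import hashlib


def partition_token(values):
    """Hash partition-key values to a 64-bit integer.

    Illustrative stand-in only: Cassandra's default partitioner uses Murmur3.
    """
    raw = "\x00".join(str(v) for v in values).encode("utf-8")
    return int.from_bytes(hashlib.sha256(raw).digest()[:8], "big")


def locate(vertex, partition_key, clustering_key=(), num_nodes=4):
    """Return (node, partition_values, clustering_values) for one vertex.

    Picking the node as token % num_nodes is a simplification of token ranges.
    """
    pk_values = tuple(vertex[k] for k in partition_key)
    ck_values = tuple(vertex[k] for k in clustering_key)
    return partition_token(pk_values) % num_nodes, pk_values, ck_values


sensors = [
    {"cityId": 7, "sensorId": 31},
    {"cityId": 7, "sensorId": 12},
    {"cityId": 9, "sensorId": 5},
]

# Composite key: partitionKey('cityId'), clusteringKey('sensorId')
placed = [locate(s, ("cityId",), ("sensorId",)) for s in sensors]
city7 = [p for p in placed if p[1] == (7,)]

# Same partition values always hash to the same node ...
assert city7[0][0] == city7[1][0]
# ... and within that partition, rows would be kept in clustering-key order.
print(sorted(p[2] for p in city7))  # [(12,), (31,)]
```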
Auto-generated vertex ids also exist, but are discouraged. If neither a partitionKey nor a clusteringKey is specified, an auto-generated vertex id is created that assigns values to two internal properties: community_id as a partitionKey and member_id as a clusteringKey. Because a new unique id is generated for every vertex, inserting the same property values twice accidentally creates duplicate vertices, leading to confusion.
Note: Auto-generated vertex ids are deprecated with DSE 6.0.
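A toy Python model of why auto-generated ids invite duplicates. The classes AutoIdStore and UdvStore are hypothetical. The model assumes that a write with an existing user-defined id lands on the same stored vertex (as a Cassandra write to an existing primary key does), while an auto-generated id is minted independently of the property values; the single fixed community_id is a simplification.

```python
import itertools


class AutoIdStore:
    """Toy model of auto-generated ids: each insert gets a new member_id."""

    def __init__(self, community_id=0):
        self.community_id = community_id  # simplified: one fixed community
        self._member_ids = itertools.count()
        self.vertices = {}

    def add_vertex(self, label, **props):
        # The id does not depend on the property values, so two identical
        # inserts always produce two distinct vertices.
        vid = (label, self.community_id, next(self._member_ids))
        self.vertices[vid] = dict(props)
        return vid


class UdvStore:
    """Toy model of user-defined ids: the id is derived from key property values."""

    def __init__(self, key_properties):
        self.key_properties = key_properties
        self.vertices = {}

    def add_vertex(self, label, **props):
        vid = (label,) + tuple(props[k] for k in self.key_properties)
        # Same key values -> same id, so a repeated insert updates the
        # existing vertex instead of creating a duplicate.
        self.vertices.setdefault(vid, {}).update(props)
        return vid


auto = AutoIdStore()
auto.add_vertex("FridgeSensor", sensorId=31, cityId=7)
auto.add_vertex("FridgeSensor", sensorId=31, cityId=7)
print(len(auto.vertices))  # 2: an accidental duplicate

udv = UdvStore(key_properties=("sensorId",))
udv.add_vertex("FridgeSensor", sensorId=31, cityId=7)
udv.add_vertex("FridgeSensor", sensorId=31, cityId=7)
print(len(udv.vertices))  # 1
```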

Caching can improve query performance and is configurable. DSE Graph has two types of cache: an adjacency list cache and an index/property cache. Either edges or properties can be cached using the cache() option with vertexLabel(). Caching can be configured for all edges, all properties, or a filtered set of edges. Vertices are not cached directly, but caching the properties and edges that define the relationships between vertices has essentially the same effect. Caching is best suited to static values.

Property caching takes effect only if indexes exist and are used in the course of a query; if no index exists, no caching occurs. Full graph scan queries are not cached. Adjacency list caching takes effect if caching is configured for edges.

The caches are local to a node, and data is loaded into a cache when it is read by a query. Both caches default to 128 MB in the dse.yaml file; the settings are adjacency_cache_size_in_mb and index_cache_size_in_mb. Both caches use off-heap memory and are implemented as least recently used (LRU) caches.
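The load-on-read and LRU behavior can be sketched as follows. This is a minimal Python model that assumes the cache is bounded by entry count rather than by megabytes and ignores off-heap storage; LruCache and its methods are hypothetical names, not a DSE API.

```python
from collections import OrderedDict


class LruCache:
    """Minimal LRU cache bounded by entry count (the DSE caches are bounded in MB)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry

    def get_or_load(self, key, loader):
        # Data enters the cache only when a read misses.
        if key in self._data:
            return self.get(key)
        value = loader(key)
        self.put(key, value)
        return value


cache = LruCache(capacity=2)
cache.put("meal_item:1", {"calories": 500})
cache.put("meal_item:2", {"calories": 900})
cache.get("meal_item:1")                     # touch 1, so 2 becomes least recent
cache.put("meal_item:3", {"calories": 300})  # evicts meal_item:2
print(cache.get("meal_item:2"))              # None
```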

Because graph cache is local to each node in the cluster, the cached data can differ between nodes; a query can use cache on one node but not on another. An entry is loaded into the cache only when the data is not already found there. There is no invalidation: no flushing occurs, and the cache is not updated when an element is deleted or modified. Apart from LRU replacement when a cache reaches its size limit, cached data is evicted only based on the time-to-live (TTL) value set when caching is configured for an element. Set a low TTL value for elements (property keys, vertex labels, edge labels) that change often to avoid stale data.
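The following sketch models the behavior described above: per-node caches, loading only on a miss, no invalidation when the element changes, and expiry by TTL. It is a simplified model (no size limit, an injected clock), and TtlCache is a hypothetical name rather than part of DSE.

```python
class TtlCache:
    """Toy per-node cache: entries expire after `ttl` seconds and are never
    invalidated when the underlying element changes."""

    def __init__(self, ttl, clock):
        self.ttl = ttl
        self.clock = clock  # injected time source, e.g. time.monotonic
        self._data = {}

    def read(self, key, storage):
        entry = self._data.get(key)
        now = self.clock()
        if entry is not None and now - entry[1] < self.ttl:
            return entry[0]  # cache hit, possibly stale
        value = storage[key]  # miss or expired: load from storage
        self._data[key] = (value, now)
        return value


now = [0.0]


def clock():
    return now[0]


storage = {"meal_item:1/calories": 500}
node_a = TtlCache(ttl=60, clock=clock)
node_b = TtlCache(ttl=60, clock=clock)

node_a.read("meal_item:1/calories", storage)          # node A caches 500
storage["meal_item:1/calories"] = 650                 # element modified, no flush
print(node_a.read("meal_item:1/calories", storage))   # 500 (stale, within TTL)
print(node_b.read("meal_item:1/calories", storage))   # 650 (node B had no entry)
now[0] = 61.0
print(node_a.read("meal_item:1/calories", storage))   # 650 (TTL expired, reloaded)
```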

Caching is intended to make queries more efficient when the same information is required by a later query. For instance, caching the calories property for meal_item vertices will speed up a query for all meal items with fewer than 850 calories. However, caching is useful only for rarely changed graph data and for queries that are repeated with the same predicate and sort order. Caching calories for the query above will not reduce query latency if a later query asks for all meal items with more than 850 calories.
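A toy model of the calories example, assuming cached results are keyed on the exact predicate. The QueryResultCache class, the sample meal items, and the ("lt", 850) predicate encoding are all hypothetical; the sketch only illustrates why the second query gets no benefit from the first.

```python
class QueryResultCache:
    """Toy model: cached index results are keyed on the exact predicate."""

    def __init__(self):
        self._results = {}
        self.hits = 0
        self.misses = 0

    def lookup(self, predicate, run_query):
        if predicate in self._results:
            self.hits += 1
            return self._results[predicate]
        self.misses += 1
        result = run_query(predicate)
        self._results[predicate] = result
        return result


# Hypothetical meal_item vertices: name -> calories
meal_items = {"soup": 300, "pasta": 900, "salad": 150}


def run_query(predicate):
    op, limit = predicate
    if op == "lt":
        return sorted(n for n, c in meal_items.items() if c < limit)
    return sorted(n for n, c in meal_items.items() if c > limit)


cache = QueryResultCache()
cache.lookup(("lt", 850), run_query)  # miss: executes the query
cache.lookup(("lt", 850), run_query)  # hit: same predicate
cache.lookup(("gt", 850), run_query)  # miss: different predicate, nothing reused
print(cache.hits, cache.misses)       # 1 2
```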

DSE Graph limits the number of vertex labels to 200 per graph.


The location of the dse.yaml file depends on the type of installation:
Package installations: /etc/dse/dse.yaml
Tarball installations: installation_location/resources/dse/conf/dse.yaml


User-defined vertex ids

  1. Create a vertex label with a single-key vertex id of sensorId. The property key sensorId must exist before it is used to create the vertex label and cannot be a multiple cardinality property.
    schema.vertexLabel('FridgeSensor').partitionKey('sensorId').create()
    This vertex id stores data based on the unique sensorId value for each FridgeSensor, distributing the data throughout the entire DSE cluster.
  2. Create a vertex label with a composite key vertex id, using cityId as the partition key and sensorId as the clustering key.
    schema.vertexLabel('FridgeSensor').partitionKey('cityId').clusteringKey('sensorId').create()
    This vertex id stores all data for FridgeSensors with a particular cityId in the same partition, ordered by sensorId. If a city has a large number of sensors, the partition storing these vertices can grow very large.
  3. Create a vertex label with a multiple-key vertex id that uses both cityId and sensorId as the partition key.
    schema.vertexLabel('FridgeSensor').partitionKey('cityId', 'sensorId').create()
    This vertex id hashes both property keys together before distributing the data in the cluster, so that each vertex is stored uniquely based on both values.
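To make the partition-size trade-off in the steps above concrete, the following sketch counts how many FridgeSensor vertices would share a partition under each of the three arrangements. The sample data and the partition_sizes helper are hypothetical; only the key choices come from the steps.

```python
from collections import Counter

# Hypothetical sample: six FridgeSensor vertices spread over two cities.
sensors = [
    {"cityId": c, "sensorId": s}
    for c, s in [(1, 10), (1, 11), (1, 12), (1, 13), (2, 20), (2, 21)]
]


def partition_sizes(vertices, partition_key):
    """Count vertices per partition for a given partition-key arrangement."""
    return Counter(tuple(v[k] for k in partition_key) for v in vertices)


single = partition_sizes(sensors, ("sensorId",))          # step 1
composite = partition_sizes(sensors, ("cityId",))         # step 2 (sensorId clusters)
multi = partition_sizes(sensors, ("cityId", "sensorId"))  # step 3

# (largest partition, number of partitions)
print(max(single.values()), len(single))        # 1 6
print(max(composite.values()), len(composite))  # 4 2
print(max(multi.values()), len(multi))          # 1 6
```

With the composite key, the largest partition grows with the number of sensors in the busiest city, which is the growth risk described in step 2.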

Auto-generated vertex ids

  1. If neither a partitionKey nor a clusteringKey is specified, an auto-generated vertex id is assigned when data is created:
    schema.vertexLabel('anAutoId').create()
    The vertex id consists of the label plus the two internal properties community_id and member_id:
    {~label=anAutoId, community_id=1270013568, member_id=0}

Associating property keys with vertex labels

  1. Properties can be defined in either the create() or the add() statement:
    schema.vertexLabel('book').partitionKey('bookId').properties('publishYear', 'ISBN', 'name', 'bookDiscount').create()


  1. Cache all properties for person vertices up to an hour (3600 seconds):

    Enabling property cache causes index queries to use an index cache for the specified vertex label.

  2. Cache both incoming and outgoing created edges for person vertices up to a minute (60 seconds):


The vertex labels for the DSE QuickStart example, which is used throughout the documentation:
// ********
// ********
// schema.vertexLabel('vertexLabel').
//    [ partitionKey(propertyKey, [ partitionKey(propertyKey) ]) ].
//    [ clusteringKey(propertyKey) ].
//    [ ttl ].
//    [ properties(property, property) ].
//    [ index ].
//    [ cache() ].
//    [ ifNotExists() ].
//    [ create() | add() | describe() | exists() ]
// ********

schema.vertexLabel('meal_item').properties('name','servAmt', 'macro', 'calories').add()
schema.vertexLabel('location').properties('name', 'geoPoint').add()
schema.vertexLabel('recipe').properties('name','cuisine', 'instructions','notes').add()
schema.vertexLabel('meal').partitionKey('type', 'mealId').create()
schema.vertexLabel('fridgeSensor').partitionKey('stateId', 'cityId').clusteringKey('sensorId').create()