Creating vertex label schema
Creating vertex label database schema.
Vertex labels, as discussed in the data model, define the vertex id and associated property keys for each type of vertex created. Property keys must be created before they are used in vertex label creation. A vertex label schema can be created with create(), or property keys can be added to an existing schema with add(). Vertex labels can be created with a specific Time-To-Live (TTL) value, and the prior existence of a vertex label can be checked using ifNotExists().
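As a minimal sketch of the workflow described above, the following statements create property keys first, then create a vertex label only if it does not already exist, and finally add another property key to that label. The label name sensorReading, the property key names, the Int() and Text() data types, and the TTL of 86400 seconds are illustrative assumptions, not part of the original examples.

```groovy
// Property keys must exist before a vertex label can reference them.
// Names and data types here are assumptions chosen for illustration.
schema.propertyKey('readingId').Int().single().create()
schema.propertyKey('status').Text().single().create()

// Create the label only if it is not already defined.
// ttl(86400) asks for vertices of this label to expire after one day.
schema.vertexLabel('sensorReading').partitionKey('readingId').ttl(86400).ifNotExists().create()

// Add a further property key to the existing label.
schema.vertexLabel('sensorReading').properties('status').add()
```

Splitting creation and add() this way keeps the vertex id definition separate from properties that may be extended later.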
A key component of a vertex label is the vertex id, which determines the data locality with which vertices of a particular vertex label are stored. User-defined vertex ids (UDV ids) are analogous to a primary key in an RDBMS, and map directly to the underlying data representation of the graph in Apache Cassandra®. UDV ids identify the unique property keys that define the partitionKey and clusteringKey of a vertex label. The values associated with the UDV ids determine the node in a DSE cluster on which vertices are stored (partitionKey), and the order in which the data is stored in the associated tables (clusteringKey).
- Single-key: composed of a single property in a partitionKey
  - Maps every instance of a vertex label to a distinct DSE partition and distributes the data around the DSE cluster based on DSE distribution methodologies.
- Multiple-key: composed of more than one property in a partitionKey
  - Maps a particular vertex label and associated properties to a distinct DSE partition, but contains more than one property key to identify the uniqueness.
- Composite key: composed of a partitionKey and a clusteringKey
  - Includes both a partitionKey, which maps a particular vertex label to a distinct DSE partition, and one or more clusteringKeys to group data within a partition.
If no vertex id is specified, an auto-generated vertex id is used, with community_id as a partitionKey and member_id as a clusteringKey. Because a unique id is created for every vertex, duplicate elements with the same property values can be created accidentally, leading to confusion.
Caching can improve query performance and is configurable. DSE Graph has two types of cache: adjacency list cache and index/property cache. Either edges or properties can be cached using the cache() option with vertexLabel(). Caching can be configured for all edges, all properties, or a filtered set of edges. Vertices are not cached directly, but caching the properties and edges that define the relationships between vertices accomplishes essentially the same thing. Caching works best for static values.
Property caching is enabled if indexes exist and are used in the course of queries. Full graph scan queries will not be cached. If an index does not exist, then caching does not occur. Adjacency list caching is enabled if caching is configured for edges.
The caches are local to a node, and data is loaded into cache when it is read by a query. Both caches are set to a default size of 128 MB in the dse.yaml file. The settings are adjacency_cache_size_in_mb and index_cache_size_in_mb. Both caches use off-heap memory and are implemented as Least Recently Used (LRU) caches.
Graph cache is local to each node in the cluster, so the cached data can differ between nodes. Thus, a query can be served from cache on one node but not on another. A cache is populated only when the requested data is not found in it. Graph caching has no explicit invalidation: no flushing occurs, and the cache is not updated if an element is deleted or modified. Cached data is evicted only when the time-to-live (TTL) value set when the cache was configured for an element expires. Set a low TTL value for elements (property keys, vertex labels, edge labels) that change often to avoid stale data.
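The staleness behavior described above can be illustrated with a small stand-alone Groovy sketch. It is not DSE Graph code: the TtlCache class, its get() method, and the in-memory store map are hypothetical, and the sketch models only TTL expiry, not the LRU sizing or the per-node locality of the real caches.

```groovy
// Hypothetical model of a cache that is filled on a miss and only
// drops entries when their TTL expires (no invalidation on writes).
class TtlCache {
    private final Map entries = [:]      // key -> [value: ..., expiresAt: ...]
    private final long ttlMillis

    TtlCache(long ttlSeconds) {
        this.ttlMillis = ttlSeconds * 1000
    }

    // Return the cached value; call the loader only on a miss or after expiry.
    def get(String key, long now, Closure loader) {
        def entry = entries[key]
        if (entry == null || now >= entry.expiresAt) {
            entry = [value: loader(key), expiresAt: now + ttlMillis]
            entries[key] = entry
        }
        return entry.value
    }
}

def store = [calories: 800]              // stands in for the underlying graph data
def cache = new TtlCache(60)             // 60-second TTL
def read = { long now -> cache.get('calories', now) { key -> store[key] } }

assert read(0L) == 800                   // miss: value loaded from the store
store.calories = 900                     // underlying data is modified
assert read(30_000L) == 800              // within the TTL: stale value is served
assert read(60_000L) == 900              // TTL expired: value is reloaded
```

In this model a shorter TTL narrows the window in which stale values are served, which is the reason for the advice to set a low TTL on elements that change often.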
Caching is intended to make queries more efficient when the same information is required in a later query. For instance, caching the calories property for meal_item vertices will improve retrieval for a query asking for all meal items with a calorie count less than 850 calories. However, caching is useful only for rarely changed graph data and for queries run in the same sort order. Caching calories for the query above will not reduce query latency if the query asks for all meal items with a calorie count greater than 850 calories.
dse.yaml
The location of the dse.yaml file depends on the type of installation:
Package installations | /etc/dse/dse.yaml |
Tarball installations | installation_location/resources/dse/conf/dse.yaml |
Prerequisites
Procedure
-
Create a vertex label with a single-key vertex id of sensorId. The property key sensorId must exist prior to use in creating the vertex label and cannot be a multiple cardinality property.
schema.vertexLabel('FridgeSensor').partitionKey('sensorId').create()
This vertex id will store data based on the unique sensorId value for each FridgeSensor, distributing the data throughout the entire DSE cluster.
-
Create a vertex label with a composite key vertex id consisting of partition key cityId and clustering key sensorId.
schema.vertexLabel('FridgeSensor').partitionKey('cityId').clusteringKey('sensorId').create()
The vertex id in this example will store all data for FridgeSensors with a particular cityId on the same partition, but order the data based on the sensorId. If the city has a large number of sensors, the table storing these vertices could grow quite large.
-
Create a vertex label with a multiple-key vertex id using both cityId and sensorId as part of the partitioning key.
schema.vertexLabel('FridgeSensor').partitionKey('cityId', 'sensorId').create()
This vertex id will hash both property keys together before distributing the data in the cluster, so that each vertex is placed and uniquely identified by the combination of both values.
-
If no partitionKey or clusteringKey is specified, an auto-generated vertex id is created when data is inserted:
schema.vertexLabel('anAutoId').create()
The vertex id consists of the label plus the two attributes community_id and member_id:
{~label=anAutoId, community_id=1270013568, member_id=0}
-
Properties can be defined in either the create() or add() statement:
schema.vertexLabel('book').partitionKey('bookId').create()
schema.vertexLabel('book').properties('name','publishYear','ISBN','bookDiscount').add()
or
schema.vertexLabel('book').partitionKey('bookId').properties('publishYear', 'ISBN', 'name', 'bookDiscount').create()
-
Cache all properties for person vertices up to an hour (3600 seconds):
schema.vertexLabel('person').cache().properties().ttl(3600).add()
Enabling property cache causes index queries to use an index cache for the specified vertex label.
-
Cache both incoming and outgoing created edges for person vertices up to a minute (60 seconds):
schema.vertexLabel('person').cache().bothE('created').ttl(60).add()
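To check the result of a procedure like the one above, the exists() and describe() steps listed in the SYNTAX comment of the example can be used. This is a sketch that assumes the person vertex label has already been created and that the statements run in a session bound to the graph; the exact output format is not shown because it depends on the DSE version.

```groovy
// Check whether the vertex label is defined in the schema.
schema.vertexLabel('person').exists()

// Display the schema definition of the label, including its vertex id and properties.
schema.vertexLabel('person').describe()
```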
Example
// ********
// VERTEX LABELS
// ********
// SYNTAX:
// schema.vertexLabel('vertexLabel').
// [ partitionKey(propertyKey [, propertyKey ...]) ].
// [ clusteringKey(propertyKey) ].
// [ ttl ].
// [ properties(property, property) ].
// [ index ].
// [ cache() ].
// [ ifNotExists() ].
// [ create() | add() | describe() | exists() ]
// ********
// SINGLE-KEY VERTEX ID
schema.vertexLabel('person').partitionKey('personId').create()
schema.vertexLabel('person').properties('name','nickname','gender','calGoal','macroGoal','country').add()
schema.vertexLabel('book').partitionKey('bookId').create()
schema.vertexLabel('book').properties('name','publishYear','ISBN','bookDiscount').add()
schema.vertexLabel('meal_item').partitionKey('itemId').create()
schema.vertexLabel('meal_item').properties('name','servAmt', 'macro', 'calories').add()
schema.vertexLabel('ingredient').partitionKey('ingredId').create()
schema.vertexLabel('ingredient').properties('name').add()
schema.vertexLabel('home').partitionKey('homeId').create()
schema.vertexLabel('home').properties('name','address').add()
schema.vertexLabel('store').partitionKey('storeId').create()
schema.vertexLabel('store').properties('name','address').add()
schema.vertexLabel('location').partitionKey('locId').create()
schema.vertexLabel('location').properties('name', 'geoPoint').add()
schema.vertexLabel('recipe').partitionKey('recipeId').create()
schema.vertexLabel('recipe').properties('name','cuisine', 'instructions','notes').add()
// MULTIPLE-KEY VERTEX ID
schema.vertexLabel('meal').partitionKey('type', 'mealId').create()
// COMPOSITE KEY VERTEX ID
schema.vertexLabel('fridgeSensor').partitionKey('stateId', 'cityId').clusteringKey('sensorId').create()
schema.vertexLabel('fridgeSensor').properties('name').add()