Creating graph schema
Creating graph database schema.
Creating a data model for a graph database is the critical first step toward
creating a schema. Once the data model is designed and a graph is created, the
next step is to define the schema for the vertices and edges and their
properties. Schema scripts are written in Gremlin-Groovy, which is packaged with
the Apache TinkerPop engine and can be used with either DataStax Studio or the
Gremlin console (dse gremlin-console) installed with DataStax Graph.
A schema is typically created in the following order:
- Create any user-defined types (UDTs) that will be used in vertex or edge labels.
- Create the vertex labels along with their vertex properties.
- Create the edge labels along with their edge properties.
- Create indexes, or analyze and apply suggested indexes, for vertex or edge labels.
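The first three steps above can be sketched in Gremlin-Groovy as follows. This is a minimal illustration that assumes the DataStax Graph 6.8-style schema API. The names address, person, book, and authored, along with all property names, are hypothetical and chosen only for this example.

```groovy
// Step 1: a user-defined type (UDT) that labels can reference later.
// 'address' and its fields are hypothetical names.
schema.type('address').
    ifNotExists().
    property('street', Text).
    property('city', Text).
    create()

// Step 2: vertex labels with their properties.
schema.vertexLabel('person').
    ifNotExists().
    partitionBy('person_id', Uuid).
    property('name', Text).
    create()

schema.vertexLabel('book').
    ifNotExists().
    partitionBy('book_id', Int).
    property('title', Text).
    create()

// Step 3: an edge label connecting the two vertex labels,
// carrying one edge property.
schema.edgeLabel('authored').
    ifNotExists().
    from('person').to('book').
    property('publish_date', Date).
    create()
```

Indexes, the fourth step, are usually added once the queries the graph must serve are known.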
Meta-properties, or properties of properties, can be stored in collections (set,
list, map), tuples, or UDTs. Indexing these properties can facilitate graph queries
that use the data stored in such data types. Collections, tuples, and UDTs can be
nested, and the frozen keyword can be used.
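As a rough sketch of how these data types might appear in a vertex label definition, the following assumes the DataStax Graph 6.8-style helpers setOf, listOf, mapOf, tupleOf, typeOf, and frozen. The customer label, the fullname UDT, and every property name are hypothetical.

```groovy
// A hypothetical UDT used as a property type below.
schema.type('fullname').
    ifNotExists().
    property('firstname', Text).
    property('lastname', Text).
    create()

schema.vertexLabel('customer').
    ifNotExists().
    partitionBy('customer_id', Uuid).
    // Collections: a set, a list, and a map.
    property('nicknames', setOf(Text)).
    property('scores', listOf(Int)).
    property('badges', mapOf(Text, Date)).
    // Nesting: a list whose elements are tuples.
    property('residences', listOf(tupleOf(Text, Date, Date))).
    // A UDT-typed property, declared frozen.
    property('legal_name', frozen(typeOf('fullname'))).
    create()
```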
Before creating a vertex or edge label, use ifNotExists() to check whether it
already exists. Vertex and edge labels can include a partition key that
determines the partition in which the data for that label's table is stored. To
facilitate ordering within a partition for either element, clustering columns can
also be specified. Because each label is stored as a CQL table, the number of
vertex and edge labels that can be implemented in a single graph is limited by
the number of CQL tables that Cassandra can practically handle.
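A minimal sketch of a vertex label that combines these options is shown below, assuming the DataStax Graph 6.8-style partitionBy and clusterBy steps. The meal label and its property names are hypothetical.

```groovy
schema.vertexLabel('meal').
    ifNotExists().                 // skip creation if 'meal' already exists
    partitionBy('type', Text).     // partition key: groups meals by type
    clusterBy('meal_id', Int).     // clustering column: orders rows within a partition
    property('calories', Int).
    create()
```

In this sketch, all meals of the same type share a partition and are ordered within it by meal_id.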
Indexing plays a key role in graph traversal processing. Three types of indexes can be defined for both vertex and edge labels: materialized view, secondary, and search. See the section on indexing for a more thorough discussion of indexing.
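The three index types might be declared on a vertex label roughly as follows. This is a sketch that assumes the DataStax Graph 6.8-style index steps. The person label, its properties, and the index names are hypothetical, and a search index additionally assumes that DSE Search is enabled on the cluster.

```groovy
// A hypothetical vertex label to index.
schema.vertexLabel('person').
    ifNotExists().
    partitionBy('person_id', Uuid).
    property('name', Text).
    property('gender', Text).
    property('bio', Text).
    create()

// Materialized view index on 'name'.
schema.vertexLabel('person').
    materializedView('person_by_name').
    ifNotExists().
    partitionBy('name').
    create()

// Secondary index on 'gender'.
schema.vertexLabel('person').
    secondaryIndex('person_2i_by_gender').
    ifNotExists().
    by('gender').
    create()

// Search index on 'bio'.
schema.vertexLabel('person').
    searchIndex().
    ifNotExists().
    by('bio').
    create()
```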
Schema can be added and dropped after initial creation. As with any database, this flexibility is useful during development, but changing schema in production can affect data continuity.
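Later schema changes could look like the following sketch, which assumes the DataStax Graph 6.8-style addProperty, alter, ifExists, and drop steps. The person and authored labels and the age property are hypothetical.

```groovy
// Add a property to an existing vertex label.
schema.vertexLabel('person').
    addProperty('age', Int).
    alter()

// Drop an edge label, but only if it exists.
schema.edgeLabel('authored').
    ifExists().
    drop()

// Drop a vertex label, but only if it exists.
// Dropping a label also removes the data stored under it.
schema.vertexLabel('person').
    ifExists().
    drop()
```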