Basic graph data modeling

Basics of graph data modeling.

To get started with graph database concepts, let's explore the world of food as a graph:
Figure 1:
This graph, like many property graphs, is composed of several subgraphs. For instance, one set of vertices and edges applies to food tracking by a person, while another set applies to the distribution of ingredients among homes and stores. A recipe database is embedded within the data model. Links to recipes in cookbooks extend the model so that applications can recommend cookbooks based on recipe choices. Another subgraph connects people to the cookbooks they author.
Figure 2:
Examine the basic unit of a graph: a single edge connecting two vertices, in this case vertices of different types. In this data model, a person (vertex label) created (edge label) a recipe (vertex label). The properties are name, used for both person and recipe vertices, and create_date, an edge property of created.
Figure 3:
Instances of each element populate the generalized data model. Julia Child was a famous chef who created many recipes. One of the recipes she created for an American audience in 1961 was beef bourguignon. The diagram below captures the essence of this information with two vertices, one edge, the vertex property name, and the edge property create_date.
Figure 4: Julia Child creates beef bourguignon
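
The vertex-edge-vertex unit described above can be sketched in plain Python. This is only an illustration of the data model, not DSG schema or traversal syntax: the `Vertex` and `Edge` classes are hypothetical names, and because the text gives only the year 1961, the month and day of `create_date` are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Vertex:
    label: str  # vertex label, e.g. "person" or "recipe"
    name: str   # the name property, used by both vertex labels


@dataclass(frozen=True)
class Edge:
    label: str         # edge label, e.g. "created"
    out_v: Vertex      # vertex the edge leaves
    in_v: Vertex       # vertex the edge points to
    create_date: date  # edge property


julia = Vertex(label="person", name="Julia Child")
beef = Vertex(label="recipe", name="beef bourguignon")

# Only the year (1961) comes from the text; January 1 is a placeholder.
created = Edge(label="created", out_v=julia, in_v=beef,
               create_date=date(1961, 1, 1))

print(f"{created.out_v.name} -[{created.label}]-> {created.in_v.name}")
```

The two vertex labels share one class here because both carry only a name property; a fuller model would give each label its own set of properties.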

DataStax Graph (DSG) supports multiplicity for both properties and edges. Multiple values for the same property can be defined with collections (set, list, map), tuples, or user-defined types (UDTs). All properties use CQL data types. Multiple edges of the same type can be constructed between two vertices if uniqueness is guaranteed by a clustering key.
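
To make the edge multiplicity rule concrete, here is a minimal Python sketch of a toy edge table, assuming each edge is identified by its two vertices, its label, and a clustering key. The `EdgeStore` class, the `stars` property, the reviewer name, and the review dates are all hypothetical. Writing an existing key overwrites the stored edge, in the way a CQL write to an existing primary key overwrites the row, so only distinct clustering key values produce multiple edges between the same two vertices.

```python
from datetime import date


class EdgeStore:
    """Toy edge table keyed by (out vertex, label, in vertex, clustering key)."""

    def __init__(self):
        self._rows = {}

    def upsert_edge(self, out_v, label, in_v, clustering_key, **props):
        # Same full key: the existing edge is overwritten, not duplicated.
        self._rows[(out_v, label, in_v, clustering_key)] = props

    def edges_between(self, out_v, label, in_v):
        # Return (clustering key, properties) for every matching edge.
        return [
            (key[3], props)
            for key, props in self._rows.items()
            if key[:3] == (out_v, label, in_v)
        ]


store = EdgeStore()
pair = ("example person", "reviewed", "beef bourguignon")

store.upsert_edge(*pair, date(2020, 1, 5), stars=4)
store.upsert_edge(*pair, date(2020, 3, 9), stars=5)
store.upsert_edge(*pair, date(2020, 3, 9), stars=3)  # same key: overwrites

reviews = store.edges_between(*pair)
print(len(reviews))  # 2
```

Without the date in the key, the second review would replace the first, and the graph could hold only one reviewed edge per person and recipe.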

Looking at the full data model, a person vertex can have a name, as well as additional properties such as gender and nickname. A reviewed edge can have multiple properties that identify attributes of a recipe review for the adjoining recipe. Consider also the locations where a person has lived; a query can be aimed at discovering where a person lived at a given time. Would it be interesting to know whether Julia Child lived in France or the United States while writing her first cookbook? It could be relevant if the cookbook is on French cuisine. The tuple country includes the country name, start_date, and end_date, which together support that query. Because Julia Child lived in multiple countries during her lifetime, the property country must use a data type that can store multiple countries, each with its own start and end dates, for example a collection of tuples.
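
The country property and the "where did this person live on a given date" query can be sketched as follows. This is a plain Python illustration under stated assumptions, not DSG syntax: the `Residence` tuple and `country_on` helper are hypothetical names, and the person, countries, and dates are placeholders rather than biographical data.

```python
from datetime import date
from typing import List, NamedTuple, Optional


class Residence(NamedTuple):
    country: str
    start_date: date
    end_date: date


def country_on(residences: List[Residence], day: date) -> Optional[str]:
    """Return the country whose date range contains `day`, if any."""
    for residence in residences:
        if residence.start_date <= day <= residence.end_date:
            return residence.country
    return None


# Placeholder data: one person vertex whose country property holds
# several (country, start_date, end_date) tuples.
person = {
    "name": "example person",
    "country": [
        Residence("Country A", date(2000, 1, 1), date(2004, 12, 31)),
        Residence("Country B", date(2005, 1, 1), date(2010, 12, 31)),
    ],
}

print(country_on(person["country"], date(2006, 6, 1)))  # Country B
```

A single-valued country property could answer only "where does this person live now"; storing a collection of dated tuples is what makes the historical question answerable.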

You may wonder how to decide which entities and relationships to include in a graph data model. The next section takes a look.