Basic graph data modeling

To get started with graph database concepts, let’s explore the world of food as a graph:

[Figure: the world of food modeled as a graph]

This graph, like many property graphs, is composed of several subgraphs. For instance, one set of vertices and edges applies to food tracking by a person, while another applies to the distribution of ingredients among homes and stores. A recipe database is embedded within the data model. Links between recipes and the cookbooks that contain them extend the model, allowing applications to recommend cookbooks based on recipe choices. Another subgraph connects people to the cookbooks they author.

[Figure: subgraphs of the food data model]
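The cookbook-recommendation idea can be sketched in plain Python, assuming an in-memory list of edges per subgraph rather than DataStax Graph itself. The edge label `included_in`, the vertex names, and the function `recommend_books` are hypothetical; `reviewed` is borrowed from the full data model. The sketch walks from a person to the recipes they reviewed, then from those recipes to the books that contain them, and ranks books by how many of those walks reach them.

```python
from collections import defaultdict

# Hypothetical edge lists, one per subgraph; all names are placeholders.
reviewed = [  # person -> recipe
    ("alice", "beef_bourguignon"),
    ("alice", "ratatouille"),
]
included_in = [  # recipe -> book (hypothetical edge label)
    ("beef_bourguignon", "book_a"),
    ("ratatouille", "book_a"),
    ("ratatouille", "book_b"),
    ("pad_thai", "book_c"),
]


def recommend_books(person, reviewed, included_in):
    """Rank books by how many of the person's reviewed recipes they contain."""
    # Index recipe -> books so the second hop is a dictionary lookup.
    books_by_recipe = defaultdict(list)
    for recipe, book in included_in:
        books_by_recipe[recipe].append(book)

    # First hop: person -> recipes; second hop: recipe -> books.
    scores = defaultdict(int)
    for who, recipe in reviewed:
        if who == person:
            for book in books_by_recipe[recipe]:
                scores[book] += 1

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda b: (-scores[b], b))


print(recommend_books("alice", reviewed, included_in))  # ['book_a', 'book_b']
```

In a graph database the same two-hop walk would be written as a traversal query; the sketch only shows that moving from one subgraph to another is a matter of following edges.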

Examine the basic unit of a graph: a single edge connecting two vertices, here of different types. In this data model, a person (vertex label) created (edge label) a recipe (vertex label). The properties are name, used by both person and recipe, and create_date, an edge property of created.

[Figure: a person vertex linked to a recipe vertex by a created edge]
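The basic unit can be expressed as a tiny schema check. This is a minimal sketch in plain Python, not the DataStax Graph schema API: `VERTEX_SCHEMA`, `EDGE_SCHEMA`, `Vertex`, and `Edge` are hypothetical names, and the property values are placeholders. It records that name belongs to both vertex labels, that create_date belongs to the created edge, and that created runs from person to recipe.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical schema: each vertex label maps to the property keys it allows.
VERTEX_SCHEMA = {"person": {"name"}, "recipe": {"name"}}
# Edge label -> (outgoing vertex label, incoming vertex label, allowed keys).
EDGE_SCHEMA = {"created": ("person", "recipe", {"create_date"})}


@dataclass
class Vertex:
    label: str
    props: dict = field(default_factory=dict)

    def __post_init__(self):
        # KeyError for an unknown label, ValueError for unknown property keys.
        unknown = set(self.props) - VERTEX_SCHEMA[self.label]
        if unknown:
            raise ValueError(f"{self.label} does not allow {sorted(unknown)}")


@dataclass
class Edge:
    out_v: Vertex
    label: str
    in_v: Vertex
    props: dict = field(default_factory=dict)

    def __post_init__(self):
        out_label, in_label, allowed = EDGE_SCHEMA[self.label]
        # Enforce direction: created must go from a person to a recipe.
        if (self.out_v.label, self.in_v.label) != (out_label, in_label):
            raise ValueError(f"{self.label} must go {out_label} -> {in_label}")
        unknown = set(self.props) - allowed
        if unknown:
            raise ValueError(f"{self.label} does not allow {sorted(unknown)}")


# Placeholder values, chosen only to exercise the schema.
chef = Vertex("person", {"name": "Example Chef"})
dish = Vertex("recipe", {"name": "Example Recipe"})
created = Edge(chef, "created", dish, {"create_date": date(2000, 1, 1)})
```

Constructing `Edge(dish, "created", chef)` raises `ValueError`, because the edge label fixes which vertex labels may sit at each end.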

A generalized data model can be illustrated with instances of each of its elements. Julia Child was a famous chef who created many recipes. One of the recipes she created for an American audience in 1961 was beef bourguignon. The diagram below captures the essence of this information with two vertices, one edge, the vertex property name, and the edge property create_date.

[Figure: Julia Child (person) created beef bourguignon (recipe), create_date 1961]
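The instance-level picture can be mirrored with plain Python dictionaries: two vertices, one edge, and a small lookup that follows created edges. The vertex IDs `p1` and `r1` and the function `recipes_created_by` are hypothetical; because the text gives only the year, create_date is stored here as a year rather than a full date.

```python
# Two vertices and one edge, mirroring the instance diagram.
vertices = {
    "p1": {"label": "person", "name": "Julia Child"},
    "r1": {"label": "recipe", "name": "beef bourguignon"},
}
edges = [
    # Only the year is known, so create_date holds a year in this sketch.
    {"out": "p1", "label": "created", "in": "r1", "create_date": 1961},
]


def recipes_created_by(person_name, year=None):
    """Follow outgoing created edges from the named person, optionally by year."""
    results = []
    for e in edges:
        author = vertices[e["out"]]
        if e["label"] == "created" and author["name"] == person_name:
            if year is None or e["create_date"] == year:
                results.append(vertices[e["in"]]["name"])
    return results


print(recipes_created_by("Julia Child"))        # ['beef bourguignon']
print(recipes_created_by("Julia Child", 1960))  # []
```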

DataStax Graph (DSG) supports multiplicity for both properties and edges. Multiple values for the same property can be defined with collections (set, list, map), tuples, or user-defined types (UDTs). All properties use CQL data types. Multiple edges with the same label can connect the same two vertices if a clustering key makes each edge unique.
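Why a clustering key matters can be sketched by treating edges as table rows keyed by their primary key. This is a plain-Python model of upsert behavior (a write with an existing primary key overwrites the row, as a CQL INSERT does), not DSG's actual storage layout; the function `add_edge`, the IDs, and the properties `stars` and `review_date` are hypothetical.

```python
# Edges modeled as table rows in a dict keyed by primary key.
# Writing an existing key overwrites the row (upsert semantics).


def add_edge(table, out_id, in_id, props, clustering_key=None):
    """Store a reviewed edge; clustering_key names a property added to the key."""
    key = (out_id, in_id)
    if clustering_key is not None:
        key += (props[clustering_key],)
    table[key] = props


# Two reviews of the same recipe by the same person (placeholder values).
first = {"stars": 3, "review_date": "2020-01-01"}
second = {"stars": 5, "review_date": "2021-06-01"}

without_ck = {}
add_edge(without_ck, "p1", "r1", first)
add_edge(without_ck, "p1", "r1", second)
print(len(without_ck))  # 1: the second review replaced the first

with_ck = {}
add_edge(with_ck, "p1", "r1", first, clustering_key="review_date")
add_edge(with_ck, "p1", "r1", second, clustering_key="review_date")
print(len(with_ck))  # 2: review_date keeps the two edges distinct

# A multi-valued vertex property, analogous to a CQL set<text> column.
person = {"name": "Example Chef", "nickname": {"Chef", "The Example"}}
```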

Looking at the full data model, a person vertex can have a name as well as additional properties such as gender and nickname. A reviewed edge can have multiple properties that describe a review of the adjoining recipe. Or consider the locations where a person has lived during their lifetime; a query could aim to discover where a person lived at a particular time. Would it be interesting to know whether Julia Child lived in France or the United States while writing her first cookbook? It could be relevant if the cookbook is on French cuisine. A country tuple holding the country name, start_date, and end_date can support that query. Because Julia Child lived in multiple countries during her lifetime, the country property must use a data type that can store multiple countries, each with its own start and end dates.
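The date-range question can be sketched in plain Python, modeling the multi-valued country property as a list of (name, start_date, end_date) tuples, roughly the shape a CQL collection of tuples would hold. The person, the dates, the class `CountryStay`, and the function `country_on` are hypothetical placeholders, not historical facts about Julia Child.

```python
from datetime import date
from typing import NamedTuple, Optional


class CountryStay(NamedTuple):
    """One element of the multi-valued country property."""
    name: str
    start_date: date
    end_date: Optional[date]  # None means the person still lives there


# Hypothetical person with placeholder dates (not historical data).
person = {
    "name": "Example Chef",
    "country": [
        CountryStay("United States", date(1950, 1, 1), date(1960, 12, 31)),
        CountryStay("France", date(1961, 1, 1), date(1970, 12, 31)),
        CountryStay("United States", date(1971, 1, 1), None),
    ],
}


def country_on(person, day):
    """Return the country whose start_date..end_date range contains day."""
    for stay in person["country"]:
        if stay.start_date <= day and (stay.end_date is None or day <= stay.end_date):
            return stay.name
    return None  # no recorded stay covers that day


print(country_on(person, date(1965, 6, 1)))  # France
```

Storing start and end dates with each country is what lets a query tie a point in time, such as when a cookbook was written, to a single location.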

You may wonder how to decide which entities and relationships to include in a graph data model. The next section takes a look.

© 2024 DataStax