Similarities and differences

Compare main features of DSE Graph and relational databases.

Similarities and differences.

Graph databases and relational databases can both store data. Both types of database are queryable, so filtered results can be retrieved from the data stored. How the data is stored and queried is quite different.

A relational database management system (RDBMS) stores data in tables. A key is defined for data retrieval. If there is a relationship between data in two different tables, a foreign key is defined to link the tables. Data is normalized, meaning the data is stored so that no duplication occurs. In order to write complex queries for a relational database, several tables must be accessed and the retrieved data joined.

A graph database also stores data in tables. DSE Graph stores the data in Cassandra tables. In graph databases, the relationship between data is embedded in the data model. The data storage of a graph database can be likened to a pre-joined RDBMS, with the built-in data relationships. Data is retrieved by traversing the graph, rather than creating time-consuming and error-prone JOINS in Structured Query Language (SQL).

An added bonus for graph databases is the ability to scale out the distribution of the database. A graph database query will start with a particular vertex and filter the graph based on the query requirements. Because the retrieval generally involves filtering subset(s) of the entire graph, the graph can be partitioned. The query can operate on either a single or small number of nodes in the DSE cluster, in parallel for some steps of the traversal. Relational databases are hard to scale out and distribute, because of the interconnectedness of the tables and the requirement to access many tables to join data for successful queries.

Indexing techniques between graph databases and relational databases are quite distinct and can affect performance markedly. An example illustrates the differences. Consider the query for looking up all buyers that purchased an XBox One. In a relational database, three tables would be accessed to answer the query:

Three index lookups are required to access the data for the query:

each buyer in the buyers table
the item being queries in the items table
the buyer and item in the buyers-items join table.

In contrast, a graph database requires only one index lookup for the item, and then traverses the graph to each buyer vertex to complete the query:

The graph database query will run in close to constant time. The relational database query, if a B-tree index is assumed, will run in O(log N) time where N is the number of records in the table for each foreign key that must be referenced. Even for a one-deep query this will result in a large difference in performance; if a query goes to a more deeply nested depth, the performance gap will increase.

Another factor to consider in comparing DSE Graph and SQL-based databases is the ease of writing complex queries. Consider the query required for a recommendation engine about products, and the intuitive nature of the Gremlin language is evident.

Although many developers are familiar with SQL, this example shows that the complexity of a query can become unwieldy, even for experts. The Gremlin query by contrast can be read from left to right and each step follows from the previous step.