DSE Graph and Graph Analytics

DSE Graph allows you to perform OLAP queries using Spark.

Many local graph traversals can be executed in real time at high transactional loads. When the density of the graph is too high or the branching factor (the number of connected nodes at each level of the graph) is too large, the memory and computation requirements to answer OLTP queries go beyond what is acceptable under typical application workloads. These types of queries are called deep queries.
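To see why the branching factor matters, the following back-of-the-envelope sketch in Python estimates how many vertices a traversal may touch as its depth grows. It is only an illustration under simplifying assumptions: every vertex is assumed to have the same number of neighbors, no vertex is reached twice, and the function name vertices_touched is hypothetical and not part of DSE Graph.

```python
def vertices_touched(branching_factor: int, depth: int) -> int:
    """Rough upper bound on vertices visited by a traversal of `depth` hops.

    Assumes (for illustration only) that every vertex has exactly
    `branching_factor` neighbors and that no vertex is visited twice.
    Level 0 is the starting vertex, level 1 its neighbors, and so on.
    """
    return sum(branching_factor ** level for level in range(depth + 1))


if __name__ == "__main__":
    # With 50 neighbors per vertex, each extra hop multiplies the work by
    # roughly 50, which is what pushes a deep query past OLTP limits.
    for depth in (1, 2, 3, 4):
        print(depth, vertices_touched(50, depth))
```

Real graphs are not this regular, but the estimate shows how quickly the work grows with each additional hop.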

Scan queries are queries that touch either an entire graph or large parts of the graph. They typically traverse a large number of vertices and edges. For example, a query on a social network graph that searches for posts by users between 25 and 40 years old is a scan query.
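The social network example can be sketched with a toy in-memory graph in Python. This is not DSE Graph code. The dictionaries users and authored and the function posts_by_age are hypothetical names chosen for illustration, and the sketch assumes no index on age is available. It shows that the filter must examine every user vertex before following edges to posts.

```python
# Toy in-memory graph (hypothetical data, for illustration only).
users = {
    "u1": {"age": 22},
    "u2": {"age": 31},
    "u3": {"age": 40},
    "u4": {"age": 57},
}

# Outgoing "authored" edges: user id -> ids of posts written by that user.
authored = {
    "u1": ["p1"],
    "u2": ["p2", "p3"],
    "u3": ["p4"],
    "u4": [],
}


def posts_by_age(min_age: int, max_age: int):
    """Return (post ids, number of user vertices examined).

    Matches users whose age lies in [min_age, max_age], both inclusive.
    """
    examined = 0
    matches = []
    for user_id, props in users.items():  # every user vertex is visited
        examined += 1
        if min_age <= props["age"] <= max_age:
            matches.extend(authored[user_id])  # follow the outgoing edges
    return matches, examined


if __name__ == "__main__":
    matches, examined = posts_by_age(25, 40)
    print(matches, examined)
```

The number of vertices examined equals the number of users regardless of how many match, which is the characteristic cost of a scan query.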

For applications that use deep and scan queries, using an OLAP query typically results in better performance.

Performing OLAP queries using DSE Graph

Every graph created in DSE Graph has an OLAP traversal source, a, that is available to gremlin-console and DataStax Studio. This traversal source uses the SparkGraphComputer to analyze queries and execute them against the underlying DSE Analytics nodes. The nodes must be started with Graph and Spark enabled to access the OLAP traversal source. For one-off or single-session OLAP queries, alias the traversal source database.a as g and run the query. For example, in the Gremlin console:

gremlin> :remote config alias g database.a 
gremlin> g.V().count()

If you are performing multiple queries against different parts of the graph, use graph.snapshot() to return an OLAP traversal source for each part of the graph. The returned OLAP traversal source is a persisted RDD. For example, in the Gremlin console:

gremlin> categories = graph.snapshot().vertices('category').create()

When to use analytic OLAP queries

On large graphs, OLAP queries typically perform better for deep queries. However, executing deep queries as part of an OLTP load may make sense if they are rarely performed. For example, an online payment provider will favor OLTP queries to process payments quickly, but may require a deep query if there are indications of fraud in the transaction. While the deep query may take much longer as an OLTP workload, the application as a whole will perform better than if it were split into separate OLTP and OLAP queries.

Long-running and periodic processes, such as recommendation engines and search engines that analyze an entire graph, are the ideal use cases for OLAP queries. However, one-off data analysis operations that involve deep queries or that scan the entire database can also benefit from being run as OLAP queries. See DSE Graph, OLTP and OLAP for detailed information on performance differences between OLTP and OLAP queries.