Graph Analytics
DSE Graph can be used with the Apache Spark instance embedded in DSE to run Online Analytical Processing (OLAP)
queries on graph datasets. To enable OLAP for a graph query, either set the graph_source graph option to 'a'
or use the :default_graph_analytics execution profile. Graph OLAP queries are always routed to the
Spark master.
Background
- Given
- a running DSE cluster with graph and Spark enabled
- And
- an existing graph called “user_connections_spark” with schema:
schema.propertyKey('name').Text().ifNotExists().create();
schema.propertyKey('age').Int().ifNotExists().create();
schema.propertyKey('lang').Text().ifNotExists().create();
schema.propertyKey('weight').Float().ifNotExists().create();
schema.vertexLabel('person').properties('name', 'age').ifNotExists().create();
schema.vertexLabel('software').properties('name', 'lang').ifNotExists().create();
Vertex marko = graph.addVertex(label, 'person', 'name', 'marko', 'age', 29);
Vertex vadas = graph.addVertex(label, 'person', 'name', 'vadas', 'age', 27);
Vertex lop = graph.addVertex(label, 'software', 'name', 'lop', 'lang', 'java');
Vertex josh = graph.addVertex(label, 'person', 'name', 'josh', 'age', 32);
Vertex ripple = graph.addVertex(label, 'software', 'name', 'ripple', 'lang', 'java');
Vertex peter = graph.addVertex(label, 'person', 'name', 'peter', 'age', 35);
Running an OLAP graph query
- Given
- the following example:
require 'dse'

cluster = Dse.cluster(graph_name: 'user_connections_spark')
session = cluster.connect

results = session.execute_graph('g.V().count()', execution_profile: :default_graph_analytics)

puts "Result: #{results.first.value}"
puts "The spark master was: #{results.execution_info.hosts.last.ip}"
- When
- it is executed
- Then
- its output should contain:
Result: 6
The spark master was: 127.0.0.1
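The example reads the Spark master's address from results.execution_info.hosts.last.ip. The sketch below illustrates that lookup without a running cluster. It assumes the hosts list is ordered by attempt, so the last entry is the host that served the query, as the example's use of .last suggests. Host, ExecutionInfo, and serving_host_ip are hypothetical stand-ins written for this illustration; they are not the driver's own classes.

```ruby
# Stand-in types for illustration only; these are NOT the driver's classes.
Host = Struct.new(:ip)
ExecutionInfo = Struct.new(:hosts)

# Hypothetical helper: returns the IP of the last host in the list,
# or nil when the list is empty.
def serving_host_ip(execution_info)
  host = execution_info.hosts.last
  host && host.ip
end

# Assumed scenario: one host was tried before the query reached 127.0.0.1.
info = ExecutionInfo.new([Host.new('127.0.0.2'), Host.new('127.0.0.1')])
puts "The spark master was: #{serving_host_ip(info)}"
```

Returning nil for an empty list keeps the helper safe to call when no host information is available.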