Querying DSE Graph vertices and edges with Spark SQL
Spark SQL can query DSE Graph vertex and edge tables.
The com.datastax.bdp.graph.spark.sql.vertex and com.datastax.bdp.graph.spark.sql.edge data sources are used to specify vertex and edge tables in Spark SQL.
spark-sql> CREATE DATABASE graph_example;
spark-sql> USE graph_example;
spark-sql> CREATE TABLE vertices USING com.datastax.bdp.graph.spark.sql.vertex OPTIONS (graph 'example');
spark-sql> CREATE TABLE edges USING com.datastax.bdp.graph.spark.sql.edge OPTIONS (graph 'example');
If a graph has property names that differ only in capitalization (for example, id and Id), start Spark SQL with the --conf spark.sql.caseSensitive=true option.
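For example, the case-sensitivity option is passed on the command line when launching the shell (command fragment only; it assumes a DSE node with Graph and Spark enabled):

```shell
dse spark-sql --conf spark.sql.caseSensitive=true
```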
Prerequisites
Start your cluster with both Graph and Spark enabled.
Procedure
-
Start the Spark SQL shell.

dse spark-sql

-
Register the vertex and edge tables for your graph using CREATE TABLE.

CREATE DATABASE graph_gods;
USE graph_gods;
CREATE TABLE vertices USING com.datastax.bdp.graph.spark.sql.vertex OPTIONS (graph 'gods');
CREATE TABLE edges USING com.datastax.bdp.graph.spark.sql.edge OPTIONS (graph 'gods');

-
Query the vertices and edges using SELECT statements.

SELECT * FROM vertices WHERE name = 'Zeus';

-
Join the vertices and edges in a query.

Vertices are identified by an id column. Edge tables have src and dst columns that identify the from and to vertices, respectively. A join can be used to traverse the graph. For example, to find the ids of all vertices reached by following out edges:

SELECT edges.dst FROM vertices JOIN edges ON vertices.id = edges.src;
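As an illustrative aside, the src/dst join above is a plain relational traversal. The sketch below expresses the same logic over ordinary Python data (no Spark required); the sample vertices and edges are made-up stand-ins for rows of the gods graph tables:

```python
# Illustrative sketch: the vertex/edge join expressed over plain Python data.
# Column names (id, src, dst) mirror the Spark SQL tables above; the rows are
# hypothetical examples, not actual graph contents.

vertices = [
    {"id": 1, "name": "Zeus"},
    {"id": 2, "name": "Ares"},
    {"id": 3, "name": "Athena"},
]
edges = [
    {"src": 1, "dst": 2},  # Zeus -> Ares
    {"src": 1, "dst": 3},  # Zeus -> Athena
]

def out_vertex_ids(vertices, edges):
    """SELECT edges.dst FROM vertices JOIN edges ON vertices.id = edges.src"""
    vertex_ids = {v["id"] for v in vertices}
    return [e["dst"] for e in edges if e["src"] in vertex_ids]

print(out_vertex_ids(vertices, edges))  # [2, 3]
```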
What’s next
The same steps work from the Spark shell by passing the statements to spark.sql(), or over the JDBC/ODBC driver through the Spark SQL Thrift Server.
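As a sketch of the Spark shell route, the same statements can be issued through spark.sql(). The helper names below (register_graph_tables, out_vertex_ids) are hypothetical, and spark stands for a live SparkSession on a DSE cluster with Graph and Spark enabled; only the call pattern is the point:

```python
# Sketch only: registering and querying graph tables via spark.sql().
# `register_graph_tables` and `out_vertex_ids` are hypothetical helpers;
# `spark` is assumed to be a SparkSession connected to a DSE cluster.

VERTEX_SOURCE = "com.datastax.bdp.graph.spark.sql.vertex"
EDGE_SOURCE = "com.datastax.bdp.graph.spark.sql.edge"

def register_graph_tables(spark, graph: str) -> None:
    """Create vertex and edge tables for `graph`, as in the procedure above."""
    spark.sql(f"CREATE DATABASE IF NOT EXISTS graph_{graph}")
    spark.sql(f"USE graph_{graph}")
    spark.sql(f"CREATE TABLE vertices USING {VERTEX_SOURCE} OPTIONS (graph '{graph}')")
    spark.sql(f"CREATE TABLE edges USING {EDGE_SOURCE} OPTIONS (graph '{graph}')")

def out_vertex_ids(spark):
    """Return the ids reached by out edges (the same join as above)."""
    return spark.sql(
        "SELECT edges.dst FROM vertices JOIN edges ON vertices.id = edges.src"
    )
```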