Querying DSE Graph vertices and edges with Apache Spark™ SQL

Spark SQL can query DSE Graph vertex and edge tables. The com.datastax.bdp.graph.spark.sql.vertex and com.datastax.bdp.graph.spark.sql.edge data sources are used to specify vertex and edge tables in Spark SQL.

spark-sql> CREATE DATABASE graph_example;
spark-sql> USE graph_example;
spark-sql> CREATE TABLE vertices USING com.datastax.bdp.graph.spark.sql.vertex OPTIONS (graph 'example');
spark-sql> CREATE TABLE edges USING com.datastax.bdp.graph.spark.sql.edge OPTIONS (graph 'example');

If you have properties that are spelled the same but with different capitalizations (for example, id and Id), start Spark SQL with the --conf spark.sql.caseSensitive=true option.

Prerequisites

Start your cluster with both Graph and Spark enabled.

Procedure

Start the Spark SQL shell.
```
dse spark-sql
```

CREATE DATABASE graph_gods;
USE graph_gods;
CREATE TABLE vertices USING com.datastax.bdp.graph.spark.sql.vertex OPTIONS (graph 'gods');
CREATE TABLE edges USING com.datastax.bdp.graph.spark.sql.edge OPTIONS (graph 'gods');

Query the vertices and edges using SELECT statements.
```
SELECT * FROM vertices where name = 'Zeus';
```
Join the vertices and edges in a query.

Vertices are identified by id columns. Edges tables have src and dst columns that identify the from and to vertices, respectively. A join can be used to traverse the graph. For example to find all vertex ids that are reached by the out edges:
```
SELECT edges.dst FROM vertices JOIN edges ON vertices.id = edges.src;
```

What’s next

The same steps work from the Spark shell using spark.sql() to run the query statements, or using the JDBC/ODBC driver and the Spark SQL Thrift Server.

Querying DSE Graph vertices and edges with Apache Spark™ SQL

Prerequisites

Procedure

What’s next

Was this helpful?

Give Feedback