Querying DSE Graph vertices and edges with Spark SQL

Apache Spark™ SQL can query DSE Graph vertex and edge tables. The dse_graph database holds the vertex and edge tables for each graph. The naming format for the tables is <graph name>_vertices and <graph name>_edges. For example, if you have a graph named gods, the vertices and edges are accessible in Spark SQL in the dse_graph.gods_vertices and dse_graph.gods_edges tables.

select * from dse_graph.gods_vertices;

If you have properties that are spelled the same but with different capitalizations (for example, id and Id), start Spark SQL with the --conf spark.sql.caseSensitive=true option.

Start your cluster with both Graph and Apache Spark enabled.
Start the Spark SQL shell.
```
dse spark-sql
```

Query the vertices and edges using SELECT statements.

USE dse_graph;
SELECT * FROM gods_vertices where name = 'Zeus';

Join the vertices and edges in a query.

Vertices are identified by id columns. Edge tables have src and dst columns that identify the from and to vertices, respectively. A join can be used to traverse the graph. For example to find all vertex ids that are reached by the out edges:
```
SELECT gods_edges.dst FROM gods_vertices JOIN gods_edges ON gods_vertices.id = gods_edges.src;
```

The same steps work from the Spark shell using spark.sql() to run the query statements, or using the JDBC/ODBC driver and the Spark SQL Thrift Server.

Querying DSE Graph vertices and edges with Spark SQL

Was this helpful?

Give Feedback