Querying DSE Graph vertices and edges with Spark SQL
Spark SQL can query DSE Graph vertex and edge tables.
The dse_graph database holds the vertex and edge tables for each graph.
The naming format for the tables is <graph name>_vertices
and <graph name>_edges
.
For example, if you have a graph named gods
, the vertices and edges are accessible in Spark SQL in the dse_graph.gods_vertices
and dse_graph.gods_edges
tables.
select * from dse_graph.gods_vertices;
If you have properties that are spelled the same but with different capitalizations (for example, id
and Id
), start Spark SQL with the --conf spark.sql.caseSensitive=true
option.
Prerequisites
Start your cluster with both Graph and Spark enabled.
Procedure
-
Start the Spark SQL shell.
dse spark-sql
-
Query the vertices and edges using
SELECT
statements.USE dse_graph; SELECT * FROM gods_vertices where name = 'Zeus';
-
Join the vertices and edges in a query.
Vertices are identified by
id
columns. Edge tables havesrc
anddst
columns that identify the from and to vertices, respectively. A join can be used to traverse the graph. For example to find all vertex ids that are reached by the out edges:SELECT gods_edges.dst FROM gods_vertices JOIN gods_edges ON gods_vertices.id = gods_edges.src;
What’s next
The same steps work from the Spark shell using spark.sql()
to run the query statements, or using the JDBC/ODBC driver and the Spark SQL Thrift Server.