Querying DSE Graph vertices and edges with Spark SQL

Spark SQL can query DSE Graph vertex and edge tables.

Spark SQL can query DSE Graph vertex and edge tables. The dse_graph database holds the vertex and edge tables for each graph. The naming format for the tables is graph name_vertices and graph name_edges. For example, if you have a graph named gods, the vertices and edges are accessible in Spark SQL in the dse_graph.gods_vertices and dse_graph.gods_edges tables.
select * from dse_graph.gods_vertices;
If you have properties that are spelled the same but with different capitalizations (for example, id and Id), start Spark SQL with the --conf spark.sql.caseSensitive=true option.

Prerequisites

Start your cluster with both Graph and Spark enabled.

Procedure

  1. Start the Spark SQL shell.
    dse spark-sql
  2. Query the vertices and edges using SELECT statements.
    USE dse_graph;
    SELECT * FROM gods_vertices where name = 'Zeus';
  3. Join the vertices and edges in a query.

    Vertices are identified by id columns. Edge tables have src and dst columns that identify the from and to vertices, respectively. A join can be used to traverse the graph. For example to find all vertex ids that are reached by the out edges:

    SELECT gods_edges.dst FROM gods_vertices JOIN gods_edges ON gods_vertices.id = gods_edges.src;

What's next

The same steps work from the Spark shell using spark.sql() to run the query statements, or using the JDBC/ODBC driver and the Spark SQL Thrift Server.