Use Spark SQL in DataStax Studio
You can analyze data stored in DSE clusters with Spark SQL relational queries.
Spark SQL is a unified relational query language for traversing distributed collections of data. It supports a variation of the SQL language used in relational databases.
Prerequisites
-
The DSE cluster must be configured for the AlwaysOn SQL service.
-
Be familiar with the supported syntax of Spark SQL. If you are new to DSE Analytics, see About DSE Analytics.
-
In DataStax Studio, Spark SQL statements must end in a semicolon (;).
-
Studio users must have appropriate permissions to run Spark SQL queries. For more information, see Use authentication with AlwaysOn SQL.
AlwaysOn SQL
If the AlwaysOn SQL service is turned on, Studio uses the JDBC interface to pass queries to DSE Analytics.
For each graph, two tables, graphName_vertices and graphName_edges, are automatically generated in the Spark database dse_graph, where graphName is replaced with the name of the graph used by the Studio connection assigned to the notebook. You can query these tables with common Spark SQL commands directly in Studio, or explore them with the dse spark-sql shell. For more information, see Use Apache Spark SQL to query data.
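As a rough illustration of the naming convention described above, the following Python sketch derives the two table names for a graph and builds semicolon-terminated Spark SQL statements that could be pasted into a Studio Spark SQL cell. The helper names graph_tables and studio_statement and the graph name food are hypothetical, chosen only for this example. Only the dse_graph database, the _vertices and _edges suffixes, and the semicolon requirement come from this page.

```python
from typing import Tuple


def graph_tables(graph_name: str) -> Tuple[str, str]:
    """Return the qualified vertex and edge table names for a graph.

    Follows the convention described above: dse_graph.<graphName>_vertices
    and dse_graph.<graphName>_edges.
    """
    return (
        f"dse_graph.{graph_name}_vertices",
        f"dse_graph.{graph_name}_edges",
    )


def studio_statement(sql: str) -> str:
    """Return the statement terminated with exactly one semicolon."""
    return sql.strip().rstrip(";").rstrip() + ";"


# "food" is an assumed graph name used only for illustration.
vertices, edges = graph_tables("food")
statements = [
    studio_statement(f"SELECT * FROM {vertices} LIMIT 10"),
    studio_statement(f"SELECT COUNT(*) FROM {edges}"),
]

for statement in statements:
    print(statement)
```

The printed statements are ordinary Spark SQL. Replace food with the graph used by your notebook's connection before running them in Studio.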
Spark SQL tutorial notebook
The Working with Spark SQL v6.0.0 tutorial notebook introduces techniques and features for working with Spark SQL in Studio:
-
Interactively run Spark SQL queries against a DSE cluster.
-
Use schema-aware content assist when writing Spark SQL statements.
-
Use syntax validations for faster prototyping.
-
Use schema view for a tree view of schema elements in a database.