Using Spark SQL in DataStax Studio
Analyze data stored in DSE clusters with Spark SQL relational queries. Spark SQL is a unified relational query language for querying distributed collections of data, and it supports a variant of the SQL language used in relational databases.
A Spark SQL notebook includes the following features:
- Interactively perform Spark SQL queries against a DSE cluster
- Schema-aware content assist
- Syntax validations to facilitate faster prototyping
To run Spark SQL queries in Studio:
- The DSE cluster must be configured for the AlwaysOn SQL service.
- Be familiar with the Supported syntax of Spark SQL.
- In DataStax Studio, the Spark SQL statements must end in a semicolon (;).
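As a minimal sketch of the semicolon requirement, a notebook cell could contain a statement such as the one below. SHOW DATABASES is standard Spark SQL and is used here only as an illustration; it is not taken from the Studio tutorial.

```sql
-- The statement ends with a semicolon, as Studio requires.
SHOW DATABASES;
```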
Studio users require appropriate permissions to run Spark SQL queries. See Using authentication with AlwaysOn SQL.
If the AlwaysOn SQL service is enabled, Studio uses the JDBC interface to pass queries to DSE Analytics. For each graph, two tables, graphName_vertices and graphName_edges, are automatically generated in the Spark database dse_graph, where graphName is the name of the graph used by the Studio connection assigned to the notebook. These tables can be queried with common Spark SQL commands directly in Studio, or explored with the dse spark-sql shell. To learn more, see the Using Spark SQL to query data documentation.
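The following sketch shows what querying the generated graph tables might look like. It assumes a hypothetical graph named food, so the tables would be dse_graph.food_vertices and dse_graph.food_edges. The graph name and the LIMIT values are illustrative only, and the columns returned depend on the graph's schema.

```sql
-- Assumption: the notebook's connection uses a graph named "food" (hypothetical).

-- List the tables generated in the dse_graph database.
SHOW TABLES IN dse_graph;

-- Inspect the columns of the vertex table.
DESCRIBE dse_graph.food_vertices;

-- Sample a few rows from the vertex and edge tables.
SELECT * FROM dse_graph.food_vertices LIMIT 10;
SELECT * FROM dse_graph.food_edges LIMIT 10;
```

Each statement ends in a semicolon and can be run on its own. Replace food with the graph name of your own Studio connection.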
The notebook tutorial Working with SparkSQL is installed with Studio. The tutorial provides hands-on steps to create data and execute Spark SQL code in a notebook. It covers exploring the SQL schema in the schema view, using content assist for syntax and domain validation, and viewing results in table view and in different styles of charts.