Use Spark SQL in DataStax Studio

Analyze data stored in DSE clusters with Spark SQL relational queries. Spark SQL is a unified relational query language for querying distributed collections of data, and it supports a variation of the SQL language used in relational databases.

A Spark SQL notebook includes the following features:

Interactive execution of Spark SQL queries against a DSE cluster
Schema-aware content assist
Syntax validation to facilitate faster prototyping

Before you run Spark SQL queries in Studio, note the following requirements:

The DSE cluster must be configured for the AlwaysOn SQL service.
You should be familiar with the supported Spark SQL syntax.
In DataStax Studio, each Spark SQL statement must end in a semicolon (;).

Studio users require appropriate permissions to run Spark SQL queries. See Use authentication with AlwaysOn SQL.

When the AlwaysOn SQL service is enabled, Studio uses the JDBC interface to pass queries to DSE Analytics. For each graph, two tables, graphName_vertices and graphName_edges, are automatically generated in the Spark database dse_graph, where graphName is the name of the graph used by the Studio connection assigned to the notebook. These tables can be queried with common Spark SQL commands directly in Studio, or explored with the dse spark-sql shell. To learn more, see the Use Apache Spark SQL to query data documentation.
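
The naming convention for the generated graph tables, together with the semicolon rule, can be illustrated with a short Python sketch that composes statements to paste into a Spark SQL cell. This is only an illustration under stated assumptions: the helper names graph_table_names and preview_query and the graph name food are hypothetical and are not part of Studio or DSE, and the sketch accepts only simple graph names made of letters, digits, and underscores.

```python
import re
from typing import Tuple

# Spark database that holds the generated graph tables, as named in the text.
GRAPH_DATABASE = "dse_graph"


def graph_table_names(graph_name: str) -> Tuple[str, str]:
    """Return the qualified vertex and edge table names for a graph."""
    # Assumption: only letters, digits, and underscores are accepted so the
    # names can be used unquoted; other graph names would need extra handling.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", graph_name):
        raise ValueError(f"unsupported graph name: {graph_name!r}")
    return (
        f"{GRAPH_DATABASE}.{graph_name}_vertices",
        f"{GRAPH_DATABASE}.{graph_name}_edges",
    )


def preview_query(table: str, limit: int = 10) -> str:
    """Build a SELECT statement that ends in the semicolon Studio requires."""
    return f"SELECT * FROM {table} LIMIT {int(limit)};"


if __name__ == "__main__":
    # "food" is a made-up graph name used only for illustration.
    vertices, edges = graph_table_names("food")
    print(preview_query(vertices))
    print(preview_query(edges))
```

Each printed statement is a plain SELECT with a LIMIT clause that targets one of the two generated tables. Substitute the graph name used by the connection assigned to your notebook.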

The notebook tutorial Working with SparkSQL is installed with Studio. The tutorial provides hands-on steps to create data and execute Spark SQL code in a notebook. It also covers exploring the SQL schema in schema view, using content assist for syntax and domain validation, and viewing results in table view and in different chart styles.
