Using Spark SQL in DataStax Studio

Writing, testing, and running Spark SQL queries against DSE clusters.

Analyze data stored in DSE clusters with Spark SQL relational queries. Spark SQL is a unified relational query language for querying distributed collections of data, and it supports a variation of the SQL language used in relational databases.

Spark SQL notebook features include:
  • Interactively perform Spark SQL queries against a DSE cluster
  • Schema-aware content assist
  • Syntax validations to facilitate faster prototyping
When you run Spark SQL queries in Studio with the AlwaysOnSQL service turned on, Studio uses the JDBC interface to pass the queries to DSE Analytics.

For each graph, two tables, graphName_vertices and graphName_edges, are automatically generated in the Spark database dse_graph, where graphName is the name of the graph used by the Studio connection assigned to the notebook. These tables can be queried with common Spark SQL commands directly in Studio, or explored with the dse spark-sql shell. To learn more about querying data with Spark SQL, see the Using Spark SQL to query data documentation.

The notebook tutorial Working with SparkSQL is installed with Studio. The tutorial provides hands-on steps to create data and execute Spark SQL code in a notebook. It also covers exploring the SQL schema in schema view, using content assist for syntax and domain validation, and viewing results in table view and in different chart styles.