Use Spark SQL in DataStax Studio

You can analyze data stored in DSE clusters with Spark SQL relational queries.

Spark SQL is a relational query language for querying distributed collections of data. It supports a variation of the SQL language used in relational databases.

Prerequisites

AlwaysOn SQL

If the AlwaysOn SQL service is turned on, Studio uses the JDBC interface to pass queries to DSE Analytics.

For each graph, two tables, graphName_vertices and graphName_edges, are automatically generated in the Spark database dse_graph, where graphName is the name of the graph used by the Studio connection assigned to the notebook. You can query these tables with common Spark SQL commands directly in Studio, or explore them with the dse spark-sql shell. For more information, see Use Apache Spark SQL to query data.
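As an illustration of the naming pattern described above, the following minimal Python sketch builds the qualified table names for a graph and two generic Spark SQL statements that could be pasted into a Studio Spark SQL cell or the dse spark-sql shell. The graph name food and the helper functions graph_table_names and preview_query are hypothetical and exist only for this example. The queries use SELECT * and COUNT(*) because the column layout of the generated tables is not described here.

```python
def graph_table_names(graph_name: str) -> tuple:
    """Return the qualified vertex and edge table names generated for a graph.

    Follows the pattern from the text: dse_graph.<graphName>_vertices
    and dse_graph.<graphName>_edges.
    """
    return (
        f"dse_graph.{graph_name}_vertices",
        f"dse_graph.{graph_name}_edges",
    )


def preview_query(table: str, limit: int = 10) -> str:
    """Build a Spark SQL statement that returns the first rows of a table."""
    return f"SELECT * FROM {table} LIMIT {limit}"


# "food" is a hypothetical graph name used only for illustration.
vertices, edges = graph_table_names("food")

# Statements to run in a Studio Spark SQL cell or the dse spark-sql shell.
print(preview_query(vertices))
print(f"SELECT COUNT(*) FROM {edges}")
```

For the hypothetical graph food, this prints SELECT * FROM dse_graph.food_vertices LIMIT 10 and SELECT COUNT(*) FROM dse_graph.food_edges.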

Spark SQL tutorial notebook

The Working with Spark SQL v6.0.0 tutorial notebook introduces techniques and features for working with Spark SQL in Studio:

  • Interactively run Spark SQL queries against a DSE cluster.

  • Use schema-aware content assist when writing Spark SQL statements.

  • Use syntax validations for faster prototyping.

  • Use schema view for a tree view of schema elements in a database.
