Using the Spark session

A Spark session is encapsulated in an instance of org.apache.spark.sql.SparkSession. The session object has information about the Spark Master, the Spark application, and the configuration options.

The DSE Spark shell automatically configures and creates a Spark session object named spark. Use this object to begin querying database tables in DataStax Enterprise.

spark.sql("SELECT * FROM keyspace.table_name")

Spark applications can use multiple sessions to access different underlying data catalogs. Create a new session from an existing Spark session by calling the newSession method.

val newSpark = spark.newSession()
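A session created with newSession shares the underlying SparkContext and cached data with the original session, but its runtime SQL configuration, temporary views, and registered UDFs are independent. A brief sketch (the configuration key is only illustrative):

```scala
// Runtime configuration is per-session: setting a value in the
// new session does not change the original session.
newSpark.conf.set("spark.sql.shuffle.partitions", "8")
spark.conf.get("spark.sql.shuffle.partitions") // original session keeps its own value
```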

Building a Spark session using the Builder API

The Builder API allows you to create a Spark session manually.

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder
  .config("spark.executor.logs.rolling.maxRetainedFiles", "3")
  .config("spark.executor.logs.rolling.strategy", "size")
  .config("spark.executor.logs.rolling.maxSize", "50000")
  .getOrCreate()

Stopping a Spark session

Use the stop method to end the Spark session.
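For example, to shut down the automatically created shell session:

```scala
spark.stop()
```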


Getting and setting configuration options

Use the spark.conf.get and spark.conf.set methods to retrieve or set Spark configuration options for the session.

spark.conf.set("spark.executor.logs.rolling.maxRetainedFiles", "3")
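To read a configuration value back, pass the key to spark.conf.get. A second argument supplies a default for keys that may not be set (the custom key below is hypothetical):

```scala
spark.conf.get("spark.executor.logs.rolling.maxRetainedFiles")
// Supply a fallback for a key that might be unset:
spark.conf.get("spark.some.custom.key", "default-value")
```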
