Using the Spark session
A Spark session is encapsulated in an instance of org.apache.spark.sql.SparkSession. The session object holds information about the Spark Master, the Spark application, and configuration options.
The DSE Spark shell automatically configures and creates a Spark session object named spark. Use this object to begin querying database tables in DataStax Enterprise.
spark.sql("SELECT * FROM keyspace.table_name")
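For example, assuming a hypothetical keyspace named test with a table named words, you can run a query and display the first rows of the result:
val df = spark.sql("SELECT * FROM test.words")   // test.words is a hypothetical table
df.show()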
Spark applications can use multiple sessions, each with a different underlying data catalog. To create a new session from an existing one, call the newSession method.
val newSpark = spark.newSession()
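The new session shares the underlying SparkContext and cached data with the original session, but its SQL configuration, temporary views, and registered functions are isolated. A minimal sketch of that isolation, using a hypothetical temporary view named numbers:
spark.range(10).createOrReplaceTempView("numbers")   // visible only in the spark session
spark.sql("SELECT count(*) FROM numbers").show()     // works: the view exists here
// newSpark.sql("SELECT count(*) FROM numbers")      // would fail: temporary views are per-session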
Building a Spark session using the Builder API
The Builder API allows you to create and configure a Spark session manually. The getOrCreate call at the end of the chain returns an existing session if one is already running; otherwise it creates a new session with the requested options.
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder
  .master("dse://localhost?")   // DSE resource manager as the Spark master
  .appName("my-spark-app")
  .enableHiveSupport()
  .config("spark.executor.logs.rolling.maxRetainedFiles", "3")   // keep the three newest rolled log files
  .config("spark.executor.logs.rolling.strategy", "size")        // roll executor logs by size
  .config("spark.executor.logs.rolling.maxSize", "50000")        // roll once a log file reaches 50000 bytes
  .getOrCreate()
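The built session is used in the same way as the shell's spark object. For example, again assuming the hypothetical test.words table:
sparkSession.sql("SELECT * FROM test.words").show()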
Stopping a Spark session
Use the stop method to end the Spark session.
spark.stop()
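In a standalone application, a common pattern is to stop the session in a finally block so that cluster resources are released even if the job fails. A minimal sketch, reusing the hypothetical test.words table:
import org.apache.spark.sql.SparkSession

val session = SparkSession.builder.appName("my-spark-app").getOrCreate()
try {
  session.sql("SELECT * FROM test.words").show()   // hypothetical table
} finally {
  session.stop()   // runs even if the query above throws
}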
Getting and setting configuration options
Use the spark.conf.get and spark.conf.set methods to retrieve or set Spark configuration options for the session.
spark.conf.set("spark.executor.logs.rolling.maxRetainedFiles", "3")
spark.conf.get("spark.executor.logs.rolling.maxSize")
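spark.conf.get also accepts a fallback value to return when the option is unset, and spark.conf.getAll returns every option set on the session as a map. A short sketch, with a hypothetical fallback value:
val maxSize = spark.conf.get("spark.executor.logs.rolling.maxSize", "100000")   // "100000" is a hypothetical fallback
spark.conf.getAll.foreach { case (key, value) => println(s"$key = $value") }    // list all session options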