Using the Spark session

The Spark session object is the primary entry point for Spark applications and lets you run SQL queries against database tables.

A Spark session is encapsulated in an instance of org.apache.spark.sql.SparkSession. The session object has information about the Spark Master, the Spark application, and the configuration options.

The DSE Spark shell automatically configures and creates a Spark session object named spark. Use this object to begin querying database tables in DataStax Enterprise.

spark.sql("SELECT * FROM keyspace.table_name")
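The sql method returns a DataFrame, so the query result can be displayed, filtered, or collected like any other DataFrame. The following is a minimal sketch that assumes a standalone local session rather than the DSE Spark shell. The local[*] master, the users view, and its sample rows are hypothetical stand-ins for a real keyspace.table_name.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in the DSE Spark shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("sql-query-sketch")
  .getOrCreate()

// Hypothetical sample data registered as a temporary view so the query has something to read.
spark.createDataFrame(Seq((1, "alice"), (2, "bob")))
  .toDF("id", "name")
  .createOrReplaceTempView("users")

// spark.sql returns a DataFrame; the query runs when an action such as collect() is called.
val result = spark.sql("SELECT id, name FROM users WHERE id > 1")

// Pull the matching rows back to the driver and read the name column.
val names = result.collect().map(_.getString(1))
names.foreach(println)
```

Collecting is reasonable here only because the result is tiny. For large tables, an action such as show or a write to storage may be a better fit.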

Note:

In Spark 1.6 and earlier, there were separate HiveContext and SQLContext objects. Starting in Spark 2.0, the SparkSession encapsulates both.

Spark applications can create multiple sessions, each working with a different underlying data catalog. To create a new session from an existing Spark session, call the newSession method.

val newSpark = spark.newSession()
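As a rough illustration of how sessions are separated, the sketch below registers a temporary view in one session and checks whether the other session can see it. It assumes a standalone local session; the local[*] master and the numbers view are hypothetical names chosen for the example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in the DSE Spark shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("new-session-sketch")
  .getOrCreate()

val newSpark = spark.newSession()

// Register a temporary view (hypothetical name) in the original session only.
spark.range(3).createOrReplaceTempView("numbers")

// Temporary views are scoped to the session that registered them.
val visibleInOriginal = spark.catalog.tableExists("numbers") // true
val visibleInNew = newSpark.catalog.tableExists("numbers")   // false

// Both sessions run on the same underlying SparkContext.
val sharedContext = spark.sparkContext eq newSpark.sparkContext // true
```

Because the sessions share one SparkContext, creating a new session is inexpensive compared with starting a second application.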

Building a Spark session using the Builder API

The Builder API allows you to create a Spark session manually.

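import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
  .master("dse://localhost?")   // DSE master URL for the local node
  .appName("my-spark-app")      // application name
  .enableHiveSupport()          // enable Hive support for the session
  // Roll executor logs by size: keep at most 3 files of up to 50000 bytes each
  .config("spark.executor.logs.rolling.maxRetainedFiles", "3")
  .config("spark.executor.logs.rolling.strategy", "size")
  .config("spark.executor.logs.rolling.maxSize", "50000")
  .getOrCreate()                // return an existing session or create a new one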
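The getOrCreate method returns an existing session when one is available and creates a new one only otherwise. The following sketch shows that behavior under the assumption of a standalone local application; the local[*] master and the application name are placeholders, not DSE settings.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local[*] and the application name are placeholders for illustration.
val first = SparkSession.builder()
  .master("local[*]")
  .appName("builder-sketch")
  .getOrCreate()

// A second builder call finds the session that already exists and returns it
// instead of creating another one.
val second = SparkSession.builder().getOrCreate()

val sameSession = first eq second // true
```

To get a separate session in the same application, call newSession on the existing session instead of using the builder again.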

Stopping a Spark session

Use the stop method to end the Spark session.

spark.stop()
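In a standalone application, one common pattern is to call stop in a finally block so the session is released even when the job fails. This is a minimal sketch under that assumption; the local[*] master, the application name, and the placeholder workload are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: local[*] and the application name are placeholders for illustration.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("stop-sketch")
  .getOrCreate()

val rowCount =
  try {
    // Placeholder workload: count the rows of a small generated range.
    spark.range(10).count()
  } finally {
    // Runs even if the workload throws, so the session is always stopped.
    spark.stop()
  }

// Stopping the session also stops its underlying SparkContext.
val stopped = spark.sparkContext.isStopped
```

After stop is called, the session and any other sessions that share its SparkContext can no longer run queries.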

Getting and setting configuration options

Use the spark.conf.get and spark.conf.set methods to retrieve or set Spark configuration options for the session.

spark.conf.set("spark.executor.logs.rolling.maxRetainedFiles", "3")
spark.conf.get("spark.executor.logs.rolling.maxSize")
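Calling spark.conf.get with a key that has no value throws a NoSuchElementException, so it can help to supply a default or use getOption. The sketch below assumes a standalone local session. It uses spark.sql.shuffle.partitions only as a convenient runtime option, and spark.example.unset.option is a hypothetical key that is never set.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in the DSE Spark shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("conf-sketch")
  .getOrCreate()

// Set a runtime SQL option, then read it back; values are returned as strings.
spark.conf.set("spark.sql.shuffle.partitions", "8")
val partitions = spark.conf.get("spark.sql.shuffle.partitions") // "8"

// Hypothetical key that is never set, used to show the fallback behavior.
val unsetKey = "spark.example.unset.option"

// get(key) throws NoSuchElementException for an unset key, so supply a default...
val withDefault = spark.conf.get(unsetKey, "fallback") // "fallback"

// ...or use getOption to receive None instead of an exception.
val maybeValue = spark.conf.getOption(unsetKey) // None
```

Options set this way apply to the current session only; a session created with newSession keeps its own runtime configuration.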