Using the Spark session
A Spark session is encapsulated in an instance of org.apache.spark.sql.SparkSession.
The session object has information about the Spark Master, the Spark application, and the configuration options.
The DSE Spark shell automatically configures and creates a Spark session object named spark. Use this object to begin querying database tables in DataStax Enterprise.
scala> spark.sql("SELECT * FROM keyspace.table_name")
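For example, you can assign the result to a DataFrame and display it. The keyspace and table names below are hypothetical:
scala> val df = spark.sql("SELECT * FROM my_ks.users")
scala> df.show(10)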
In Spark 1.6 and earlier, there were separate HiveContext and SQLContext objects. Starting in Spark 2.0, the SparkSession encapsulates both.
Spark applications can use multiple sessions to access different underlying data catalogs. You can create a new session from an existing one by calling the newSession method.
val newSpark = spark.newSession()
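A new session shares the underlying SparkContext with the session that created it, but keeps its own SQL configuration and temporary views. A minimal sketch (the configuration value shown is only illustrative):
// Session-scoped: affects only the new session, not the original one
newSpark.conf.set("spark.sql.shuffle.partitions", "8")
// The original session still reports its own value
spark.conf.get("spark.sql.shuffle.partitions")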
Building a Spark session using the Builder API
The Builder API allows you to create a Spark session manually.
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder
  .master("dse://localhost?")
  .appName("my-spark-app")
  .enableHiveSupport()
  .config("spark.executor.logs.rolling.maxRetainedFiles", "3")
  .config("spark.executor.logs.rolling.strategy", "size")
  .config("spark.executor.logs.rolling.maxSize", "50000")
  .getOrCreate()
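The session returned by getOrCreate can be used like the automatically created spark object. A short sketch, assuming a hypothetical keyspace my_ks with a table my_table:
val df = sparkSession.sql("SELECT * FROM my_ks.my_table")
df.printSchema()
Note that getOrCreate returns an existing session if one is already active, applying the supplied configuration options to it where possible.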
Stopping a Spark session
Use the stop method to end the Spark session.
spark.stop()
Getting and setting configuration options
Use the spark.conf.get and spark.conf.set methods to retrieve or set Spark configuration options for the session.
spark.conf.set("spark.executor.logs.rolling.maxRetainedFiles", "3")
spark.conf.get("spark.executor.logs.rolling.maxSize")
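If an option might not be set, you can pass a default value as a second argument to spark.conf.get; the fallback shown here is only an example:
// Returns "10000" if spark.executor.logs.rolling.maxSize has not been set
spark.conf.get("spark.executor.logs.rolling.maxSize", "10000")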