Using the Spark session
The Spark session object is the primary entry point for Spark applications. It allows you to run SQL queries against database tables.
A Spark session is encapsulated in an instance of org.apache.spark.sql.SparkSession. The session object has information about the Spark Master, the Spark application, and the configuration options.
The DSE Spark shell automatically configures and creates a Spark session object named spark. Use this object to begin querying database tables in DataStax Enterprise.
spark.sql("SELECT * FROM keyspace.table_name")
In Spark 1.6 and earlier, there were separate HiveContext and SQLContext objects. Starting in Spark 2.0, SparkSession encapsulates both.

Use the newSession method to create an additional session that shares the underlying SparkContext and cluster resources but has its own isolated SQL configuration and temporary views.

val newSpark = spark.newSession()
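Sessions created this way are isolated from one another. A minimal sketch, assuming the SQL option spark.sql.shuffle.partitions is still at its default: setting it on the new session leaves the original session unchanged.

// Change a SQL option only in the new session
newSpark.conf.set("spark.sql.shuffle.partitions", "8")
// The original session still reports its own value (200 by default)
spark.conf.get("spark.sql.shuffle.partitions")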
Building a Spark session using the Builder API
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder
  .master("dse://localhost?")
  .appName("my-spark-app")
  .enableHiveSupport()
  .config("spark.executor.logs.rolling.maxRetainedFiles", "3")
  .config("spark.executor.logs.rolling.strategy", "size")
  .config("spark.executor.logs.rolling.maxSize", "50000")
  .getOrCreate()
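If a session already exists in the JVM, getOrCreate returns it and applies the configuration options specified on the builder to that existing session.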
Stopping a Spark session
Use the stop method to end the Spark session.
spark.stop()
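Note that stopping a session also stops the shared underlying SparkContext, so any other sessions created from it with newSession can no longer run queries.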
Getting and setting configuration options
Use the spark.conf.get and spark.conf.set methods to retrieve or set Spark configuration options for the session.
spark.conf.set("spark.executor.logs.rolling.maxRetainedFiles", "3")
spark.conf.get("spark.executor.logs.rolling.maxSize")
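If an option has not been set in the session and has no default, spark.conf.get throws an exception. You can pass a fallback value as a second argument; the value shown here is an arbitrary example:

// Returns "50000" when spark.executor.logs.rolling.maxSize is not set
spark.conf.get("spark.executor.logs.rolling.maxSize", "50000")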