Access database data from Apache Spark
DataStax Enterprise (DSE) integrates Apache Spark™ with the DSE database. Database tables are fully usable from Apache Spark.
Access the database from a Spark application
To access the database from a Spark application, follow the instructions in the Spark example "Portfolio Manager demo using Spark".
Access database data from the Spark shell
DSE uses the Spark Cassandra Connector to provide database integration for Apache Spark. By running the Spark shell in DSE, you have access to enriched Spark context objects for accessing transactional nodes directly. See the Spark Cassandra Connector Java Doc on GitHub.
To access database data from the Spark Shell, run the dse spark command and follow the instructions in the subsequent sections.
dse spark
Creating a new Spark Session
Spark context Web UI available at
Spark Context available as 'sc' (master = dse://?, app id = app-20200221215444-0000).
Spark Session available as 'spark'.
Spark SqlContext (Deprecated use Spark Session instead) available as 'sqlContext'
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version
Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.
The Spark Shell creates a default Spark session named spark, an instance of org.apache.spark.sql.SparkSession.
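As a minimal sketch of using this session from the shell, the snippet below reads a database table into a DataFrame through the Spark Cassandra Connector data source. The keyspace name my_keyspace and table name my_table are hypothetical placeholders, not names from this page; substitute a keyspace and table that exist in your cluster.

```scala
// Run inside the `dse spark` shell, where `spark` is already defined.
// "my_keyspace" and "my_table" are hypothetical placeholders.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")   // Spark Cassandra Connector data source
  .options(Map("keyspace" -> "my_keyspace", "table" -> "my_table"))
  .load()

df.printSchema()   // column names and types taken from the table definition
df.show(5)         // print up to five rows
```

The read is lazy: no rows are fetched until an action such as show runs.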
The Spark Shell creates two contexts by default: sc (an instance of org.apache.spark.SparkContext) and sqlContext (an instance of org.apache.spark.sql.hive.HiveContext).
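The sc context can be used with the RDD API of the Spark Cassandra Connector. The following is a sketch only, assuming a keyspace my_keyspace and a table my_table (both hypothetical) exist. The import is stated explicitly so the snippet does not rely on whatever the shell may import for you.

```scala
// Run inside the `dse spark` shell, where `sc` is already defined.
// "my_keyspace" and "my_table" are hypothetical placeholders.
import com.datastax.spark.connector._   // adds cassandraTable to SparkContext

val rdd = sc.cassandraTable("my_keyspace", "my_table")   // RDD of CassandraRow

// Fetch at most five rows to the driver and print them.
val firstRows = rdd.take(5)
firstRows.foreach(println)
```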