Accessing database data from Spark

DataStax Enterprise integrates Spark with the DataStax Enterprise database, so database tables are fully usable from Spark.

Accessing the database from a Spark application

To access the database from a Spark application, follow the instructions in the Spark example "Portfolio Manager demo using Spark".
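
As a rough illustration of what such an application can look like, the sketch below reads one table into a DataFrame through the Spark Cassandra Connector data source. It is not taken from the Portfolio Manager demo: the object name ReadTableApp, the tableOptions helper, and the keyspace and table names my_keyspace and my_table are all hypothetical. It also assumes the application is submitted to a DataStax Enterprise cluster (for example with dse spark-submit), so that the master and connection settings come from the environment.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application; all names are placeholders, not part of the demo.
object ReadTableApp {
  // Builds the options understood by the Spark Cassandra Connector data source.
  def tableOptions(keyspace: String, table: String): Map[String, String] =
    Map("keyspace" -> keyspace, "table" -> table)

  def main(args: Array[String]): Unit = {
    // Master and connection settings are expected to be supplied at submit time.
    val spark = SparkSession.builder().appName("ReadTableApp").getOrCreate()
    try {
      val df = spark.read
        .format("org.apache.spark.sql.cassandra")
        .options(tableOptions("my_keyspace", "my_table"))
        .load()
      df.show(10) // print the first 10 rows
    } finally {
      spark.stop() // release cluster resources even if the read fails
    }
  }
}
```

Keeping the option map in a small helper makes it easy to point the same code at a different keyspace or table.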

Accessing database data from the Spark shell

DataStax Enterprise uses the Spark Cassandra Connector to provide database integration for Spark. When you run the Spark shell in DataStax Enterprise, you get enriched Spark context objects that can access transactional nodes directly. For API details, see the Spark Cassandra Connector Javadoc on GitHub.

To access database data from the Spark shell, run the dse spark command and follow the instructions in the subsequent sections.

dse spark
Creating a new Spark Session
Spark context Web UI available at http://10.0.0.1:4041
Spark Context available as 'sc' (master = dse://?, app id = app-20200221215444-0000).
Spark Session available as 'spark'.
Spark SqlContext (Deprecated use Spark Session instead) available as 'sqlContext'
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0.11
      /_/

Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_232)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

The Spark shell creates a default Spark session named spark, which is an instance of org.apache.spark.sql.SparkSession.
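
A minimal sketch of using the predefined spark session to load a database table as a DataFrame follows. The keyspace and table names (my_keyspace, my_table) and the readOptions value are placeholders introduced for illustration, so substitute a table that exists in your cluster. The data source name org.apache.spark.sql.cassandra is the one provided by the Spark Cassandra Connector.

```scala
// Run inside the shell started by `dse spark`; `spark` is predefined there.
// Keyspace and table names are placeholders (assumptions).
val readOptions = Map("keyspace" -> "my_keyspace", "table" -> "my_table")

val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(readOptions)
  .load()

df.printSchema() // column names and types derived from the table definition
df.show(5)       // display the first five rows
```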

The Spark shell also creates two contexts by default: sc (an instance of org.apache.spark.SparkContext) and sqlContext (an instance of org.apache.spark.sql.hive.HiveContext). As the startup banner notes, sqlContext is deprecated; use the spark session instead.
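
The sc context can be used with the connector's RDD API, roughly as sketched below. This assumes a table my_table in a keyspace my_keyspace, both placeholder names. The import brings the connector's cassandraTable extension into scope and should be harmless if the shell has already imported it.

```scala
// Run inside the shell started by `dse spark`; `sc` is predefined there.
import com.datastax.spark.connector._ // adds cassandraTable to SparkContext

// Placeholder keyspace and table names (assumptions).
val rows = sc.cassandraTable("my_keyspace", "my_table")

// take(5) fetches only a few rows rather than collecting the whole table.
rows.take(5).foreach(println)
```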
