Accessing Cassandra from Spark

DataStax Enterprise integrates Spark with Cassandra. Cassandra tables are fully usable from Spark.

Accessing Cassandra from a Spark application

To access Cassandra from a Spark application, follow the instructions in the Portfolio Manager demo using Spark.
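
For orientation, here is a minimal sketch of a standalone Spark application that reads a Cassandra table through the Spark Cassandra Connector. The keyspace ks, table users, and connection host are hypothetical placeholders; see the demo for the full build and submit workflow.

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

object CassandraReadExample {
  def main(args: Array[String]): Unit = {
    // When launched with dse spark-submit, the Cassandra connection
    // settings are supplied automatically; the host set here is only
    // needed when running outside DSE.
    val conf = new SparkConf()
      .setAppName("CassandraReadExample")
      .set("spark.cassandra.connection.host", "127.0.0.1") // placeholder host

    val sc = new SparkContext(conf)

    // Read the hypothetical ks.users table as an RDD of CassandraRow.
    val rows = sc.cassandraTable("ks", "users")
    println(s"ks.users contains ${rows.count()} rows")

    sc.stop()
  }
}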

Accessing Cassandra from the Spark shell

DataStax Enterprise uses the Spark Cassandra Connector to provide Cassandra integration for Spark. By running the Spark shell in DataStax Enterprise, you have access to enriched Spark context objects for accessing Cassandra directly. See the Spark Cassandra Connector Java Doc on GitHub.

To access Cassandra from the Spark shell, run the dse spark command and follow the instructions in the subsequent sections.

$ dse spark

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Using Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
Creating SparkContext...
Created spark context..
Spark context available as sc.
Hive context available as sqlContext. Will be initialized on first use.

scala>

The Spark shell creates two contexts by default: sc (an instance of org.apache.spark.SparkContext) and sqlContext (an instance of org.apache.spark.sql.hive.HiveContext).
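
For example, once the connector's implicits are imported, sc can read a Cassandra table directly from the shell. A minimal sketch, assuming a hypothetical keyspace ks with a table users:

import com.datastax.spark.connector._

// Returns an RDD of CassandraRow backed by the ks.users table.
val rows = sc.cassandraTable("ks", "users")
rows.first  // fetch a single row to verify connectivity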

Note: In previous versions of DSE, the default HiveContext instance was named hc. If your application uses hc instead of sqlContext, you can work around this change by adding a line:
val hc = sqlContext
Previous versions also created a CassandraSQLContext instance named csc. Starting in DSE 5.0, this is no longer the case; use the sqlContext object instead.
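
For example, a query that old code ran through csc.sql can run unchanged against sqlContext. A minimal sketch, again assuming the hypothetical ks.users table:

// Spark SQL query against a Cassandra table, addressed as keyspace.table.
val results = sqlContext.sql("SELECT * FROM ks.users")
results.show()  // print the first rows of the result DataFrame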