Connect with Apache Spark
When you use Apache Spark in local mode, you can connect only to DSE databases.
Apache Spark with Scala in `spark-shell` connects to and accesses your DataStax Enterprise (DSE) tables for advanced data analysis.
With this approach, you can do the following:

- Directly execute SQL and CQL queries to interact with your data.
- Employ Spark DataFrames and RDDs for sophisticated data manipulation and analysis.
Spark's comprehensive feature set can boost your data analysis and processing capabilities with DSE.
DSE users can use the Spark Cassandra Connector (SCC), which provides better support for container orchestration platforms. For more information, see Advanced Apache Cassandra Analytics Now Open for All.
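For example, here is a minimal sketch of both access styles inside `spark-shell`, assuming the SCC is on the classpath and using hypothetical keyspace and table names:

```scala
import com.datastax.spark.connector._    // enables the RDD API (sc.cassandraTable)
import org.apache.spark.sql.cassandra._  // enables the DataFrame API (cassandraFormat)

// DataFrame read; my_keyspace and my_table are hypothetical placeholders
val df = spark.read.cassandraFormat("my_table", "my_keyspace").load()
df.show(5)

// RDD read of the same table through the connector
val rdd = sc.cassandraTable("my_keyspace", "my_table")
println(rdd.count())
```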
Prerequisites
- A running DSE database
- Download Apache Spark pre-built for Apache Hadoop and Scala. DataStax recommends the latest versions.
- Download and install Scala 2.x.
- Download a compatible version of the Spark Cassandra Connector (SCC) from the Maven central repository.
- Install a compatible Java version and set it as the default Java version, as shown in the example below.
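  For example, on Linux or macOS you can make a compatible JDK the default for your shell session by exporting JAVA_HOME. This is a minimal sketch; the JDK path is a hypothetical example and varies by installation:

  ```bash
  # Hypothetical JDK location; adjust to your installation
  export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
  export PATH="$JAVA_HOME/bin:$PATH"

  # Verify which Java version is now the default
  java -version
  ```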
Connect to a DSE database with Apache Spark
- Extract the Apache Spark package into a directory.

  The following steps use `SPARK_HOME` as a placeholder for the path to your Apache Spark directory.
- Add the following lines to the end of the `spark-defaults.conf` file located at `SPARK_HOME/conf/spark-defaults.conf`. If no such file exists, look for a template in the `SPARK_HOME/conf` directory.

  ```
  spark.cassandra.auth.username SUPERUSER_USERNAME
  spark.cassandra.auth.password SUPERUSER_PASSWORD
  spark.dse.continuousPagingEnabled false
  ```

  Replace SUPERUSER_USERNAME and SUPERUSER_PASSWORD with your superuser credentials.
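  Alternatively, if you prefer not to edit `spark-defaults.conf`, the same properties can be passed on the command line with Spark's standard `--conf` flags. A minimal sketch:

  ```bash
  # One-off configuration via --conf flags instead of spark-defaults.conf;
  # SUPERUSER_USERNAME and SUPERUSER_PASSWORD are your superuser credentials
  bin/spark-shell \
    --conf spark.cassandra.auth.username=SUPERUSER_USERNAME \
    --conf spark.cassandra.auth.password=SUPERUSER_PASSWORD \
    --conf spark.dse.continuousPagingEnabled=false
  ```

  Note that you still need the `--packages` option from the next step to pull in the connector itself.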
- Launch `spark-shell` from the root directory of your Spark installation:

  ```bash
  bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_SCALA_VERSION:SCC_VERSION
  ```

  Replace SCALA_VERSION with your Scala version, and replace SCC_VERSION with your Spark Cassandra Connector version.
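  For example, assuming Scala 2.12 and SCC 3.0.1 (the versions shown in the sample output below), the command would be:

  ```bash
  # Assumes Scala 2.12 and Spark Cassandra Connector 3.0.1
  bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.0.1
  ```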
  Result:

  ```
  $ bin/spark-shell
  Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  Setting default log level to "WARN".
  To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
  Spark context Web UI available at http://localhost:4040
  Spark context available as 'sc' (master = local[*], app id = local-1608781805157).
  Spark session available as 'spark'.
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/  '_/
     /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
        /_/

  Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.9.1)
  Type in expressions to have them evaluated.
  Type :help for more information.

  scala>
  ```
- Run the following Scala commands to connect Apache Spark to your database through the SCC:

  ```scala
  import com.datastax.spark.connector._
  import org.apache.spark.sql.cassandra._

  spark.read.cassandraFormat("tables", "system_schema").load().count()
  ```
  Result:

  ```
  scala> import com.datastax.spark.connector._
  import com.datastax.spark.connector._

  scala> import org.apache.spark.sql.cassandra._
  import org.apache.spark.sql.cassandra._

  scala> spark.read.cassandraFormat("tables", "system_schema").load().count()
  res0: Long = 25

  scala> :quit
  ```
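  From here you can query your own tables the same way. A minimal follow-on sketch, assuming a hypothetical keyspace `my_keyspace` and table `my_table`:

  ```scala
  // my_keyspace and my_table are hypothetical placeholders
  val df = spark.read.cassandraFormat("my_table", "my_keyspace").load()
  df.printSchema()

  // Register the DataFrame as a temporary view to query it with Spark SQL
  df.createOrReplaceTempView("my_table_view")
  spark.sql("SELECT COUNT(*) FROM my_table_view").show()
  ```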