Connect to DSE with the Cassandra Spark connector
You can use the comprehensive processing features of Apache Spark™ to boost your data analysis and processing capabilities with DataStax Enterprise (DSE).
Apache Spark with Scala in spark-shell seamlessly connects to and accesses your DSE tables for advanced data analysis. You can directly execute SQL and CQL queries to interact with your data, and you can employ Spark DataFrames and RDDs for sophisticated data manipulation and analysis.
DSE is compatible with the Cassandra Spark connector, which improves support for container orchestration platforms.
Prerequisites
- A running DSE database
Prepare packages and dependencies
This guide uses the latest version of the Cassandra Spark connector. If you want to use a different version, you must use Spark, Java, and Scala versions compatible with your chosen connector version. For more information, see Cassandra Spark connector version compatibility.
- Download Apache Spark pre-built for Apache Hadoop and Scala. DataStax recommends the latest version.

- Download the latest cassandra-spark-connector package.

- Install Java version 8 or later, and then set it as the default Java version.

- Install Scala version 2.12 or 2.13.
Connect to a DSE database with Apache Spark
- Extract the Apache Spark package into a directory. The following steps use SPARK_HOME as a placeholder for the path to your Apache Spark directory.
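For example, on Linux or macOS you might extract the archive and record its location as follows. The archive name below is hypothetical; use the file you actually downloaded:

```shell
# Extract the downloaded Spark archive (example file name; yours may differ).
tar xzf spark-3.5.1-bin-hadoop3.tgz

# Remember the extracted directory; the following steps refer to it as SPARK_HOME.
export SPARK_HOME="$PWD/spark-3.5.1-bin-hadoop3"
```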
- Add the following lines to the end of the spark-defaults.conf file located at SPARK_HOME/conf/spark-defaults.conf. If no such file exists, look for a template in the SPARK_HOME/conf directory.

  ```
  spark.cassandra.auth.username SUPERUSER_USERNAME
  spark.cassandra.auth.password SUPERUSER_PASSWORD
  spark.dse.continuousPagingEnabled false
  ```

  Replace SUPERUSER_USERNAME and SUPERUSER_PASSWORD with your DSE database’s superuser credentials.
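If your DSE node is not running on the same machine as Spark, you can also point the connector at it in the same file. spark.cassandra.connection.host is a standard connector setting; the value below is a placeholder:

```
# Placeholder contact point; replace with the address of a DSE node.
spark.cassandra.connection.host DSE_NODE_IP_ADDRESS
```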
- Launch spark-shell from the root directory of your Spark installation:

  ```shell
  bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_SCALA_VERSION:CONNECTOR_VERSION
  ```

  Replace SCALA_VERSION with your Scala version, and replace CONNECTOR_VERSION with your Cassandra Spark connector version.

  Result:

  ```
  $ bin/spark-shell
  Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
  Setting default log level to "WARN".
  To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
  Spark context Web UI available at http://localhost:4040
  Spark context available as 'sc' (master = local[*], app id = local-1608781805157).
  Spark session available as 'spark'.
  Welcome to
        ____              __
       / __/__  ___ _____/ /__
      _\ \/ _ \/ _ `/ __/  '_/
     /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
        /_/

  Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.9.1)
  Type in expressions to have them evaluated.
  Type :help for more information.

  scala>
  ```
- Run the following Scala commands to connect Spark with your database through the connector:

  ```scala
  import com.datastax.spark.connector._
  import org.apache.spark.sql.cassandra._

  spark.read.cassandraFormat("tables", "system_schema").load().count()
  ```

  Result:

  ```
  scala> import com.datastax.spark.connector._
  import com.datastax.spark.connector._

  scala> import org.apache.spark.sql.cassandra._
  import org.apache.spark.sql.cassandra._

  scala> spark.read.cassandraFormat("tables", "system_schema").load().count()
  res0: Long = 25

  scala> :quit
  ```
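Beyond counting rows, the same session supports full DataFrame reads and writes through the connector. The following is a minimal sketch to run inside spark-shell, assuming a hypothetical keyspace ks with tables kv and kv_copy that you have already created in your DSE database:

```scala
import org.apache.spark.sql.cassandra._

// Load the hypothetical ks.kv table as a DataFrame.
val df = spark.read.cassandraFormat("kv", "ks").load()

// Connector-backed DataFrames support standard Spark SQL operations.
df.createOrReplaceTempView("kv")
spark.sql("SELECT * FROM kv LIMIT 10").show()

// Append the rows to another existing table, ks.kv_copy.
df.write.cassandraFormat("kv_copy", "ks").mode("append").save()
```

Because the connector pushes filters and column pruning down to the database where possible, selecting only the columns you need is usually cheaper than loading whole rows.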
Next steps
To learn more about using the Cassandra Spark connector, see the Cassandra Spark connector documentation.