Use Apache Spark to connect to your database
Use Apache Spark to connect to your database and access your Astra DB tables with Scala in spark-shell. Once connected, you can run SQL statements, work with Spark DataFrames and RDDs, or even run CQL statements directly, as sketched after the sample output below.
Prerequisites
- Download Apache Spark pre-built for Apache Hadoop 2.7.
- Create an application token with, at minimum, a read-only role.
- Download the Spark Cassandra Connector (SCC) version that is compatible with your Apache Spark and Scala versions.
Procedure
Use the following steps if you are using Apache Spark in local mode:
- Expand the downloaded Apache Spark package into a directory, and assign the directory name to $SPARK_HOME (cd $SPARK_HOME).
- Append the following lines to the end of the file $SPARK_HOME/conf/spark-defaults.conf. If necessary, start from the template provided in the $SPARK_HOME/conf directory.
- In the first four lines, replace the second column (the value) with your own settings: the path to your secure connect bundle, the bundle file name, and the client ID and client secret from your application token. A programmatic alternative to this file-based configuration is sketched at the end of this page.

  spark.files $SECURE_CONNECT_BUNDLE_FILE_PATH/secure-connect-{{safeName}}.zip
  spark.cassandra.connection.config.cloud.path secure-connect-{{safeName}}.zip
  spark.cassandra.auth.username <<CLIENT ID>>
  spark.cassandra.auth.password <<CLIENT SECRET>>
  spark.dse.continuousPagingEnabled false
- Launch spark-shell and enter the following Scala commands:

  import com.datastax.spark.connector._
  import org.apache.spark.sql.cassandra._
  spark.read.cassandraFormat("tables", "system_schema").load().count()
Response
$ bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1608781805157).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.9.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import com.datastax.spark.connector._
import com.datastax.spark.connector._

scala> import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.cassandra._

scala> spark.read.cassandraFormat("tables", "system_schema").load().count()
res0: Long = 25

scala> :quit
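From the same spark-shell session, you can go beyond a simple count. The following is a minimal sketch of the other access paths mentioned in the introduction: loading a table into a DataFrame, querying it with Spark SQL through a temporary view, reading it as an RDD, and running a CQL statement directly through the connector. The system_schema and system keyspaces are used here only because they exist in every database; substitute your own keyspace and table names.

  import com.datastax.spark.connector._
  import org.apache.spark.sql.cassandra._
  import com.datastax.spark.connector.cql.CassandraConnector

  // Load an Astra DB table into a Spark DataFrame.
  val tables = spark.read.cassandraFormat("tables", "system_schema").load()

  // Run SQL statements against the DataFrame through a temporary view.
  tables.createOrReplaceTempView("tables")
  spark.sql("SELECT keyspace_name, table_name FROM tables").show(5)

  // Read the same table as an RDD of CassandraRow objects.
  sc.cassandraTable("system_schema", "tables").first()

  // Run a CQL statement directly through the connector's session,
  // bypassing Spark entirely.
  val connector = CassandraConnector(spark.sparkContext.getConf)
  val release = connector.withSessionDo { session =>
    session.execute("SELECT release_version FROM system.local").one().getString("release_version")
  }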
The Spark Cassandra Connector (SCC) is available for any Apache Cassandra® user, including Astra DB users. The SCC allows for better support of container orchestration platforms. For more information, see Advanced Apache Cassandra Analytics Now Open for All.
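If you run a packaged application rather than spark-shell, as is common on container orchestration platforms, the same settings from spark-defaults.conf can be supplied programmatically when building the SparkSession. The following is a minimal sketch; the application name, bundle path, and credentials are placeholders to replace with your own values.

  import org.apache.spark.sql.SparkSession

  // All values below are placeholders; use your own secure connect
  // bundle path and the client ID and client secret from your token.
  val spark = SparkSession.builder()
    .appName("astra-spark-example")
    .master("local[*]")
    .config("spark.files", "/path/to/secure-connect-database.zip")
    .config("spark.cassandra.connection.config.cloud.path", "secure-connect-database.zip")
    .config("spark.cassandra.auth.username", "<<CLIENT ID>>")
    .config("spark.cassandra.auth.password", "<<CLIENT SECRET>>")
    .config("spark.dse.continuousPagingEnabled", "false")
    .getOrCreate()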