Using Apache Spark to connect to your database
Use Apache Spark to connect to your database and begin accessing your Astra DB tables using Scala in spark-shell. Connect Spark to Astra DB, run SQL statements, interact with Spark DataFrames/RDDs, or even run CQL statements directly.
Prerequisites
- Click Download Bundle (Connect using a native driver or Spark under Integrate with other tools) to download the connection credentials for your database. For more, see Downloading secure connect bundle.
- Download Apache Spark pre-built for Apache Hadoop 2.7.
- Create an application token with the appropriate role set. The following example requires a read-only role.
- Download the Spark Cassandra Connector (SCC) that matches your Apache Spark and Scala versions from the Maven Central repository. To find the right version of SCC, check the SCC compatibility.
Procedure
Use the following steps if you are using Apache Spark in local mode.
- Expand the downloaded Apache Spark package into a directory, assign the directory path to $SPARK_HOME, and change to that directory (cd $SPARK_HOME).
- Append the following lines to the end of the file $SPARK_HOME/conf/spark-defaults.conf. If the file does not exist, look for a template under the $SPARK_HOME/conf directory.
- In the first four lines, replace the second column (the value) with the values for your database:
spark.files $SECURE_CONNECT_BUNDLE_FILE_PATH/secure-connect-{{safeName}}.zip
spark.cassandra.connection.config.cloud.path secure-connect-{{safeName}}.zip
spark.cassandra.auth.username <<CLIENT ID>>
spark.cassandra.auth.password <<CLIENT SECRET>>
spark.dse.continuousPagingEnabled false
- Launch spark-shell and enter the following Scala commands:
import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._
spark.read.cassandraFormat("tables", "system_schema").load().count()
Output similar to the following appears. The application ID and the final count depend on your environment and database:
$ bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1608781805157).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.9.1)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import com.datastax.spark.connector._
import com.datastax.spark.connector._
scala> import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.cassandra._
scala> spark.read.cassandraFormat("tables", "system_schema").load().count()
res0: Long = 25
scala> :quit
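The introduction mentions SQL statements, DataFrames, RDDs, and direct CQL. The following is a minimal sketch of each, intended to be entered in a new spark-shell session that uses the same spark-defaults.conf settings. It assumes the token's role can read the system_schema keyspace; the view name schema_tables and the value names are chosen here for illustration only.

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.sql.cassandra._

// DataFrame: load a Cassandra table and display a few rows.
val tables = spark.read.cassandraFormat("tables", "system_schema").load()
tables.select("keyspace_name", "table_name").show(5, false)

// Spark SQL: register the DataFrame as a temporary view, then query it.
// "schema_tables" is an illustrative name, not something the connector defines.
tables.createOrReplaceTempView("schema_tables")
spark.sql("SELECT keyspace_name, count(*) AS table_count FROM schema_tables GROUP BY keyspace_name").show()

// RDD: read the same table through the connector's RDD API.
val tablesRdd = sc.cassandraTable("system_schema", "tables")
val keyspaces = tablesRdd.map(row => row.getString("keyspace_name")).distinct().collect()
keyspaces.foreach(println)

// CQL: run a statement directly on the driver session that the connector manages.
val firstKeyspace = CassandraConnector(sc.getConf).withSessionDo { session =>
  session.execute("SELECT keyspace_name FROM system_schema.keyspaces LIMIT 1").one().getString("keyspace_name")
}
println(firstKeyspace)
```

To query your own data, substitute your table and keyspace names in cassandraFormat and cassandraTable. Note the argument order differs between the two calls: cassandraFormat takes the table first, while cassandraTable takes the keyspace first.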
The Spark Cassandra Connector (SCC) is available for any Cassandra user, including Astra users. The SCC allows for better support of container orchestration platforms. For more, read Advanced Apache Cassandra Analytics Now Open for All.
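Outside spark-shell, the same connection properties can be set in code rather than in spark-defaults.conf. The following is a sketch of a standalone application under several assumptions: the SCC JAR is on the application classpath, Spark runs in local mode, and the credentials are supplied through the environment variables ASTRA_SECURE_BUNDLE, ASTRA_CLIENT_ID, and ASTRA_CLIENT_SECRET. Those variable names and the object name AstraCount are hypothetical and not part of Astra DB or the connector.

```scala
import java.io.File

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.cassandra._

object AstraCount {
  def main(args: Array[String]): Unit = {
    // Hypothetical environment variable names; supply secrets however suits your setup.
    val bundlePath = sys.env("ASTRA_SECURE_BUNDLE") // full path to the secure connect bundle .zip
    val clientId = sys.env("ASTRA_CLIENT_ID")
    val clientSecret = sys.env("ASTRA_CLIENT_SECRET")

    // The cloud path property takes only the file name, mirroring the spark-defaults.conf lines.
    val bundleName = new File(bundlePath).getName

    val spark = SparkSession.builder()
      .appName("astra-count")
      .master("local[*]")
      .config("spark.files", bundlePath)
      .config("spark.cassandra.connection.config.cloud.path", bundleName)
      .config("spark.cassandra.auth.username", clientId)
      .config("spark.cassandra.auth.password", clientSecret)
      .config("spark.dse.continuousPagingEnabled", "false")
      .getOrCreate()

    // Same check as in the spark-shell procedure: count the rows in system_schema.tables.
    val count = spark.read.cassandraFormat("tables", "system_schema").load().count()
    println(s"system_schema.tables rows: $count")

    spark.stop()
  }
}
```

Reading the credentials from the environment keeps the client secret out of source control; any other secret store would serve the same purpose.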