DataStax Astra DB Classic Documentation


Using Apache Spark to connect to your database

Use Apache Spark to connect to your database and access your Astra DB tables using Scala in spark-shell. Once connected, you can run SQL statements, work with Spark DataFrames and RDDs, or even run CQL statements directly (see the sketch at the end of this page).

Prerequisites

  1. Click Download Bundle at the top of any language page (under Connect using a driver, or the Spark page under Integrate with other tools) to download the connection credentials for your database. For more, see Downloading secure connect bundle.

  2. Download Apache Spark pre-built for Apache Hadoop 2.7.

  3. Create an Application Token with the appropriate role. (The RO User role is sufficient for the example below.)

  4. Download the Spark Cassandra Connector (SCC) that matches your Apache Spark and Scala versions from the Maven Central repository. To find the right version, check the SCC compatibility matrix.

Procedure

Follow these steps to use Apache Spark in local mode.
  1. Extract the downloaded Apache Spark package into a directory and set $SPARK_HOME to that directory (cd $SPARK_HOME).

  2. Append the following lines to the end of the file $SPARK_HOME/conf/spark-defaults.conf. If the file does not exist, create it from the template in the $SPARK_HOME/conf directory.

  3. In the first four lines, replace the second column (the value) with your own settings: the path to your secure connect bundle and the Client ID and Client Secret from your application token. (A programmatic alternative is sketched after this procedure.)

spark.files $SECURE_CONNECT_BUNDLE_FILE_PATH/secure-connect-{{safeName}}.zip
spark.cassandra.connection.config.cloud.path secure-connect-{{safeName}}.zip
spark.cassandra.auth.username <<CLIENT ID>>
spark.cassandra.auth.password <<CLIENT SECRET>>
spark.dse.continuousPagingEnabled false
  4. Launch spark-shell and enter the following Scala commands:

import com.datastax.spark.connector._
import org.apache.spark.sql.cassandra._
spark.read.cassandraFormat("tables", "system_schema").load().count()

Output similar to the following appears:

$ bin/spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1608781805157).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
      /_/

Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.9.1)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import com.datastax.spark.connector._
import com.datastax.spark.connector._

scala> import org.apache.spark.sql.cassandra._
import org.apache.spark.sql.cassandra._

scala> spark.read.cassandraFormat("tables", "system_schema").load().count()
res0: Long = 25

scala> :quit
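
As an alternative to editing spark-defaults.conf, you can set the same properties programmatically when building the SparkSession in a standalone application (in spark-shell, the session named spark already exists). The following is a minimal sketch rather than the documented procedure: the bundle path, Client ID, and Client Secret are placeholders for your own values, and the SCC jar must be on the application's classpath.

import org.apache.spark.sql.SparkSession

// Minimal sketch: replace the bundle path, Client ID, and Client Secret
// placeholders with your own values before running.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("astra-spark-example")
  // Ship the secure connect bundle with the job.
  .config("spark.files", "/path/to/secure-connect-database.zip")
  // Point the connector at the bundle for its connection settings.
  .config("spark.cassandra.connection.config.cloud.path", "secure-connect-database.zip")
  .config("spark.cassandra.auth.username", "<<CLIENT ID>>")
  .config("spark.cassandra.auth.password", "<<CLIENT SECRET>>")
  .config("spark.dse.continuousPagingEnabled", "false")
  .getOrCreate()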

The Spark Cassandra Connector (SCC) is available to any Cassandra user, including Astra DB users, and it provides better support for container orchestration platforms. For more, read Advanced Apache Cassandra Analytics Now Open for All.
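
Beyond the smoke test above, the same spark-shell session can load your own tables into DataFrames, query them with Spark SQL, and run CQL statements directly, as mentioned at the start of this page. The following sketch assumes a hypothetical keyspace test_ks containing a table users; substitute your own names.

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.sql.cassandra._

// Load a table into a DataFrame (test_ks and users are hypothetical placeholders).
val df = spark.read.cassandraFormat("users", "test_ks").load()

// Expose the DataFrame to Spark SQL and query it.
df.createOrReplaceTempView("users")
spark.sql("SELECT * FROM users LIMIT 10").show()

// Run a CQL statement directly through the connector's session.
CassandraConnector(spark.sparkContext.getConf).withSessionDo { session =>
  println(session.execute("SELECT release_version FROM system.local").one())
}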

