Get started with the Apache Cassandra Spark Connector

You can use Apache Spark™ to analyze and process the data stored in DataStax Enterprise (DSE).

Apache Spark with Scala in spark-shell works with tables in your DSE databases. You can run SQL and CQL queries to interact with your data. You can also use Spark DataFrames and RDDs for data manipulation and analysis.

DSE is compatible with the Apache Cassandra Spark Connector, which allows for better support of container orchestration platforms. The Spark Connector is also known as the Cassandra Scala driver.

Prerequisites

  • A running DSE cluster

Prepare packages and dependencies

This guide recommends the latest version of the Apache Cassandra Spark Connector. If you want to use a different version, you must use Spark, Java, and Scala versions compatible with your chosen connector version. For more information, see Spark Connector version compatibility.

  1. Download Apache Spark pre-built for Apache Hadoop® and Scala. DataStax recommends the latest version.

  2. Download the latest cassandra-spark-connector package from the latest cassandra-spark-connector release on GitHub.

  3. Install Java version 8 or later, and then set it as the default Java version.

  4. Install Scala version 2.12 or 2.13.
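
Before continuing, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, not part of Spark or DSE: check_tools and scala_binary_version are hypothetical helper names. It assumes that the Scala version you pass to the connector later is the binary version (for example, 2.12 rather than 2.12.10), which is the usual convention for Scala artifacts.

```shell
#!/bin/sh
# Hypothetical preflight helpers; the function names are illustrative only.

# Print "found: TOOL" or "missing: TOOL" for each argument.
# Returns non-zero if any tool is not on PATH.
check_tools() {
    rc=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            printf 'found: %s\n' "$tool"
        else
            printf 'missing: %s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Reduce a full Scala version (for example 2.12.10) to its binary
# version (2.12) and reject anything other than 2.12 or 2.13.
scala_binary_version() {
    binary=$(printf '%s\n' "$1" | cut -d. -f1,2)
    case "$binary" in
        2.12|2.13) printf '%s\n' "$binary" ;;
        *) printf 'unsupported Scala version: %s\n' "$1" >&2; return 1 ;;
    esac
}
```

For example, check_tools java scala reports whether both commands are on your PATH, and scala_binary_version 2.12.10 prints 2.12.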

Connect to DSE with Spark

  1. Extract the Apache Spark package into a directory.

    The following steps use SPARK_HOME as a placeholder for the path to your Spark directory.

  2. Add the following lines to the end of the SPARK_HOME/conf/spark-defaults.conf file. If the file doesn't exist, create it by copying the spark-defaults.conf.template file in the SPARK_HOME/conf directory.

    spark.cassandra.auth.username SUPERUSER_USERNAME
    spark.cassandra.auth.password SUPERUSER_PASSWORD
    spark.dse.continuousPagingEnabled false

    Replace SUPERUSER_USERNAME and SUPERUSER_PASSWORD with your DSE database’s superuser credentials.

  3. Launch spark-shell from the root directory of your Spark installation.

    Following version 3.5.1, the Apache Software Foundation (ASF) maintains the Spark Connector (cassandra-spark-connector). Prior versions were maintained by DataStax.

    For version 3.5.1 and earlier, the package groupId is com.datastax.spark. This could change in later releases.

    bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_SCALA_VERSION:CONNECTOR_VERSION

    Replace the following:

    • SCALA_VERSION: Your Scala binary version, such as 2.12 or 2.13.

    • CONNECTOR_VERSION: Your Spark Connector version, such as 3.5.1.

    Result:

    bin/spark-shell
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://localhost:4040
    Spark context available as 'sc' (master = local[*], app id = local-1608781805157).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 3.0.1
          /_/
    
    Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.9.1)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala>

  4. Run the following Scala commands to connect Spark to your database through the connector:

    import com.datastax.spark.connector._
    import org.apache.spark.sql.cassandra._
    spark.read.cassandraFormat("tables", "system_schema").load().count()

    Result:

    scala> import com.datastax.spark.connector._
    import com.datastax.spark.connector._
    
    scala> import org.apache.spark.sql.cassandra._
    import org.apache.spark.sql.cassandra._
    
    scala> spark.read.cassandraFormat("tables", "system_schema").load().count()
    res0: Long = 25
    
    scala> :quit
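
The configuration and launch steps above can also be scripted. The following is a minimal sketch under stated assumptions: append_connector_settings and connector_coordinate are hypothetical helper names, the conf directory is assumed to exist already, and the groupId is the com.datastax.spark value that applies to version 3.5.1 and earlier.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not part of Spark or DSE.

# Create spark-defaults.conf from the bundled template if it is missing,
# then append the connector settings from step 2.
# Arguments: SPARK_HOME, username, password
append_connector_settings() {
    conf_file="$1/conf/spark-defaults.conf"
    if [ ! -f "$conf_file" ]; then
        if [ -f "$conf_file.template" ]; then
            cp "$conf_file.template" "$conf_file"
        else
            # No template available: start from an empty file.
            : > "$conf_file"
        fi
    fi
    {
        printf 'spark.cassandra.auth.username %s\n' "$2"
        printf 'spark.cassandra.auth.password %s\n' "$3"
        printf 'spark.dse.continuousPagingEnabled false\n'
    } >> "$conf_file"
}

# Print the --packages coordinate used in step 3.
# Arguments: Scala binary version (for example 2.12), connector version
connector_coordinate() {
    printf 'com.datastax.spark:spark-cassandra-connector_%s:%s\n' "$1" "$2"
}
```

With these helpers defined, running bin/spark-shell --packages "$(connector_coordinate 2.12 3.5.1)" from your Spark directory is equivalent to typing the coordinate by hand. Run append_connector_settings only once per installation, because it appends the settings each time it is called.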

© Copyright IBM Corporation 2026

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.
