Querying database data using Apache Spark™ SQL in Scala

When you start Spark, DataStax Enterprise creates a Spark session instance that lets you run Spark SQL queries against database tables. The session object is named spark and is an instance of org.apache.spark.sql.SparkSession. Use its sql method to execute queries.
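
The sql method takes a query string and returns an org.apache.spark.sql.DataFrame (an alias for Dataset[Row]), so the standard DataFrame operations are available on the result. A minimal sketch, using a placeholder keyspace and table name:

    // spark is the session instance pre-created by the dse spark shell.
    // The keyspace and table names below are placeholders.
    val df = spark.sql("SELECT * FROM my_keyspace_name.my_table")
    df.printSchema()      // inspect the schema derived from the table
    println(df.count())   // actions such as count trigger execution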

Procedure

  1. Start the Spark shell.

    dse spark
  2. Use the sql method to pass in the query, storing the result in a variable.

    val results = spark.sql("SELECT * FROM my_keyspace_name.my_table")
  3. Use the returned DataFrame, for example by displaying the rows.

    results.show()
    +--------------------+-----------+
    |                  id|description|
    +--------------------+-----------+
    |de2d0de1-4d70-11e...|      thing|
    |db7e4191-4d70-11e...|    another|
    |d576ad50-4d70-11e...|yet another|
    +--------------------+-----------+
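
Because the query result is an ordinary DataFrame, you can refine it with DataFrame transformations instead of issuing a new SQL query. The following sketch assumes the placeholder table and columns used in the steps above:

    import org.apache.spark.sql.functions.col

    // Filter and project the query result; the column names are the
    // placeholders from the example output above.
    val things = results
      .filter(col("description") === "thing")
      .select("id", "description")
    things.show()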
