Using the DataFrames API

The Apache Spark™ DataFrames API encapsulates data sources, including DataStax Enterprise data, organized into named columns.

The Spark Cassandra Connector provides an integrated DataSource to simplify creating DataFrames. For more technical details, see the Spark Cassandra Connector documentation that is maintained by DataStax and the Cassandra and PySpark DataFrames post.

Examples of using the DataFrames API

This Python example shows using the DataFrames API to read from the table ks.kv and insert into a different table ks.othertable.

dse pyspark
table1 = spark.read.format("org.apache.spark.sql.cassandra")
  .options(table="kv", keyspace="ks")
  .load()
table1.write.format("org.apache.spark.sql.cassandra")
  .options(table="othertable", keyspace = "ks")
  .save(mode ="append")

Using the DSE Spark console, the following Scala example shows how to create a DataFrame object from one table and save it to another.

dse spark
val table1 = spark.read.format("org.apache.spark.sql.cassandra")
  .options(Map( "table" -> "words", "keyspace" -> "test"))
  .load()
table1.createCassandraTable("test", "otherwords", partitionKeyColumns = Some(Seq("word")), clusteringKeyColumns = Some(Seq("count")))
table1.write.cassandraFormat("otherwords", "test").save()

The write operation uses one of the helper methods, cassandraFormat, included in the Spark Cassandra Connector. This is a simplified way of setting the format and options for a standard DataFrame operation. The following command is equivalent to write operation using cassandraFormat:

table1.write.format("org.apache.spark.sql.cassandra")
  .options(Map("table" -> "othertable", "keyspace" -> "test"))
  .save()

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com