Using the DataFrames API
The Apache Spark™ DataFrames API organizes data from many sources, including DataStax Enterprise tables, into named columns.
The Cassandra Spark connector provides an integrated DataSource to simplify creating DataFrames. For more technical details, see Apache Cassandra® Spark connector data frames documentation and Cassandra and PySpark DataFrames Revisited.
Examples of using the DataFrames API
This Python example shows how to use the DataFrames API to read from the table ks.kv and insert the results into a different table, ks.othertable.
dse pyspark
table1 = spark.read.format("org.apache.spark.sql.cassandra") \
    .options(table="kv", keyspace="ks") \
    .load()
table1.write.format("org.apache.spark.sql.cassandra") \
    .options(table="othertable", keyspace="ks") \
    .save(mode="append")
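Because the loaded table is a standard Spark DataFrame, the usual transformations can be applied before writing it back. The following sketch assumes a running DSE cluster and uses hypothetical column names ("key" and "value") for illustration; where possible, the connector pushes eligible filters and column projections down to Cassandra rather than filtering in Spark.

```python
# Sketch only: requires a running DSE cluster reachable from the pyspark shell.
# The column names "key" and "value" are assumptions for illustration.
table1 = spark.read.format("org.apache.spark.sql.cassandra") \
    .options(table="kv", keyspace="ks") \
    .load()

# Filters and column projections are pushed down to Cassandra where possible.
subset = table1.select("key", "value").filter("value > 10")

subset.write.format("org.apache.spark.sql.cassandra") \
    .options(table="othertable", keyspace="ks") \
    .save(mode="append")
```

Using mode="append" adds the rows to the target table; other Spark save modes, such as "overwrite", behave as they do for any DataFrame write.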
Using the DSE Spark console, the following Scala example shows how to create a DataFrame object from one table and save it to another.
dse spark
val table1 = spark.read.format("org.apache.spark.sql.cassandra")
.options(Map( "table" -> "words", "keyspace" -> "test"))
.load()
table1.createCassandraTable("test", "otherwords", partitionKeyColumns = Some(Seq("word")), clusteringKeyColumns = Some(Seq("count")))
table1.write.cassandraFormat("otherwords", "test").save()
The write operation uses one of the helper methods, cassandraFormat, included in the Cassandra Spark connector. This is a simplified way of setting the format and options for a standard DataFrame operation. The following command is equivalent to the write operation using cassandraFormat:
table1.write.format("org.apache.spark.sql.cassandra")
.options(Map("table" -> "othertable", "keyspace" -> "test"))
.save()