com.datastax.bdp.spark.writer.BulkTableWriter
Writes RDD data to sstables in a local temp directory and then streams the sstables to the Cassandra cluster. The keyspace and table must exist.
Depending on the setup, this method may or may not be faster than a standard saveToCassandra call.
Compared to a saveToCassandra call, this method does more work on the client side. It therefore uses more memory and I/O on the client, but puts less stress on the server side.
Use bulk saving if you experience timeouts or server-side OOMs when using the saveToCassandra method.
Make sure your Spark partitions are at least several tens of MBs large, because bulkSaveToCassandra will generate at least one sstable per Spark partition.
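The partition-size advice can be turned into a rough partition count before saving. The helper below is only a sketch: the name targetPartitions, the 64 MiB target, and the assumption that you can estimate the total data size yourself are illustrative and not part of this API.

```scala
object PartitionSizing {
  /** Rough partition count so that each partition holds about
    * targetBytesPerPartition bytes. Both inputs are estimates supplied
    * by the caller; this helper is hypothetical, not part of BulkTableWriter.
    */
  def targetPartitions(totalBytes: Long, targetBytesPerPartition: Long): Int = {
    require(totalBytes >= 0, "totalBytes must be non-negative")
    require(targetBytesPerPartition > 0, "targetBytesPerPartition must be positive")
    // Ceiling division, so a trailing remainder still gets its own partition.
    val n = (totalBytes + targetBytesPerPartition - 1) / targetBytesPerPartition
    // Always return at least one partition and stay within Int range.
    math.max(1L, math.min(n, Int.MaxValue.toLong)).toInt
  }
}

// Example: about 10 GiB of data at roughly 64 MiB per partition.
val partitions = PartitionSizing.targetPartitions(10L * 1024 * 1024 * 1024, 64L * 1024 * 1024)
```

The result could then be passed to the standard RDD methods coalesce or repartition before the bulk save, so that each generated sstable is reasonably large.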
Import BulkTableWriter._ to enhance your RDDs with bulk saving capability.
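A minimal usage sketch follows. It assumes that bulkSaveToCassandra takes the keyspace and table names as its arguments, and the keyspace and table names shown are hypothetical placeholders. It needs a DSE Spark environment with the connector on the classpath, and the target table must already exist.

```scala
import com.datastax.bdp.spark.writer.BulkTableWriter._
import com.datastax.spark.connector._
import org.apache.spark.{SparkConf, SparkContext}

object BulkSaveExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("bulk-save-example"))

    // Read rows from an existing table (hypothetical names).
    val rows = sc.cassandraTable("my_keyspace", "source_table")

    // Assumed call shape: writes sstables locally, then streams them to the
    // cluster. "target_table" must already exist in "my_keyspace".
    rows.bulkSaveToCassandra("my_keyspace", "target_table")

    sc.stop()
  }
}
```

If the job slows down or the client runs short of memory or disk, switching back to saveToCassandra is the simpler option, since bulk saving moves work from the server to the client.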