RDDFunctions

Instance Constructors

new RDDFunctions(rdd: RDD[T])

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def deleteFromCassandra(keyspaceName: String, tableName: String, deleteColumns: ColumnSelector = SomeColumns(), keyColumns: ColumnSelector = PrimaryKeyColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T]): Unit

Delete data from Cassandra table, using data from the RDD as primary keys.
Delete data from Cassandra table, using data from the RDD as primary keys. Uses the specified column names.
keyspaceName
the name of the Keyspace to use
tableName
the name of the Table to use
deleteColumns
The list of column names to delete, empty ColumnSelector means full row.
keyColumns
Primary key columns selector, Optional. All RDD primary columns columns will be checked by default
writeConf
additional configuration object allowing to set consistency level, batch size, etc.

Definition Classes
RDDFunctions → WritableToCassandra
See also
com.datastax.spark.connector.writer.WritableToCassandra
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def joinWithCassandraTable[R](keyspaceName: String, tableName: String, selectedColumns: ColumnSelector = AllColumns, joinColumns: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), newType: ClassTag[R], rrf: RowReaderFactory[R], ev: ValidRDDType[R], currentType: ClassTag[T], rwf: RowWriterFactory[T]): CassandraJoinRDD[T, R]

Uses the data from RDD to join with a Cassandra table without retrieving the entire table.
Uses the data from RDD to join with a Cassandra table without retrieving the entire table. Any RDD which can be used to saveToCassandra can be used to joinWithCassandra as well as any RDD which only specifies the partition Key of a Cassandra Table. This method executes single partition requests against the Cassandra Table and accepts the functional modifiers that a normal com.datastax.spark.connector.rdd.CassandraTableScanRDD takes.
By default this method only uses the Partition Key for joining but any combination of columns which are acceptable to C* can be used in the join. Specify columns using joinColumns as a parameter or the on() method.
Example With Prior Repartitioning:
```
val source = sc.parallelize(keys).map(x => new KVRow(x))
val repart = source.repartitionByCassandraReplica(keyspace, tableName, 10)
val someCass = repart.joinWithCassandraTable(keyspace, tableName)
```
Example Joining on Clustering Columns:
```
val source = sc.parallelize(keys).map(x => (x, x * 100))
val someCass = source.joinWithCassandraTable(keyspace, wideTable).on(SomeColumns("key", "group"))
```
def keyByCassandraReplica(keyspaceName: String, tableName: String, partitionKeyMapper: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), currentType: ClassTag[T], rwf: RowWriterFactory[T]): RDD[(Set[InetAddress], T)]

Key every row in the RDD by with the IP Adresses of all of the Cassandra nodes which a contain a replica of the data specified by that row.
Key every row in the RDD by with the IP Adresses of all of the Cassandra nodes which a contain a replica of the data specified by that row. The calling RDD must have rows that can be converted into the partition key of the given Cassandra Table.
def leftJoinWithCassandraTable[R](keyspaceName: String, tableName: String, selectedColumns: ColumnSelector = AllColumns, joinColumns: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), newType: ClassTag[R], rrf: RowReaderFactory[R], ev: ValidRDDType[R], currentType: ClassTag[T], rwf: RowWriterFactory[T]): CassandraLeftJoinRDD[T, R]

Uses the data from RDD to left join with a Cassandra table without retrieving the entire table.
Uses the data from RDD to left join with a Cassandra table without retrieving the entire table. Any RDD which can be used to saveToCassandra can be used to leftJoinWithCassandra as well as any RDD which only specifies the partition Key of a Cassandra Table. This method executes single partition requests against the Cassandra Table and accepts the functional modifiers that a normal com.datastax.spark.connector.rdd.CassandraTableScanRDD takes.
By default this method only uses the Partition Key for joining but any combination of columns which are acceptable to C* can be used in the join. Specify columns using joinColumns as a parameter or the on() method.
Example With Prior Repartitioning:
```
val source = sc.parallelize(keys).map(x => new KVRow(x))
val repart = source.repartitionByCassandraReplica(keyspace, tableName, 10)
val someCass = repart.leftJoinWithCassandraTable(keyspace, tableName)
```
Example Joining on Clustering Columns:
```
val source = sc.parallelize(keys).map(x => (x, x * 100))
val someCass = source.leftJoinWithCassandraTable(keyspace, wideTable).on(SomeColumns("key", "group"))
```
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def repartitionByCassandraReplica(keyspaceName: String, tableName: String, partitionsPerHost: Int = 10, partitionKeyMapper: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), currentType: ClassTag[T], rwf: RowWriterFactory[T]): CassandraPartitionedRDD[T]

Repartitions the data (via a shuffle) based upon the replication of the given keyspaceName and tableName.
Repartitions the data (via a shuffle) based upon the replication of the given keyspaceName and tableName. Calling this method before using joinWithCassandraTable will ensure that requests will be coordinator local. partitionsPerHost Controls the number of Spark Partitions that will be created in this repartitioning event. The calling RDD must have rows that can be converted into the partition key of the given Cassandra Table.
def saveAsCassandraTable(keyspaceName: String, tableName: String, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T], columnMapper: ColumnMapper[T]): Unit

Saves the data from RDD to a new table with definition taken from the ColumnMapper for this class.
Saves the data from RDD to a new table with definition taken from the ColumnMapper for this class.
keyspaceName
keyspace where to create a new table
tableName
name of the table to create; the table must not exist
columns
Selects the columns to save data to. Uses only the unique column names, and you must select at least all primary key columns. All other fields are discarded. Non-selected property/column names are left unchanged. This parameter does not affect table creation.
writeConf
additional configuration object allowing to set consistency level, batch size, etc.
connector
optional, implicit connector to Cassandra
rwf
factory for obtaining the row writer to be used to extract column values from items of the RDD
columnMapper
a column mapper determining the definition of the table
def saveAsCassandraTableEx(table: TableDef, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T]): Unit

Saves the data from RDD to a new table defined by the given TableDef.
Saves the data from RDD to a new table defined by the given TableDef.
First it creates a new table with all columns from the TableDef and then it saves RDD content in the same way as saveToCassandra. The table must not exist prior to this call.
table
table definition used to create a new table
columns
Selects the columns to save data to. Uses only the unique column names, and you must select at least all primary key columns. All other fields are discarded. Non-selected property/column names are left unchanged. This parameter does not affect table creation.
writeConf
additional configuration object allowing to set consistency level, batch size, etc.
connector
optional, implicit connector to Cassandra
rwf
factory for obtaining the row writer to be used to extract column values from items of the RDD
def saveToCassandra(keyspaceName: String, tableName: String, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T]): Unit

Saves the data from RDD to a Cassandra table.
Saves the data from RDD to a Cassandra table. Uses the specified column names.
keyspaceName
the name of the Keyspace to use
tableName
the name of the Table to use
writeConf
additional configuration object allowing to set consistency level, batch size, etc.

Definition Classes
RDDFunctions → WritableToCassandra
See also
com.datastax.spark.connector.writer.WritableToCassandra
def spanBy[U](f: (T) ⇒ U): RDD[(U, Iterable[T])]

Applies a function to each item, and groups consecutive items having the same value together.
Applies a function to each item, and groups consecutive items having the same value together. Contrary to groupBy, items from the same group must be already next to each other in the original collection. Works locally on each partition, so items from different partitions will never be placed in the same group.
val sparkContext: SparkContext

Definition Classes
RDDFunctions → WritableToCassandra
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Doc: package connector

class RDDFunctions[T] extends WritableToCassandra[T] with Serializable

Instance Constructors

new RDDFunctions(rdd: RDD[T])

Value Members

final def !=(arg0: Any): Boolean

final def ##(): Int

final def ==(arg0: Any): Boolean

final def asInstanceOf[T0]: T0

def clone(): AnyRef

def deleteFromCassandra(keyspaceName: String, tableName: String, deleteColumns: ColumnSelector = SomeColumns(), keyColumns: ColumnSelector = PrimaryKeyColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T]): Unit

final def eq(arg0: AnyRef): Boolean

def equals(arg0: Any): Boolean

def finalize(): Unit

final def getClass(): Class[_]

def hashCode(): Int

final def isInstanceOf[T0]: Boolean

def keyByCassandraReplica(keyspaceName: String, tableName: String, partitionKeyMapper: ColumnSelector = PartitionKeyColumns)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), currentType: ClassTag[T], rwf: RowWriterFactory[T]): RDD[(Set[InetAddress], T)]

final def ne(arg0: AnyRef): Boolean

final def notify(): Unit

final def notifyAll(): Unit

def saveAsCassandraTable(keyspaceName: String, tableName: String, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T], columnMapper: ColumnMapper[T]): Unit

def saveAsCassandraTableEx(table: TableDef, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T]): Unit

def saveToCassandra(keyspaceName: String, tableName: String, columns: ColumnSelector = AllColumns, writeConf: WriteConf = ...)(implicit connector: CassandraConnector = CassandraConnector(sparkContext), rwf: RowWriterFactory[T]): Unit

def spanBy[U](f: (T) ⇒ U): RDD[(U, Iterable[T])]

val sparkContext: SparkContext

final def synchronized[T0](arg0: ⇒ T0): T0

def toString(): String

final def wait(): Unit

final def wait(arg0: Long, arg1: Int): Unit

final def wait(arg0: Long): Unit

Inherited from Serializable

Inherited from Serializable

Inherited from WritableToCassandra[T]

Inherited from AnyRef

Inherited from Any

Ungrouped