This is slightly different from Scala's this.type.
this.type is the unique singleton type of an object, which is not compatible with other instances of the same type, so returning anything other than this is not really possible without lying to the compiler via explicit casts. Here SelfType is used to return a copy of the object, a different instance of the same type.
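A minimal sketch of the difference, using a hypothetical Copyable trait (not connector code) to show why a self-type parameter permits returning a fresh instance where this.type would not:

  trait Copyable[Self <: Copyable[Self]] {
    def copySelf(): Self
  }

  class Table(val name: String) extends Copyable[Table] {
    // Legal: returns a different instance of the same type.
    // With a this.type return type, only `this` would typecheck.
    override def copySelf(): Table = new Table(name)
  }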
Maps each row into an object of a different type using the provided function, taking column value(s) as argument(s). Can be used to convert each row to a tuple or a case class object:
sc.cassandraTable("ks", "table") .select("column1") .as((s: String) => s) // yields CassandraRDD[String] sc.cassandraTable("ks", "table") .select("column1", "column2") .as((_: String, _: Long)) // yields CassandraRDD[(String, Long)] case class MyRow(key: String, value: Long) sc.cassandraTable("ks", "table") .select("column1", "column2") .as(MyRow) // yields CassandraRDD[MyRow]
Counts the number of items in this RDD by selecting count(*) on the Cassandra table.
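A minimal usage sketch, assuming this documents the connector's cassandraCount() method (keyspace "ks" and table "table" are placeholders):

  val rowCount: Long = sc.cassandraTable("ks", "table").cassandraCount()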
Adds a CQL ORDER BY clause to the query.
It can be applied only when there are clustering columns and a primary key predicate is pushed down in where. It is useful when the default direction of ordering rows within a single Cassandra partition needs to be changed.
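A usage sketch, assuming this describes the connector's withAscOrder/withDescOrder methods; "ks", "events" and the key value are placeholders:

  sc.cassandraTable("ks", "events")
    .where("key = ?", "k1")   // partition key predicate pushed down
    .withDescOrder            // reverse the clustering order within that partition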
This method overrides the default Spark behavior and will not create a CoalesceRDD. Instead it will reduce the number of partitions by adjusting the partitioning of Cassandra data on read. Using this method will override spark.cassandra.input.split.size. The method is useful together with a where() call, when the actual size of the data is smaller than the table size; see the sketch after the parameter list below. It has no effect if a partition key is used in the where clause.
numPartitions - the number of partitions
shuffle - whether to call shuffle afterwards
ord - ignored if there is no shuffle, or just passed to the shuffled CoalesceRDD
returns: a new CassandraTableScanRDD with the predefined number of partitions
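A usage sketch; the predicate and partition count are illustrative, and the filtered result is assumed to be much smaller than the table:

  val narrowed = sc.cassandraTable("ks", "table")
    .where("group = ?", "g1")
    .coalesce(10)             // read the matching rows into just 10 Spark partitions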
Allows copying this RDD while changing some of its properties.
Extracts a key of the given class from all the available columns.
See also keyBy(ColumnSelector).
Extracts a key of the given class from the given columns.
See also keyBy(ColumnSelector).
Selects a subset of columns mapped to the key and returns an RDD of pairs. Similar to the builtin Spark keyBy method, but this one uses an implicit RowReaderFactory to construct the key objects. The selected columns must be available in the CassandraRDD. If the selected columns contain the complete partition key, a CassandraPartitioner will also be created; see the usage sketch below.
the column selector passed to the RowReaderFactory to create the row reader; useful when the key is mapped to a tuple or a single value
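A usage sketch, assuming a table with columns key and value; the case classes are hypothetical:

  import com.datastax.spark.connector._

  case class UserKey(key: String)       // hypothetical key mapping
  case class UserValue(value: Long)     // hypothetical value mapping

  // Yields an RDD of (UserKey, UserValue) pairs; because the complete
  // partition key is selected, a CassandraPartitioner is also created.
  val pairs = sc.cassandraTable[UserValue]("ks", "table")
    .select("key", "value")
    .keyBy[UserKey]("key")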
Adds the limit clause to the CQL select statement. The limit will be applied to each created Spark partition. In other words, unless the data are fetched from a single Cassandra partition, the number of results is unpredictable.
The main purpose of passing a limit clause is to fetch the top n rows from a single Cassandra partition, when the table is designed so that it uses clustering keys and a partition key predicate is passed to the where clause.
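For example, a sketch fetching the top 10 rows of one Cassandra partition (assuming "events" is ordered by a clustering column and "k1" is a placeholder key):

  sc.cassandraTable("ks", "events")
    .where("key = ?", "k1")   // restricts the scan to a single Cassandra partition
    .limit(10)                // at most 10 rows from that partition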
Filters the currently selected set of columns with a new set of columns.
Adds the PER PARTITION LIMIT clause to the CQL select statement. The limit will be applied to every Cassandra partition. Only valid for Cassandra 3.6+.
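A usage sketch, assuming a Cassandra 3.6+ cluster; at most 3 rows are returned per Cassandra partition:

  sc.cassandraTable("ks", "events")
    .perPartitionLimit(3)     // CQL: SELECT ... PER PARTITION LIMIT 3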
RowReaderFactory and ClassTag should be provided via implicit parameters in the constructor of the class implementing this trait.
See also CassandraTableScanRDD.
Narrows down the selected set of columns.
Use this for better performance, when you don't need all the columns in the result RDD. When called multiple times, it selects a subset of the already selected columns, so after a column was removed by a previous select call, it is not possible to add it back.
The selected columns are ColumnRef instances. This type allows specifying columns for straightforward retrieval, as well as reading the TTL or write time of regular columns. Implicit conversions included in the com.datastax.spark.connector package make it possible to provide just column names (which is also backward compatible) and optionally add a .ttl or .writeTime suffix in order to create an appropriate ColumnRef instance.
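For example, assuming the implicit conversions from com.datastax.spark.connector are in scope:

  import com.datastax.spark.connector._

  sc.cassandraTable("ks", "table")
    .select("column1", "column2".ttl, "column2".writeTime)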
Returns the columns to be selected from the table.
Checks for the existence of the keyspace and table.
Adds CQL WHERE predicate(s) to the query.
Useful for leveraging secondary indexes in Cassandra. Implicitly adds an ALLOW FILTERING clause to the WHERE clause; however, beware that some predicates might be rejected by Cassandra, particularly in cases when they filter on an unindexed, non-clustering column.
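A usage sketch; the table, column name, and bound value are placeholders:

  sc.cassandraTable("ks", "users")
    .where("name = ?", "Anna")   // predicate pushed down to Cassandra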
Returns a copy of this Cassandra RDD with the specified connector.
Allows setting a custom read configuration, e.g. consistency level or fetch size.
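A sketch, assuming ReadConf exposes fetchSizeInRows and consistencyLevel fields and the Java driver's ConsistencyLevel enum (the exact import path varies by connector version):

  import com.datastax.spark.connector.rdd.ReadConf
  import com.datastax.driver.core.ConsistencyLevel   // assumption: driver package of older connector versions

  val tunedRdd = sc.cassandraTable("ks", "table")
    .withReadConf(ReadConf(
      fetchSizeInRows = 1000,
      consistencyLevel = ConsistencyLevel.LOCAL_QUORUM))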
RDD representing a Cassandra table for Spark Streaming.
See also com.datastax.spark.connector.rdd.CassandraTableScanRDD.