partitioner

Type Members

class BucketingRangeIndex[R, T] extends AnyRef

A special structure for fast lookup of ranges containing a given point.
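Here is a minimal sketch of the idea in Scala. It is not the connector's implementation. It assumes non-wrapping ranges over Long values with an exclusive end, a known domain [domainMin, domainMax], and equal-width buckets. SimpleRange and SimpleBucketingIndex are hypothetical names. Each range is registered in every bucket it overlaps, so a point query scans only the ranges stored in that point's bucket instead of all ranges.

```scala
// Hypothetical sketch: a non-wrapping range [start, end) over Long values.
final case class SimpleRange(start: Long, end: Long) {
  require(end >= start, "range must not wrap")
  def contains(p: Long): Boolean = p >= start && p < end
}

// Hypothetical sketch of a bucketing range index over the domain [domainMin, domainMax].
final class SimpleBucketingIndex(ranges: Seq[SimpleRange], domainMin: Long, domainMax: Long, bucketCount: Int) {
  require(domainMax > domainMin && bucketCount > 0)

  // Monotonic bucketing: maps a value in [domainMin, domainMax] to a bucket in [0, bucketCount).
  // BigInt avoids overflow when the domain spans most of the Long range.
  private def bucket(v: Long): Int = {
    val clamped = math.max(domainMin, math.min(domainMax, v))
    ((BigInt(clamped) - domainMin) * bucketCount / (BigInt(domainMax) - domainMin + 1)).toInt
  }

  // Each bucket holds every range that overlaps it.
  private val buckets: Array[Vector[SimpleRange]] = {
    val arr = Array.fill(bucketCount)(Vector.empty[SimpleRange])
    for (r <- ranges if r.end > r.start) {
      // The end is exclusive, so the last value covered is end - 1.
      for (b <- bucket(r.start) to bucket(r.end - 1)) arr(b) = arr(b) :+ r
    }
    arr
  }

  // Only the ranges stored in the point's bucket need to be checked.
  def rangesContaining(p: Long): Seq[SimpleRange] =
    if (p < domainMin || p > domainMax) Seq.empty
    else buckets(bucket(p)).filter(_.contains(p))
}
```

For example, take ranges [0, 10), [5, 20) and [50, 100) over the domain [0, 99] with 10 buckets. A call to rangesContaining(7) scans only bucket 0 and returns the first two ranges. The trade-off is extra memory for ranges that span many buckets.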
case class CassandraPartition[V, T <: Token[V]](index: Int, endpoints: Iterable[InetAddress], tokenRanges: Iterable[CqlTokenRange[V, T]], dataSize: Long) extends EndpointPartition with Product with Serializable

Metadata describing a Cassandra table partition processed by a single Spark task. Beware: the term "partition" is overloaded. Here, in the context of Spark, it means an arbitrary collection of rows that can be processed locally on a single Cassandra cluster node. A CassandraPartition typically contains multiple CQL partitions, i.e. rows identified by different values of the CQL partition key.
index: identifier of the partition, used internally by Spark
endpoints: the nodes the data partition is located on
tokenRanges: token ranges determining the row set to be fetched
dataSize: estimated amount of data in the partition
class CassandraPartitionedRDD[T] extends RDD[T]

An RDD created by repartitionByCassandraReplica, whose partitions' preferred locations are the Cassandra replicas each partition was created for.
case class CqlTokenRange[V, T <: Token[V]](range: TokenRange[V, T])(implicit tf: TokenFactory[V, T]) extends Product with Serializable

Stores a CQL WHERE predicate matching a range of tokens.
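The following is a minimal sketch of how such a predicate could be built. It assumes Long tokens as produced by Murmur3Partitioner, whose minimum token is Long.MinValue, and ranges that are open at the start and closed at the end, (start, end]. SimpleTokenRange, CqlPredicateSketch and the special-casing of the minimum token are illustrative assumptions rather than the connector's API. The predicate uses bind markers, so the token values are returned separately for binding.

```scala
// Hypothetical sketch: a non-wrapping token range (start, end] over Long tokens.
final case class SimpleTokenRange(start: Long, end: Long)

object CqlPredicateSketch {
  // Assumed minimum token of the partitioner (Murmur3Partitioner uses Long.MinValue).
  val MinToken: Long = Long.MinValue

  // Builds a CQL WHERE fragment with bind markers, plus the token values to bind.
  def cqlPredicate(pk: String, r: SimpleTokenRange): (String, Seq[Long]) =
    if (r.start == MinToken && r.end == MinToken)
      // Both bounds at the minimum token: treat the range as the whole ring.
      (s"token($pk) >= ?", Seq(r.start))
    else if (r.start == MinToken)
      // Range starting at the minimum token: only an upper bound is needed.
      (s"token($pk) <= ?", Seq(r.end))
    else if (r.end == MinToken)
      // Range ending at the minimum token: it runs to the end of the ring.
      (s"token($pk) > ?", Seq(r.start))
    else
      (s"token($pk) > ? AND token($pk) <= ?", Seq(r.start, r.end))
}
```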
class DataSizeEstimates[V, T <: Token[V]] extends Logging

Estimates the amount of data in a Cassandra table. Takes token range size estimates from the system.size_estimates table, available since Cassandra 2.1.5.
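As a rough illustration, the sketch below turns size_estimates rows into a table-size estimate. Each row of system.size_estimates carries range_start, range_end, mean_partition_size and partitions_count. The sketch models these as a hypothetical SizeEstimate case class with Murmur3 Long tokens. It assumes the rows cover only part of the token ring (for example, the ranges reported by one node) and extrapolates the summed bytes by the ring fraction those ranges cover. This extrapolation is an assumption of the sketch, not a documented detail of DataSizeEstimates.

```scala
// Hypothetical model of one system.size_estimates row, with Murmur3 Long tokens.
final case class SizeEstimate(rangeStart: Long, rangeEnd: Long, meanPartitionSize: Long, partitionsCount: Long)

object DataSizeSketch {
  // The Murmur3 token ring holds 2^64 tokens.
  private val RingSize: BigDecimal = BigDecimal(2).pow(64)

  // Fraction of the ring covered by (start, end]; a range with end <= start wraps around the ring.
  def ringFraction(start: Long, end: Long): Double = {
    val width = BigInt(end) - BigInt(start)
    val nonNegative = if (width <= BigInt(0)) width + BigInt(2).pow(64) else width
    (BigDecimal(nonNegative) / RingSize).toDouble
  }

  // Sums mean_partition_size * partitions_count and extrapolates it to the whole ring.
  def estimatedTableSize(rows: Seq[SizeEstimate]): Long = {
    val bytes = rows.map(r => BigInt(r.meanPartitionSize) * BigInt(r.partitionsCount)).sum
    val fraction = rows.map(r => ringFraction(r.rangeStart, r.rangeEnd)).sum
    if (fraction <= 0.0) 0L else (BigDecimal(bytes) / BigDecimal(fraction)).toLong
  }
}
```

For instance, a single row covering a quarter of the ring with 1000 partitions of 100 bytes each yields an estimate of 400000 bytes for the whole table.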
trait EndpointPartition extends Partition
trait MonotonicBucketing[-T] extends AnyRef

A mapping from T values to an integer range [0, n), such that for any (t1: T) > (t2: T), bucket(t1) >= bucket(t2).
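For instance, a bucketing over all Int values can satisfy this property by splitting the range into n equal-width intervals. The sketch assumes a trait where bucket(n) returns a T => Int function. That signature and the name SimpleMonotonicBucketing are hypothetical. Shifting each value by Int.MinValue maps it into [0, 2^32) before scaling, so a larger input never lands in a smaller bucket.

```scala
// Hypothetical shape of the abstraction: for n buckets, a function from T to [0, n).
trait SimpleMonotonicBucketing[-T] {
  def bucket(n: Int): T => Int
}

object SimpleMonotonicBucketing {
  // Splits the full Int range into n equal-width buckets.
  // (x - Int.MinValue) lies in [0, 2^32), and scaling by n / 2^32 keeps the order.
  implicit val intBucketing: SimpleMonotonicBucketing[Int] = new SimpleMonotonicBucketing[Int] {
    def bucket(n: Int): Int => Int = {
      require(n > 0, "bucket count must be positive")
      x => ((x.toLong - Int.MinValue) * n / (1L << 32)).toInt
    }
  }
}
```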
class NodeAddresses extends Serializable

Looks up the listen address of a cluster node given its Native Transport address. Uses the system.peers table as the source of information. If such information for a node is missing, it assumes its listen address equals its RPC address.
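The fallback rule amounts to a lookup table with a default. This sketch assumes the peer (listen address) and rpc_address columns of system.peers have already been read into pairs. PeerAddressLookup is a hypothetical name, and no driver calls are shown.

```scala
import java.net.InetAddress

// Hypothetical sketch: resolves a node's listen address from its native transport (RPC) address.
final class PeerAddressLookup(peers: Seq[(InetAddress, InetAddress)]) {
  // rpc_address -> peer (listen address), as read from system.peers.
  private val listenByRpc: Map[InetAddress, InetAddress] = peers.toMap

  // Falls back to the RPC address itself when system.peers has no entry for the node.
  def listenAddress(rpcAddress: InetAddress): InetAddress =
    listenByRpc.getOrElse(rpcAddress, rpcAddress)
}
```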
trait RangeBounds[-R, T] extends AnyRef

Extracts the bounds of a range R. This allows working with any representation of ranges. The range must not wrap, that is, end >= start.
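The benefit of this abstraction is that generic code can be written once for many range types. The sketch below defines a hypothetical type class, Bounds, with the same role. It gives instances for two unrelated representations and writes containsPoint against the type class only. The names and the exclusive-end convention are assumptions.

```scala
// Hypothetical type class with the same role: extracts the bounds of any range representation R.
trait Bounds[-R, T] {
  def start(range: R): T
  def end(range: R): T
}

// Two unrelated range representations.
final case class HalfOpenInterval(from: Long, until: Long)
final case class Span(offset: Long, length: Long)

object Bounds {
  implicit val intervalBounds: Bounds[HalfOpenInterval, Long] = new Bounds[HalfOpenInterval, Long] {
    def start(r: HalfOpenInterval): Long = r.from
    def end(r: HalfOpenInterval): Long = r.until
  }
  implicit val spanBounds: Bounds[Span, Long] = new Bounds[Span, Long] {
    def start(r: Span): Long = r.offset
    def end(r: Span): Long = r.offset + r.length
  }
}

object RangeOps {
  // Works for any R with a Bounds instance; assumes a non-wrapping range with an exclusive end.
  def containsPoint[R, T](range: R, point: T)(implicit b: Bounds[R, T], ord: Ordering[T]): Boolean = {
    val (s, e) = (b.start(range), b.end(range))
    require(ord.gteq(e, s), "range must not wrap")
    ord.gteq(point, s) && ord.lt(point, e)
  }
}
```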
case class ReplicaPartition(index: Int, endpoints: Set[InetAddress]) extends EndpointPartition with Product with Serializable
class ReplicaPartitioner[T] extends Partitioner

The replica partitioner works on an RDD keyed on sets of InetAddresses representing Cassandra hosts. It groups keys that share a common IP address into partitionsPerReplicaSet partitions.
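The grouping can be sketched without a Spark dependency. ReplicaPartitionerSketch is hypothetical. It mirrors the numPartitions and getPartition(key) members of Spark's Partitioner but does not extend it, and it uses string host names instead of InetAddress for brevity. The sketch makes three further assumptions: each host gets a block of partitionsPerReplicaSet consecutive partition indices, each key goes to its lowest-sorting known host, and the partition within that host's block is picked at random.

```scala
import scala.util.Random

// Hypothetical sketch of replica-based partitioning; same shape as Spark's Partitioner, not a subclass.
final class ReplicaPartitionerSketch(hosts: Seq[String], partitionsPerReplicaSet: Int, seed: Long = 42L) {
  require(hosts.nonEmpty && partitionsPerReplicaSet > 0, "need hosts and a positive partition count")

  // Each host owns a block of partitionsPerReplicaSet consecutive partition indices.
  private val firstPartitionOf: Map[String, Int] =
    hosts.distinct.sorted.zipWithIndex.map { case (h, i) => h -> i * partitionsPerReplicaSet }.toMap

  private val rand = new Random(seed)

  val numPartitions: Int = firstPartitionOf.size * partitionsPerReplicaSet

  // The key is the set of replica hosts for a row. Keys whose lowest known host is the same
  // land in that host's block; keys with no known host go to a random partition.
  def getPartition(key: Any): Int = key match {
    case replicas: Set[_] =>
      replicas.collect { case h: String if firstPartitionOf.contains(h) => h }.toSeq.sorted.headOption match {
        case Some(host) => firstPartitionOf(host) + rand.nextInt(partitionsPerReplicaSet)
        case None       => rand.nextInt(numPartitions)
      }
    case other =>
      throw new IllegalArgumentException(s"expected a set of hosts, got $other")
  }
}
```

With hosts "10.0.0.1" and "10.0.0.2" and partitionsPerReplicaSet = 3, the first host owns partitions 0 to 2 and the second owns 3 to 5. A random choice within a block spreads rows with identical replica sets over several Spark partitions.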
class TokenRangeClusterer[V, T <: Token[V]] extends AnyRef

Groups a set of token ranges into groupCount groups containing not more than maxGroupSize token ranges. Each group will form a single CassandraRDDPartition.
The algorithm is as follows:
1. Sort token ranges by endpoints lexicographically.
2. Take the highest possible number of token ranges from the beginning of the list such that the sum of their ringFraction values does not exceed ringFractionPerGroup and they all share at least one common endpoint. If that is not possible, take at least one item. Those token ranges form a group.
3. Repeat the previous step until no token ranges are left.
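The steps above can be sketched as a greedy pass over the sorted list. This is a simplified, hypothetical version, not the connector's implementation. Each token range is reduced to a RangeInfo holding its ringFraction and its endpoints as strings. "Lexicographically by endpoints" is read as comparing sorted endpoint lists. The maxGroupSize limit is enforced alongside the ring-fraction limit.

```scala
// Hypothetical simplified token range: its share of the ring and the nodes that replicate it.
final case class RangeInfo(ringFraction: Double, endpoints: Set[String])

object ClustererSketch {
  // Orders sorted endpoint lists element by element.
  private val endpointOrdering: Ordering[List[String]] = new Ordering[List[String]] {
    def compare(a: List[String], b: List[String]): Int = (a, b) match {
      case (Nil, Nil) => 0
      case (Nil, _)   => -1
      case (_, Nil)   => 1
      case (x :: xs, y :: ys) =>
        val c = x.compareTo(y)
        if (c != 0) c else compare(xs, ys)
    }
  }

  // Step 2: take the longest prefix that stays within the limits and shares an endpoint.
  // The first range is always taken, even if it alone exceeds the ring-fraction limit.
  private def takeGroup(sorted: List[RangeInfo], fractionLimit: Double, maxSize: Int): (List[RangeInfo], List[RangeInfo]) = {
    @annotation.tailrec
    def grow(acc: List[RangeInfo], size: Int, fraction: Double, common: Set[String],
             rest: List[RangeInfo]): (List[RangeInfo], List[RangeInfo]) = rest match {
      case next :: tail if size < maxSize &&
          fraction + next.ringFraction <= fractionLimit &&
          (common intersect next.endpoints).nonEmpty =>
        grow(next :: acc, size + 1, fraction + next.ringFraction, common intersect next.endpoints, tail)
      case _ =>
        (acc.reverse, rest)
    }
    grow(List(sorted.head), 1, sorted.head.ringFraction, sorted.head.endpoints, sorted.tail)
  }

  def group(ranges: Seq[RangeInfo], ringFractionPerGroup: Double, maxGroupSize: Int): List[List[RangeInfo]] = {
    require(maxGroupSize > 0, "maxGroupSize must be positive")
    // Step 1: sort token ranges by endpoints lexicographically (stable sort).
    val sorted = ranges.toList.sortBy(r => r.endpoints.toList.sorted)(endpointOrdering)
    // Step 3: repeat step 2 until no token ranges are left.
    @annotation.tailrec
    def loop(rest: List[RangeInfo], done: List[List[RangeInfo]]): List[List[RangeInfo]] =
      if (rest.isEmpty) done.reverse
      else {
        val (g, remaining) = takeGroup(rest, ringFractionPerGroup, maxGroupSize)
        loop(remaining, g :: done)
      }
    loop(sorted, Nil)
  }
}
```

Sorting by endpoints first places ranges with the same replicas next to each other, which is what lets the greedy prefix find a common endpoint.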
case class TokenRangeWithPartitionIndex[V, T <: Token[V]](range: TokenRange[V, T], partitionIndex: Int) extends Product with Serializable

Holds a token range together with the index of the partition this token range belongs to.

Value Members

object CassandraPartitionGenerator
object DataSizeEstimates
object MonotonicBucketing
object TokenRangeClusterer
object TokenRangeSplitter
object TokenRangeWithPartitionIndex extends Serializable
package dht
