com.datastax.bdp.graph.spark.sql.vertex
Builds the C* request and the id mapping according to the following example.
Example of the schema:
  val schema:        |member_id|community_id|skill|name|startTime|location|endTime|since|
  for one vertex it will return:
                     |member_id|community_id|NULL |name|NULL     |location|NULL   |NULL |
  val resultSchema:  ("skill", "name", "startTime", "location")
  The C* source request will be:
  val withIdAndMeta: |~~property-key-id|all vertex meta properties|id columns*|name|location|
  and the source will return all requested properties (with a meta struct if defined by the schema):
                     |NULL|name|NULL|location|
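A minimal sketch of the column-request construction implied by this example; every value below (cassandraColumns, metaColumns, the literal column names) is an illustrative assumption, not the actual implementation:

  // illustrative values mirroring the example above, not real relation fields
  val requiredColumns  = Array("skill", "name", "startTime", "location")
  val cassandraColumns = Set("name", "location")         // columns present in the C* table
  val idColumns        = Seq("community_id", "member_id")
  val metaColumns      = Seq("startTime", "endTime")     // all vertex meta-property columns

  // keep only columns C* knows about, then assemble the request:
  // |~~property-key-id|all meta columns|id columns*|filteredRequiredColumns|
  val filteredRequiredColumns = requiredColumns.filter(cassandraColumns)
  val request = Seq("~~property_key_id") ++ metaColumns ++ idColumns ++ filteredRequiredColumns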
Calls cassandraSourceRelation and then aggregates the properties into a single row. C* returns sorted rows, so we exploit that in a mapPartitions function when yielding new rows; it is a kind of copy of spanByKey, but without the object-creation overhead (see the sketch below). The request will contain:
|-~~property_key_id-|-all metaData columns-|-vertex id columns-|-filteredRequiredColumns-|
filteredRequiredColumns = requiredColumns intersect cassandraColumns, to prevent "no such column" failures.
It will return |-requiredColumns-|
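A minimal sketch of that aggregation, assuming rows arrive from C* already sorted by vertex id; spanByIdSorted and the key extractor are hypothetical names, not the actual implementation:

  // hypothetical span-by-key over an already-sorted iterator: groups
  // consecutive rows that share a key without building a map, which is
  // how it avoids the object-creation overhead of a generic spanByKey
  def spanByIdSorted[R, K](rows: Iterator[R])(key: R => K): Iterator[Seq[R]] =
    new Iterator[Seq[R]] {
      private val buffered = rows.buffered
      def hasNext: Boolean = buffered.hasNext
      def next(): Seq[R] = {
        val k = key(buffered.head)
        val group = scala.collection.mutable.ArrayBuffer.empty[R]
        while (buffered.hasNext && key(buffered.head) == k) group += buffered.next()
        group.toSeq
      }
    }

  // inside mapPartitions, one aggregated row per vertex (aggregate is hypothetical):
  // rdd.mapPartitions(rows => spanByIdSorted(rows)(vertexId).map(aggregate))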
cannot be pushed down to Cassandra directly; they are not used yet.
|-requiredColumns-| RDD
All id columns should be in place. The method will un-aggregate the properties back to the C* table view and call rdd.deleteFromCassandra().
data frame with all vertex id columns. It is ("community_id", "member_id") for the standard id format
optional; only the specified properties will be deleted. The whole vertex will be deleted if the list is empty.
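A rough sketch of the delete path; the keyspace/table names and id column types are assumptions, while deleteFromCassandra itself comes from the Spark Cassandra Connector:

  import com.datastax.spark.connector._

  // assumed names and id types: "graph_ks"/"member_p" stand in for the real
  // keyspace and vertex table, and the id column types are illustrative
  val idRdd = df.select("community_id", "member_id").rdd
    .map(r => (r.getInt(0), r.getLong(1)))

  // deletes whole vertices by primary key; a property-level delete would
  // first un-aggregate the properties back to the C* table view
  idRdd.deleteFromCassandra("graph_ks", "member_p",
    keyColumns = SomeColumns("community_id", "member_id"))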
All id columns should be in place. The method will un-aggregate the properties back to the C* table view.
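A minimal illustration of that un-aggregation for a multi property; the column names are assumptions:

  import org.apache.spark.sql.functions.{col, explode}

  // one aggregated row per vertex becomes one row per (id, skill) pair,
  // matching the per-property layout of the C* table
  val unaggregated = df.select(
    col("community_id"), col("member_id"), explode(col("skill")).as("skill"))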
Checks that the table is empty; 'select * from table limit 1' is the fastest way.
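A sketch of that check through the connector; the CassandraConnector instance and the keyspace/table names are placeholders:

  import com.datastax.spark.connector.cql.CassandraConnector

  // the table is empty iff LIMIT 1 yields no row
  def isTableEmpty(connector: CassandraConnector, ks: String, table: String): Boolean =
    connector.withSessionDo { session =>
      session.execute(s"SELECT * FROM $ks.$table LIMIT 1").one() == null
    }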
Allows us to use InternalRows instead of public API rows.
Maps property key ID to that key's effective TTL on this vertex label. The "effective TTL" is the minimum of this vertex label's TTL and the property key's TTL. If neither the property key nor the vertex label has a TTL, then the corresponding property key ID will be absent from this map.
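The rule above, restated as a small sketch (a hypothetical helper, not an actual field of this class):

  // min of the two TTLs when both are set; whichever one is set otherwise;
  // None when neither is set, so the key would be absent from the map
  def effectiveTtl(labelTtl: Option[Int], keyTtl: Option[Int]): Option[Int] =
    (labelTtl, keyTtl) match {
      case (Some(l), Some(k)) => Some(math.min(l, k))
      case _                  => labelTtl.orElse(keyTtl)
    }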
The TTL of this vertex label, in seconds. Zero means none is set.
VertexSourceRelation returns a DF for one vertex label only. By default it returns a DF with all graph vertex properties; this is done to simplify UNIONs of different vertex labels. Unavailable vertex properties are returned as null. The Spark Cassandra Connector mapping is used to map C* types to DF types.
Multi properties are returned as an array of values: ArrayType[DataType]
If a property definition has its own table, the following struct will be returned: StructType[value: FieldType, metaFields*]
A multi meta property will be: ArrayType[StructType[value: FieldType, meta...]]
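For illustration, how such a multi meta property could appear in a Spark schema; "skill" and "since" are example names, not part of the API:

  import org.apache.spark.sql.types._

  // multi property -> ArrayType; meta properties -> a struct around each value
  val skillField = StructField("skill",
    ArrayType(StructType(Seq(
      StructField("value", StringType),    // the property value itself
      StructField("since", TimestampType)  // a meta property
    ))))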
Example of the schema:
  |member_id|community_id|skill|name|startTime|location|endTime|since|
for one vertex it will return:
  |member_id|community_id|NULL |name|NULL     |location|NULL   |NULL |
If the user defines columns with a select("skill", "name", "startTime", "location") call, the C* source will be asked for:
  |~~property-key-id|all vertex meta properties|id columns*|name|location|
and the source will return all requested properties (with a meta struct if defined by the schema):
  |NULL|name|NULL|location|
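A hypothetical end-to-end read matching this example; the format string and option names are assumptions, not a documented API:

  // assumed: the relation is registered as a data source under its package name
  val members = spark.read
    .format("com.datastax.bdp.graph.spark.sql.vertex")
    .option("graph", "my_graph")   // assumed option name
    .option("label", "member")     // assumed option name
    .load()
    .select("skill", "name", "startTime", "location")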