com.datastax.bdp.graph.spark.graphframe.classic
Returns a graph traversal that supports a subset of TinkerPop3 traversal steps.
to start the traversal with
GraphTraversal[Edge] for the filtered graph
Returns a graph traversal that supports a subset of TinkerPop3 traversal steps.
GraphTraversal[Edge] for the graph
Returns a graph traversal that supports a subset of TinkerPop3 traversal steps.
to start the traversal with
GraphTraversal[Vertex] for the filtered graph
Returns a graph traversal that supports a subset of TinkerPop3 traversal steps.
GraphTraversal[Vertex] for the graph
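A minimal sketch of how the vertex and edge traversals might be used, assuming the DSE GraphFrames classes are on the classpath (for example in a DSE Spark shell). The import path of DseGraphFrame and the availability of the hasLabel and count steps in the supported subset are assumptions; TraversalSketch and its methods are hypothetical names used only for illustration.

```scala
// Assumption: DseGraphFrame lives in this package and V()/E() return
// TinkerPop-style traversals, as the descriptions above state.
import com.datastax.bdp.graph.spark.graphframe.DseGraphFrame

object TraversalSketch {
  // Count the vertices that carry the given label.
  def countVertices(g: DseGraphFrame, label: String): Long =
    g.V().hasLabel(label).count().next()

  // Count the edges that carry the given label.
  def countEdges(g: DseGraphFrame, label: String): Long =
    g.E().hasLabel(label).count().next()
}
```

With a graph frame g at hand, TraversalSketch.countEdges(g, "father") would count edges such as the one shown in the sample edge tables on this page.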
proxy call to gf.cache()
Removes any invalid vertex property and edge entries from the database backend.
Call this method if you get internal errors or inconsistent results from graph queries.
It is strongly recommended to run nodetool repair graphName before this call and again after it.
The call revises the graph database storage and fixes the following problems
Cleans edge properties.
Deletes only the selected properties, not the entire row.
Deletes graph edges. Four id columns should be passed to the method:
+--------------------+--------------------+-------+--------------------+
|                 src|                 dst| ~label|                  id|
+--------------------+--------------------+-------+--------------------+
|god:THxdAAAAAAAAAAAA|titan:J474AAAAAAA...| father|da0a9900-8fe1-11e...|
+--------------------+--------------------+-------+--------------------+
DataFrame with the edge ids: src, dst, ~label, id
Cache the DataFrame before processing; true by default for consistent updates. Two C* entries need to be deleted for each edge, so no reloads should occur between the two delete calls.
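The four-column requirement can be checked before the delete is issued. The sketch below is a hypothetical pre-check, not part of DseGraphFrame: EdgeIdColumns and missing are illustrative names, and only the column names src, dst, ~label and id come from the description above.

```scala
// Hypothetical helper: reports which of the four edge id columns are absent.
object EdgeIdColumns {
  // The four id columns that deleteEdges expects, as listed above.
  val required: Seq[String] = Seq("src", "dst", "~label", "id")

  // Returns the required id columns that are not among the given column names.
  def missing(columns: Seq[String]): Seq[String] =
    required.filterNot(c => columns.contains(c))

  def main(args: Array[String]): Unit = {
    // All four id columns present: nothing is reported.
    println(missing(Seq("src", "dst", "~label", "id")))
    // Only src and dst present: ~label and id are reported.
    println(missing(Seq("src", "dst", "weight")))
  }
}
```

For a DataFrame df, the check would be EdgeIdColumns.missing(df.columns) before df is passed to deleteEdges.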
shortcut for deleteEdges(df: DataFrame, cache: Boolean = true) for Java
clean vertex properties with meta properties
property names to delete
clean vertex properties with meta properties
property names to delete
delete all vertices with given label
delete vertices and all related edges
proxy call to gf.dropIsolatedVertices()
new filtered DseGraphFrame
proxy call to gf.filterEdges()
proxy call to gf.filterVertices()
Returns the graph name of this DseGraphFrame.
NoSuchElementException
if the graph name is not set.
Returns the schema of this graph, looked up by the graph name.
NoSuchElementException
thrown if the graph name is unknown and the schema cannot be retrieved
Graph Schema
Utility method to generate GraphFrame-compatible ids when the DataFrame contains a mixed set of labels. It is slower than idColumn(label: String, idColumns: Column*): Column. The id is added automatically when a vertex is inserted, provided the inserted columns have the same names as in the graph schema. This is not possible for edges, because you need to specify both the src and dst ids. Usage:
val updateEdgeDF = sourceDF.select(
  gf.idColumn(col("srcLabel"), col("srcId")) as "src",
  gf.idColumn(col("dstLabel"), col("dstId")) as "dst",
  col("label") as "~label",
  gf.randomEdgeIdColumn,
  col("property"))
gf.updateEdges(updateEdgeDF)
If different labels use different id formats, use a case statement to pick the matching id column for each label:
when(col("srcLabel") === "1format", col("src1Id"))
  .when(col("srcLabel") === "2format", col("src2Id"))
  .otherwise(col("src3Id")) as "src"
Utility method to generate GraphFrame-compatible ids. The id is added automatically when a vertex is inserted, provided the inserted columns have the same names as in the graph schema. This is not possible for edges, because you need to specify both the src and dst ids. Usage:
val updateEdgeDF = sourceDF.select(
  gf.idColumn("srcLabel", col("srcId")) as "src",
  gf.idColumn("dstLabel", col("dstId")) as "dst",
  col("label") as "~label",
  gf.randomEdgeIdColumn,
  col("property"))
gf.updateEdges(updateEdgeDF)
Performs a read or write based operation on the Graph backing this GraphTraversalSource. This step can be accompanied by the GraphTraversal#with(String, Object) modulator for further configuration and must be accompanied by a GraphTraversal#read() or GraphTraversal#write() modulator step, which will terminate the traversal.
the URL of a file in a distributed file system, a JDBC connection, or the name of a file in the default file system to which the read or write will apply. Note that how this parameter is used depends entirely on the implementation; for example, the Cassandra reader/writer implementation ignores this path and reads the table name from the parameters.
the traversal with the IoStep added
proxy call to gf.persist()
Example output:
{~label=edge, ~out_vertex={~label=vertex, community_id=1888030080, member_id=0}, ~in_vertex={~label=custom, name=Name, array=1}, ~local_id=2f3671c0-96d0-11e6-9882-f74edf21f349}
Edge label
Source vertex id
Destination vertex id
Edge ids
Associated DataFrame schema
External ID object
String of vertex ID
External ID object
proxy call to gf.unpersist()
Updates or inserts edges. This method accepts natural vertex id columns. For a Classic graph, out-vertex id column names should start with the "outVertexLabel_" prefix and in-vertex id column names with "inVertexLabel_". Core graph uses the DSE-DB edge table schema. The minimal DataFrame schema is 2 id columns and 0 or more property columns.
For example, suppose we have the following vertex and edge label definitions (Core Graph):
schema.vertexLabel("person")
  .partitionKey("name", "ssn")
  .clusteringKey("age")
  .properties("address", "coffeePerDay")
  .create()
schema.vertexLabel("software")
  .partitionKey("name")
  .clusteringKey("version", "lang")
  .properties("temp", "static_property")
  .create()
schema.edgeLabel("created")
  .multiple()
  .properties("weight")
  .create()
schema.edgeLabel("created")
  .connection("person", "software")
  .add()
Edge updates can be carried out like this:
scala> g.updateEdges("person", "created", "software", createdDF)
where the DataFrame has the following column names. Note that they follow the vertexLabel_idColName naming convention:
scala> createdDF.show
+-----------+-----------+----------+-------------+----------------+-------------+------+
|person_name| person_ssn|person_age|software_name|software_version|software_lang|weight|
+-----------+-----------+----------+-------------+----------------+-------------+------+
|      rocco|111-11-1111|        21|         chat|             1.0|        scala|   2.0|
+-----------+-----------+----------+-------------+----------------+-------------+------+
Note: The dataframe is not cached by this function. The dataframe should be persisted by the user if a dynamic data source is used.
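The vertexLabel_idColName convention can be spelled out in plain Scala. The sketch below is a hypothetical helper, not part of the DseGraphFrame API: EdgeColumnNames and idColumns are illustrative names, while the key columns are taken from the person and software labels in the schema above.

```scala
// Hypothetical helper: derives the DataFrame column names implied by the
// vertexLabel_idColName convention.
object EdgeColumnNames {
  // Prefix each id column of a vertex label with "<vertexLabel>_".
  def idColumns(vertexLabel: String, keyColumns: Seq[String]): Seq[String] =
    keyColumns.map(key => s"${vertexLabel}_$key")

  def main(args: Array[String]): Unit = {
    // Partition keys followed by clustering keys, per vertex label.
    val person = idColumns("person", Seq("name", "ssn", "age"))
    val software = idColumns("software", Seq("name", "version", "lang"))
    // Edge property columns such as weight keep their plain names.
    val expected = (person ++ software) :+ "weight"
    println(expected.mkString(", "))
  }
}
```

The printed names match the header of the createdDF table shown above.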
data frame with edge ids and update columns
Updates the edges of this graph. The minimal DataFrame schema is 4 id columns and at least one property to update:
+--------------------+--------------------+-------+--------------------+-------------------+
|                 src|                 dst| ~label|                  id|               prop|
+--------------------+--------------------+-------+--------------------+-------------------+
|god:THxdAAAAAAAAAAAA|titan:J474AAAAAAA...| father|da0a9900-8fe1-11e...|              value|
+--------------------+--------------------+-------+--------------------+-------------------+
If the id column is not present, it is generated and the edges are saved as new.
data frame with edge ids and update columns
Cache the DataFrame before processing; true by default for consistent updates. Two C* entries need to be updated for each edge, so no reloads should occur between the two update calls.
shortcut for updateEdges(df: DataFrame, cache: Boolean = true) for Java
Updates the vertices of this graph with the properties provided in the DataFrame. Provide the id in non-encoded format:
+-----------------+---------+---------+
|     community_id|member_id|      age|
+-----------------+---------+---------+
|       1182054400|        0|        0|
+-----------------+---------+---------+
The DataFrame is not cached by this function.
to update
dataframe with vertex id and update columns
Updates the vertices of this graph with the properties provided in the DataFrame. The minimal DataFrame schema is just the vertex "id" and one property to update:
+-----------------+---------+
|               id|      age|
+-----------------+---------+
|god:AAAAATMAAA...|        0|
+-----------------+---------+
The label and vertex id are extracted from the graph frame id. For better performance, it is recommended to add or keep the "~label" column:
+-----------------+---------+---------+
|               id|   ~label|      age|
+-----------------+---------+---------+
|god:AAAAATMAAA...|      god|        0|
+-----------------+---------+---------+
You can also provide the id in non-encoded format:
+-----------------+---------+---------+---------+
|     community_id|member_id|   ~label|      age|
+-----------------+---------+---------+---------+
|       1182054400|        0|      god|        0|
+-----------------+---------+---------+---------+
Note: passing both synthetic "id" and vertex Id columns is an error.
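The rule in the note can be checked before an update is attempted. The sketch below is a hypothetical pre-check, not part of DseGraphFrame: VertexIdColumns and hasConflict are illustrative names, and the natural id columns community_id and member_id are taken from the sample tables above.

```scala
// Hypothetical helper: detects a DataFrame that mixes the encoded "id"
// column with natural vertex id columns.
object VertexIdColumns {
  // True when both the synthetic "id" column and at least one natural id
  // column are present among the given column names.
  def hasConflict(columns: Seq[String], naturalIdColumns: Set[String]): Boolean =
    columns.contains("id") && columns.exists(c => naturalIdColumns.contains(c))

  def main(args: Array[String]): Unit = {
    val natural = Set("community_id", "member_id")
    // Encoded id only.
    println(hasConflict(Seq("id", "~label", "age"), natural))
    // Natural ids only.
    println(hasConflict(Seq("community_id", "member_id", "~label", "age"), natural))
    // Both kinds of id column at once, which the note above calls an error.
    println(hasConflict(Seq("id", "community_id", "member_id", "age"), natural))
  }
}
```

Only the last call reports a conflict; the first two correspond to the two accepted DataFrame layouts shown above.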
dataframe with vertex id and update columns
Empty (meaning all labels) by default. It is convenient to group vertices with the same id format; that group of labels can be passed here to reduce the number of verification steps.
Cache the DataFrame before processing; true by default for consistent updates and performance.
shortcut for updateVertices(df: DataFrame, labels: Seq[String] = Seq.empty, cache: Boolean = true) for Java API
dataframe with vertex id and update columns