com.datastax.bdp.graph.spark.graphframe
The returned graph traversal supports a subset of TinkerPop3 traversal steps.
GraphTraversal[Edge] for the graph
The returned graph traversal supports a subset of TinkerPop3 traversal steps.
GraphTraversal[Vertex] for the graph
proxy call to gf.cache()
this
Remove any invalid vertex property and edge entries from the database backend.
Call this method if graph queries return internal errors or inconsistent results.
It is strongly recommended to run nodetool repair graphName before this call and again after it.
The call revises the graph database storage and fixes the following problems:
- deletes vertex property entries that belong to non-existent vertices
Clean edge properties.
Deletes only the selected properties, not the entire row.
Delete graph edges. Four id columns should be passed to the method:
+--------------------+--------------------+-------+--------------------+
|                 src|                 dst| ~label|                  id|
+--------------------+--------------------+-------+--------------------+
|god:THxdAAAAAAAAAAAA|titan:J474AAAAAAA...| father|da0a9900-8fe1-11e...|
+--------------------+--------------------+-------+--------------------+
data frame with edge ids: src, dst, ~label, id
Cache the df before processing; true by default for consistent updates. Two Cassandra entries need to be deleted for each edge, so the data should not be reloaded between these two deletions.
shortcut for deleteEdges(df: DataFrame, cache: Boolean = true) for Java
clean vertex properties with meta properties
property names to delete
clean vertex properties with meta properties
property names to delete
delete all vertices with given label
delete vertices and all related edges
restore or change the name of the graph
Return the schema of this graph based on its name. A NoSuchElementException is thrown if the graph name is unknown and the schema cannot be retrieved.
Graph Schema
Utility method to generate GraphFrame compatible ids, if a mixed set of labels is in the DF. It is slower than the idColumn variant that takes the label as a String, (String, idColumns: Column*): Column.
The id is added automatically when a vertex is inserted, if the inserted columns have the same names as in the graph schema. This is not possible for edges, because both the src and dst ids must be specified.
Usage:
    val updateEdgeDF = sourceDF.select(
      gf.idColumn(col("srcLabel"), col("srcId")) as "src",
      gf.idColumn(col("dstLabel"), col("dstId")) as "dst",
      col("label") as "~label",
      gf.randomEdgeIdColumn,
      col("property"))
    gf.updateEdges(updateEdgeDF)
If different labels have different id formats, use a case expression to select the matching id column:
    when(col("srcLabel") === "1format", col("src1Id"))
      .when(col("srcLabel") === "2format", col("src2Id"))
      .otherwise(col("src3Id")) as "src"
Utility method to generate GraphFrame compatible ids.
The id is added automatically when a vertex is inserted, if the inserted columns have the same names as in the graph schema. This is not possible for edges, because both the src and dst ids must be specified.
Usage:
    val updateEdgeDF = sourceDF.select(
      gf.idColumn("srcLabel", col("srcId")) as "src",
      gf.idColumn("dstLabel", col("dstId")) as "dst",
      col("label") as "~label",
      gf.randomEdgeIdColumn,
      col("property"))
    gf.updateEdges(updateEdgeDF)
proxy call to gf.persist()
this
proxy call to gf.persist()
this
proxy call to gf.unpersist()
this
proxy call to gf.unpersist()
this
Update this graph's edges. This method accepts natural vertex id columns. Out vertex column names should start with the "out_" prefix and in vertex column names with "in_". The method updates only one triplet combination. The minimal df schema is two id columns and zero or more property columns:
+------+-----+--------------------+-------------------+
|out_id|in_id|                  id|               prop|
+------+-----+--------------------+-------------------+
|    10|    a|da0a9900-8fe1-11e...|              value|
+------+-----+--------------------+-------------------+
The id column should contain the UUID(0,0).toString() value for single edges and a pre-generated UUID for multi-cardinality edges. The outVertexLabel->edgeLabel->inVertexLabel triplet is passed as parameters. The df is not cached by the function; the user should persist the dataframe if a dynamic data source is used.
data frame with edge ids and update columns
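The column naming and id conventions described for this method can be sketched in plain Scala, without any Spark or DSE dependency. The object EdgeUpdateColumns and all of its members are hypothetical helpers for illustration and are not part of the DseGraphFrame API. Using a random UUID for multi-cardinality edges is also an assumption, since the description only requires that the UUID be pre-generated.

```scala
import java.util.UUID

// Hypothetical helper for illustration; not part of the DseGraphFrame API.
object EdgeUpdateColumns {
  // Natural id columns of the out vertex carry the "out_" prefix.
  def outColumns(idColumns: Seq[String]): Seq[String] = idColumns.map("out_" + _)

  // Natural id columns of the in vertex carry the "in_" prefix.
  def inColumns(idColumns: Seq[String]): Seq[String] = idColumns.map("in_" + _)

  // Value of the id column for single edges: the all-zero UUID as a string.
  val singleEdgeId: String = new UUID(0L, 0L).toString

  // Assumption: a random UUID is used as the pre-generated id
  // of one multi-cardinality edge.
  def multiEdgeId(): String = UUID.randomUUID().toString
}
```

For a vertex whose natural id is a single column named id, EdgeUpdateColumns.outColumns(Seq("id")) gives out_id, matching the example table, and singleEdgeId is the string 00000000-0000-0000-0000-000000000000.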
Update this graph's edges. The minimal df schema is four id columns and at least one property to update:
+--------------------+--------------------+-------+--------------------+-------------------+
|                 src|                 dst| ~label|                  id|               prop|
+--------------------+--------------------+-------+--------------------+-------------------+
|god:THxdAAAAAAAAAAAA|titan:J474AAAAAAA...| father|da0a9900-8fe1-11e...|              value|
+--------------------+--------------------+-------+--------------------+-------------------+
If the id column is not present, it is generated and the edges are saved as new.
data frame with edge ids and update columns
Cache the df before processing; true by default for consistent updates. Two Cassandra entries need to be updated for each edge, so the data should not be reloaded between these two updates.
shortcut for updateEdges(df: DataFrame, cache: Boolean = true) for Java
Update this graph's vertices with the properties provided in the df. Provide the id in non-encoded format:
+-----------------+---------+---------+
|     community_id|member_id|      age|
+-----------------+---------+---------+
|       1182054400|        0|        0|
+-----------------+---------+---------+
The df is not cached by the function.
to update
dataframe with vertex id and update columns
Update this graph's vertices with the properties provided in the df. The minimal df schema is just the vertex "id" and one property to update:
+-----------------+---------+
|               id|      age|
+-----------------+---------+
|god:AAAAATMAAA...|        0|
+-----------------+---------+
The label and vertex id are extracted from the graph frame id. For better performance, it is recommended to add or keep the "~label" column:
+-----------------+---------+---------+
|               id|   ~label|      age|
+-----------------+---------+---------+
|god:AAAAATMAAA...|      god|        0|
+-----------------+---------+---------+
You can also provide the id in non-encoded format:
+-----------------+---------+---------+---------+
|     community_id|member_id|   ~label|      age|
+-----------------+---------+---------+---------+
|       1182054400|        0|      god|        0|
+-----------------+---------+---------+---------+
Note: passing both the synthetic "id" column and vertex id columns is an error.
dataframe with vertex id and update columns
Empty (meaning all labels) by default. It is convenient to group vertices with the same id format; that group can be passed here to reduce the number of verification steps.
Cache the df before processing; true by default for consistent updates and performance.
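Because passing both the synthetic "id" column and natural vertex id columns is an error, a caller may want to check the column names before calling updateVertices. The following is a minimal sketch in plain Scala. VertexUpdateColumns and hasConflictingIds are hypothetical names, not part of the DseGraphFrame API, and the natural id column names are borrowed from the example table rather than from a real schema.

```scala
// Hypothetical pre-flight check; not part of the DseGraphFrame API.
object VertexUpdateColumns {
  // Returns true when the column list mixes the synthetic "id" column
  // with at least one natural vertex id column.
  def hasConflictingIds(columns: Seq[String], naturalIdColumns: Seq[String]): Boolean =
    columns.contains("id") && naturalIdColumns.exists(n => columns.contains(n))
}

object VertexUpdateColumnsDemo {
  // Assumed natural id columns, taken from the example table.
  val naturalIds: Seq[String] = Seq("community_id", "member_id")

  // Synthetic id only: no conflict.
  val encodedOnly: Boolean =
    VertexUpdateColumns.hasConflictingIds(Seq("id", "~label", "age"), naturalIds)

  // Synthetic id plus a natural id column: conflict.
  val mixed: Boolean =
    VertexUpdateColumns.hasConflictingIds(Seq("id", "community_id", "age"), naturalIds)
}
```

With a Spark DataFrame, the column list would typically come from df.columns.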
shortcut for updateVertices(df: DataFrame, labels: Seq[String] = Seq.empty, cache: Boolean = true) for Java API
dataframe with vertex id and update columns
Provides DSEGraph-specific methods on GraphFrame. The graphName is needed for some traversal steps and to write data back. It can be lost during DseGraphFrame->GraphFrame->DseGraphFrame implicit conversions; set graphName to the target graph if needed.