Use DseGraphFrame to import a graph to DataStax Enterprise.
Prerequisites
Create the graph schema manually in the Gremlin console or DataStax Studio
before importing the graph. Import works only with custom ID mapping.
Procedure
-
Start the Spark shell.
-
If you exported the graph to JSON using DseGraphFrame, import it in the Spark
shell.
// Example: import into a graph named gods_import.
val g = spark.dseGraph("gods_import")
g.updateVertices(spark.read.json("/tmp/v.json"))
g.updateEdges(spark.read.json("/tmp/e.json"))
// General form: substitute your graph name and the paths to your exported files.
val g = spark.dseGraph("graph name")
g.updateVertices(spark.read.json("path to exported vertices JSON"))
g.updateEdges(spark.read.json("path to exported edges JSON"))
-
If you have a custom graph:
-
Examine the schema of the graph and note how it maps to the schema that
DSE Graph expects.
This example uses the friends graph from the GraphFrames project.
import org.graphframes._
val g: GraphFrame = examples.Graphs.friends
g.vertices.printSchema
root
|-- id: string (nullable = true)
|-- name: string (nullable = true)
|-- age: integer (nullable = false)
g.edges.printSchema
root
|-- src: string (nullable = true)
|-- dst: string (nullable = true)
|-- relationship: string (nullable = true)
-
In the Gremlin console or DataStax Studio, create the schema.
system.graph('friends').create()
:remote config alias g friends.g
schema.propertyKey("age").Int().create()
schema.propertyKey("name").Text().create()
schema.propertyKey("id").Text().single().create()
schema.vertexLabel("people").partitionKey("id").properties("name", "age").create()
schema.edgeLabel("friend").create()
schema.edgeLabel("follow").create()
-
In the Spark shell, create an empty DseGraphFrame graph and check the target schemas.
val d = spark.dseGraph("friends")
d.V.printSchema
root
|-- id: string (nullable = false)
|-- ~label: string (nullable = false)
|-- _id: string (nullable = true)
|-- name: string (nullable = true)
|-- age: integer (nullable = true)
d.E.printSchema
root
|-- src: string (nullable = false)
|-- dst: string (nullable = false)
|-- ~label: string (nullable = true)
|-- id: string (nullable = true)
-
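Convert the edges and vertices to the target format. The source id column
becomes the custom ID column _id, every vertex gets the people label, and each
relationship value (friend or follow) becomes the edge label. The edge src and
dst columns are rebuilt with idColumn from the vertex label and the custom ID.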
val v = g.vertices.select($"id" as "_id", lit("people") as "~label", $"name", $"age")
val e = g.edges.select(
  d.idColumn(lit("people"), $"src") as "src",
  d.idColumn(lit("people"), $"dst") as "dst",
  $"relationship" as "~label")
-
Append the converted vertices and edges to the target graph.
d.updateVertices(v)
d.updateEdges(e)
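Before running the conversion against a real graph, it can help to rehearse the column mapping on a few rows. The following is a minimal sketch, not part of the documented procedure: it uses only plain Spark DataFrame calls, the sample rows and the names session, sampleVertices, sampleEdges, vCheck, and eCheck are hypothetical, and it deliberately skips d.idColumn because that call needs a DseGraphFrame. It is written to be pasted into a Spark shell.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// getOrCreate reuses the shell's session if one exists; otherwise it starts a local one.
val session = SparkSession.builder().master("local[1]").appName("conversion-dry-run").getOrCreate()

// Hypothetical sample rows shaped like g.vertices and g.edges in the friends example.
val sampleVertices = session
  .createDataFrame(Seq(("a", "Alice", 34), ("b", "Bob", 36)))
  .toDF("id", "name", "age")
val sampleEdges = session
  .createDataFrame(Seq(("a", "b", "friend"), ("b", "a", "follow")))
  .toDF("src", "dst", "relationship")

// Same vertex projection as in the procedure: id -> _id, constant people label.
val vCheck = sampleVertices.select(col("id") as "_id", lit("people") as "~label", col("name"), col("age"))

// Only the label rename is rehearsed here; src and dst stay as raw IDs
// because building DSE vertex IDs requires d.idColumn on a DseGraphFrame.
val eCheck = sampleEdges.select(col("src"), col("dst"), col("relationship") as "~label")

// Column names to compare against d.V.printSchema, and the edge labels the schema must define.
val vertexColumns = vCheck.columns.toList
val edgeLabels = eCheck.select("~label").distinct().collect().map(_.getString(0)).toSet

println(vertexColumns)
println(edgeLabels)
```

Each value in edgeLabels should correspond to an edge label created in the schema step (friend and follow in this example), and vertexColumns should line up with the non-generated columns shown by d.V.printSchema.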