Importing graphs using DseGraphFrame

Use DseGraphFrame to import a graph to DataStax Enterprise.


Prerequisites

Create the graph schema manually in the Gremlin console or DSE Studio before importing the graph. Import works only with custom ID mapping.

Procedure

  1. Start the Spark shell.
    dse spark
  2. If you exported the graph to JSON using DseGraphFrame, import it in the Spark shell. The general form is:
    val g = spark.dseGraph("graph name")
    g.updateVertices(spark.read.json("path to exported vertices JSON"))
    g.updateEdges(spark.read.json("path to exported edges JSON"))
    For example, to import vertices and edges exported to /tmp/v.json and /tmp/e.json into the gods_import graph:
    val g = spark.dseGraph("gods_import")
    g.updateVertices(spark.read.json("/tmp/v.json"))
    g.updateEdges(spark.read.json("/tmp/e.json"))
  3. If you have a custom graph:
    1. Examine the schema of the graph and note how it maps to the schema that DSE Graph expects.
      This example uses the friends graph from the GraphFrames project.
      scala> import org.graphframes._
      scala> val g: GraphFrame = examples.Graphs.friends
      scala> g.vertices.printSchema
      root
       |-- id: string (nullable = true)
       |-- name: string (nullable = true)
       |-- age: integer (nullable = false)
      
      scala> g.edges.printSchema
      root
       |-- src: string (nullable = true)
       |-- dst: string (nullable = true)
       |-- relationship: string (nullable = true)
    2. In the Gremlin console or DSE Studio, create the schema.
      system.graph('friends').create()
      :remote config alias g friends.g
      schema.propertyKey("age").Int().create()
      schema.propertyKey("name").Text().create()
      schema.propertyKey("id").Text().single().create()
      schema.vertexLabel('people').partitionKey("id").properties("name", "age").create()
      schema.edgeLabel("friend").create()
      schema.edgeLabel("follow").create()
    3. In the Spark shell, create a DseGraphFrame for the new, empty graph and check the target schemas.
      scala> val d = spark.dseGraph("friends")
      scala> d.V.printSchema
      root
       |-- id: string (nullable = false)
       |-- ~label: string (nullable = false)
       |-- _id: string (nullable = true)
       |-- name: string (nullable = true)
       |-- age: integer (nullable = true)
      
      scala> d.E.printSchema
      root
       |-- src: string (nullable = false)
       |-- dst: string (nullable = false)
       |-- ~label: string (nullable = true)
       |-- id: string (nullable = true)
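    4. Convert the vertices and edges to the target format. Each vertex keeps its original id in _id and gets the constant ~label people; each edge gets src and dst IDs built with d.idColumn and takes its ~label from the relationship column.
      scala> import org.apache.spark.sql.functions.lit
      scala> val v = g.vertices.select($"id" as "_id", lit("people") as "~label", $"name", $"age")
      scala> val e = g.edges.select(d.idColumn(lit("people"), $"src") as "src", d.idColumn(lit("people"), $"dst") as "dst", $"relationship" as "~label")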
    5. Append the converted vertices and edges to the target graph.
      scala> d.updateVertices(v)
      scala> d.updateEdges(e)
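The vertex conversion in step 4 uses only standard Spark column operations, so it can be tried locally without a DSE cluster before anything is written. The following is a minimal sketch, assuming a Spark 2.x or later dependency on the classpath. The object name, its helper methods, and the two sample rows are hypothetical. The target column set is copied from the d.V.printSchema output in step 3. The sketch applies the same select as step 4 and reports any converted column that the target schema does not contain. The edge conversion is not reproduced here because d.idColumn exists only on a DseGraphFrame.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical sketch: reproduces the step 4 vertex conversion with plain Spark
// so the resulting columns can be compared with the d.V schema before writing.
object VertexConversionSketch {
  // Vertex columns reported by d.V.printSchema for the "friends" graph (step 3).
  val targetVertexColumns: Set[String] = Set("id", "~label", "_id", "name", "age")

  // Same mapping as step 4: id -> _id, constant "people" label, properties unchanged.
  def toDseVertices(vertices: DataFrame): DataFrame =
    vertices.select(
      col("id").as("_id"),
      lit("people").as("~label"),
      col("name"),
      col("age"))

  // Columns of the converted frame that the target schema does not contain.
  def unknownColumns(converted: DataFrame): Set[String] =
    converted.columns.toSet -- targetVertexColumns

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("vertex-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows shaped like the GraphFrames friends vertices.
    val sample = Seq(("a", "Alice", 34), ("b", "Bob", 36)).toDF("id", "name", "age")
    val v = toDseVertices(sample)

    v.printSchema()
    println(s"Columns not in the target schema: ${unknownColumns(v)}")
    spark.stop()
  }
}
```

An empty set from unknownColumns suggests that the converted frame lines up with the target schema. The DSE-generated id column does not need to be supplied because _id and ~label identify each vertex under custom ID mapping.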
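Step 2 of the custom-graph procedure defines only the friend and follow edge labels, while step 4 copies the relationship column into ~label unchanged. Assuming DSE Graph does not accept edges whose label is missing from the schema, a quick pre-check can list any relationship values that still need an edge label. This is a hedged sketch in plain Scala. The object name and the helper are hypothetical, and the label set is copied from the schema step above.

```scala
// Hypothetical sketch: finds relationship values that have no matching
// edge label in the schema created in the Gremlin console or DSE Studio.
object EdgeLabelCheck {
  // Edge labels created with schema.edgeLabel(...) in the schema step.
  val schemaEdgeLabels: Set[String] = Set("friend", "follow")

  // Distinct relationship values that are not defined as edge labels.
  def undefinedLabels(relationships: Seq[String]): Set[String] =
    relationships.toSet -- schemaEdgeLabels

  def main(args: Array[String]): Unit = {
    // Hypothetical values; "likes" is deliberately absent from the schema.
    val observed = Seq("friend", "follow", "follow", "likes")
    val missing = undefinedLabels(observed)
    if (missing.isEmpty) println("All relationship values have edge labels.")
    else println(s"Create these edge labels before importing: ${missing.mkString(", ")}")
  }
}
```

In the Spark shell, the observed values could come from g.edges.select("relationship").distinct().as[String].collect().toSeq and be passed to undefinedLabels before calling d.updateEdges.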