Importing graphs using DseGraphFrame

Use DseGraphFrame to import a graph to DataStax Enterprise.

Create the graph schema manually in the Gremlin console or DSE Studio before importing the graph. Import works only with custom ID mapping.

  1. Start the Spark shell.

    $ dse spark
  2. If you exported the graph to JSON using DseGraphFrame, import it in the Spark shell.

    // Example: import into a graph named gods_import.
    val g = spark.dseGraph("gods_import")
    g.updateVertices(spark.read.json("/tmp/v.json"))
    g.updateEdges(spark.read.json("/tmp/e.json"))

    // General form:
    val g = spark.dseGraph("graph name")
    g.updateVertices(spark.read.json("path to exported vertices JSON"))
    g.updateEdges(spark.read.json("path to exported edges JSON"))
  3. If you have a custom graph:

    1. Examine the schema of the graph and note how its columns map to the schema that DSE Graph expects.

      This example uses the friends graph from the GraphFrames project.

      scala> import org.graphframes._
      scala> val g: GraphFrame = examples.Graphs.friends
      scala> g.vertices.printSchema
      root
       |-- id: string (nullable = true)
       |-- name: string (nullable = true)
       |-- age: integer (nullable = false)
      
      scala> g.edges.printSchema
      root
       |-- src: string (nullable = true)
       |-- dst: string (nullable = true)
       |-- relationship: string (nullable = true)
    2. In the Gremlin console or DSE Studio, create the schema.

      system.graph("friends").create()
      :remote config alias g friends.g
      schema.propertyKey("age").Int().create()
      schema.propertyKey("name").Text().create()
      schema.propertyKey("id").Text().single().create()
      schema.vertexLabel("people").partitionKey("id").properties("name", "age").create()
      schema.edgeLabel("friend").create()
      schema.edgeLabel("follow").create()
    3. In the Spark shell, create a DseGraphFrame for the still-empty friends graph and check the target schemas.

      scala> val d = spark.dseGraph("friends")
      scala> d.V.printSchema
      root
       |-- id: string (nullable = false)
       |-- ~label: string (nullable = false)
       |-- _id: string (nullable = true)
       |-- name: string (nullable = true)
       |-- age: integer (nullable = true)
      
      scala> d.E.printSchema
      root
       |-- src: string (nullable = false)
       |-- dst: string (nullable = false)
       |-- ~label: string (nullable = true)
       |-- id: string (nullable = true)
    4. Convert the edges and vertices to the target format.

      scala> val v = g.vertices.select($"id" as "_id", lit("people") as "~label", $"name", $"age")
      scala> val e = g.edges.select(d.idColumn(lit("people"), $"src") as "src", d.idColumn(lit("people"), $"dst") as "dst", $"relationship" as "~label")
    5. Append the converted vertices and edges to the target graph.

      d.updateVertices(v)
      d.updateEdges(e)
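
Before appending, it can help to confirm that every column in the converted frames exists in the target schema. The following is a minimal sketch of such a check, not part of the DseGraphFrame API: `unmappedColumns` is a hypothetical helper that compares plain lists of column names, shown here with the names printed by the schemas in this example.

```scala
// Hypothetical helper (not part of DseGraphFrame): returns the columns of a
// converted frame that have no counterpart in the target schema.
def unmappedColumns(converted: Seq[String], target: Seq[String]): Seq[String] =
  converted.filterNot(c => target.contains(c))

// Column names copied from the d.V and d.E schemas printed in this example.
val vertexTarget = Seq("id", "~label", "_id", "name", "age")
val edgeTarget = Seq("src", "dst", "~label", "id")

// The converted vertices and edges use only columns the target knows about.
val vertexLeftovers = unmappedColumns(Seq("_id", "~label", "name", "age"), vertexTarget)
val edgeLeftovers = unmappedColumns(Seq("src", "dst", "~label"), edgeTarget)

// The unconverted GraphFrame edges still carry the relationship column.
val rawEdgeLeftovers = unmappedColumns(Seq("src", "dst", "relationship"), edgeTarget)
```

In the Spark shell, the lists could come from the frames themselves, for example `unmappedColumns(e.columns, d.E.columns)`, assuming `d.E` converts to a DataFrame in the same way the `d.E.printSchema` call suggests. A name match does not prove that the values are mapped correctly: the original `id` column also exists in the target schema, so this check complements the manual mapping rather than replacing it.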
