Importing graphs

Import a graph into DataStax Enterprise (DSE).

About this task

Use TinkerPop’s IoStep to import a graph in any format supported by Spark.

The following formats are auto-detected:

  • JSON: .json

  • Parquet: .parquet

  • Comma-separated values (CSV): .csv

  • ORC: .orc

The format is detected from the file extension in the URL of the resource passed to the io method.
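The extension-based detection described above can be pictured with a minimal sketch. The object FormatSketch and its method detectFormat are hypothetical names used only for illustration; they are not part of DSE or TinkerPop, and the sketch assumes a case-sensitive match on the four extensions listed above.

```scala
// Hypothetical helper, not a DSE or TinkerPop API: it only illustrates how a
// format could be derived from the extension of the resource URL.
object FormatSketch {
  // The four extensions listed above as auto-detected.
  private val autoDetected = Set("json", "parquet", "csv", "orc")

  // Returns Some(format) if the URL ends in an auto-detected extension,
  // otherwise None (assumption: the match is case-sensitive).
  def detectFormat(url: String): Option[String] = {
    val fileName = url.substring(url.lastIndexOf('/') + 1)
    val dot = fileName.lastIndexOf('.')
    if (dot < 0) None
    else Some(fileName.substring(dot + 1)).filter(autoDetected.contains)
  }
}
```

In this sketch, a result of None corresponds to the case where the format has to be set explicitly.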

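To set the format explicitly, or to use any other format supported by Spark, call the with("format", "format extension") method. Pass any additional format options with further calls to the with method. Because with is a reserved word in Scala, enclose it in backticks when you call it in the Spark shell.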

Procedure

  1. Start the Spark shell.

    dse spark

  2. Import the graph in the Spark shell.

    If you exported the graph as described in Exporting graphs to DSEFS, pass the DSEFS URL of the exported data to the io method.

    Import the edges and vertices of a graph in JSON format.

    val g = spark.dseGraph("gods_import")
    // Scala string literals use double quotes; iterate() executes the read.
    g.io("dsefs:///tmp/data.json").read().iterate()

    Import the vertices and edges separately.

    val g = spark.dseGraph("gods_import")
    // with is a reserved word in Scala, so it is written in backticks.
    g.io("dsefs:///tmp/data.json").`with`("vertices").read().iterate()
    g.io("dsefs:///tmp/data.json").`with`("edges").read().iterate()

    Import a graph from a CSV file that has a header line and is located at an external URL, setting the format explicitly. Set the labels of the vertices and edges by specifying the column name in the CSV file.

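    val g = spark.dseGraph("gods_import")
    // Placeholder: replace the string with the URL of the CSV file.
    val url = "URL of CSV file"
    // Edges: set the edge label and the labels of the out and in vertices.
    g.io(url).`with`("format", "csv").`with`("outVertexLabel", "god").`with`("edgeLabel", "self").`with`("inVertexLabel", "god").`with`("header").`with`("nullValue", "null").read().iterate()
    // Vertices: set the vertex label.
    g.io(url).`with`("format", "csv").`with`("vertexLabel", "god").`with`("header").`with`("nullValue", "null").read().iterate()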

© 2024 DataStax