Loading GraphML data

The data mapping script for GraphML data is explained step by step below. The full script appears at the bottom of the page.

DSE Graph Loader can load GraphML files generated with TinkerGraph, the in-memory graph database included with Apache TinkerPop. GraphML files generated with DSE Graph cannot be loaded using DSE Graph Loader.

  • If desired, add configuration to the mapping script.
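
    As an illustration, the configuration line used in the full script at the bottom of the page can be placed at the top of the mapping script. This is only a sketch; whether these settings suit a particular load is an assumption to verify against your environment.

    ```groovy
    // CONFIGURATION (copied from the full script on this page)
    // create_schema: true asks the data loader to create the schema.
    // load_new: true is set alongside it in the full script; confirm that it
    // is appropriate for your data before relying on it.
    config create_schema: true, load_new: true
    ```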

  • Specify the data input file. The variable inputfiledir holds the directory that contains the input file, and the file named in Graph.file (here, recipe.xml) is the one used for loading.

    // DATA INPUT
    // Define the data input source
    // inputfiledir is the directory for the input files
    
    inputfiledir = '/tmp/GraphML/'
    recipeInput = Graph.file(inputfiledir + 'recipe.xml').graphml()

  • The file is specified as an XML file, and the additional step graphml() identifies that the file should be processed as a GraphML file. A map, recipeInput, is created that will be used to process the data.

    recipeInput = Graph.file(inputfiledir + 'recipe.xml').graphml()

    Note that Graph.file is used, in contrast to File.csv or File.json.

  • Create the main body of the mapping script. The structure of this part is the same for every file format, but GraphML uses a slightly modified version: vertices and edges are both read from the single input, through recipeInput.vertices() and recipeInput.edges().
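
    The main body can be sketched as follows, using the same statements as the full script at the bottom of the page. It assumes that recipeInput was defined in the data input step; the comments are an interpretation of each line, not additional loader behavior.

    ```groovy
    // Load every vertex found in the GraphML input.
    load(recipeInput.vertices()).asVertices {
        labelField "~label"      // field that holds the vertex label
        key "~id", "id"          // maps the "~id" field to the "id" key
    }

    // Load every edge found in the GraphML input.
    load(recipeInput.edges()).asEdges {
        labelField "~label"      // field that holds the edge label
        outV "outV", {           // vertex the edge goes out from
            labelField "~label"
            key "~id", "id"
        }
        inV "inV", {             // vertex the edge comes in to
            labelField "~label"
            key "~id", "id"
        }
    }
    ```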

  • To run DSE Graph Loader for GraphML loading as a dry run, use the following command:

    $ graphloader recipeMappingGraphML.groovy -graph testGraphML -address localhost -dryrun true

    For testing purposes, the graph specified does not have to exist prior to running graphloader. However, for production applications, the graph and schema should be created prior to using graphloader.

  • The full loading script is shown:

    /* SAMPLE INPUT
    A GraphML file is an XML file
     */
    
    // CONFIGURATION
    // Configures the data loader to create the schema
    config create_schema: true, load_new: true
    
    // DATA INPUT
    // Define the data input source
    // inputfiledir is the directory for the input files
    
    inputfiledir = '/tmp/GraphML/'
    recipeInput = Graph.file(inputfiledir + 'recipe.xml').graphml()
    
    // Specify which data source to load and which mapper to use (defined inline)
    
    load(recipeInput.vertices()).asVertices {
        labelField "~label"
        key "~id", "id"
    }
    
    load(recipeInput.edges()).asEdges {
        labelField "~label"
        outV "outV", {
            labelField "~label"
            key "~id", "id"
        }
        inV "inV", {
            labelField "~label"
            key "~id", "id"
        }
    }
