Loading data from a JDBC compatible database.

The data mapping script for loading from a JDBC compatible database is shown with explanation. The full script is found at the bottom of the page.

Using DSE Graph Loader to load directly from a JDBC compatible database is convenient, but very slow for a large database. Test a small dataset first, to see if the time required to move a larger dataset makes this method efficient.

  • If desired, add configuration to the mapping script.

  • A sample of the data for load looks like the following:

    SAMPLE INPUT
    // For the author data:
    name:Julia Child	gender:F
    // For the book data:
    name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion	year:1972	ISBN:0-394-40152-2
    // For the authorBook data:
    bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion	aname:Simone Beck

    Because the property key name is used for both vertex labels author and book, in the authorBook file, variables aname and bname are used for author name and book name, respectively. These variables are used in the mapping logic used to create the edges between author and book vertices.

  • Some databases will need a driver installed in the same directory as the graphloader script. For the following example using MySQL, the driver can be downloaded. Unzip the file and copy the mysql-connector-java-5.1.44-bin.jar file to the correct directory. A similar download would be required for the other databases as well.

  • Specify the data input database with JDBC information. The variable inputDatabase specifies the data input database. This example uses the MySQLdatabase, but any JDBC-compliant database (H2, MySQL, Postgres, Oracle) can be used. The connection to a localhost and a MySQL database sample are specified. In addition , user and password are defined. The MySQL() step denotes the data connection to a MySQL database. The connection can alternatively define a remote machine address.

    // DATA INPUT
    // Define the data input source (a database connection and SQL statements for data selection)
    // inputDatabase is the database name
    inputDatabase = 'localhost/sample'
    db = Database.connection('jdbc:mysql://' + inputDatabase).user('root').password('foo').MySQL()
    // Define multiple data inputs from the database source via SQL queries
    authorInput = db.query "select * from author";
    bookInput = db.query "select * from book";
    authorBookInput = db.query "select * from authorbook";

    To load data from H2, the connection line could be:

    inputDatabase = '~/test'
    db = Database.connection("jdbc:h2:" + inputDatabase).H2().user("sa")

    For Postgres, Postgre() is used, and for Oracle, Oracle().

  • In each line, the database query is specified that will be used to retrieve the data. A map, authorInput, is created that will be used to process the data. The map can be manipulated before loading using transforms.

    authorInput = db.query "SELECT * FROM AUTHOR";

    DSE Graph Loader will retrieve all column names from the database with lower-cased names. Create the graph schema with corresponding lower-cased names to avoid read errors.

  • Create the main body of the mapping script. This part of the mapping script is the same regardless of the file format.

  • To run DSE Graph Loader for text loading as a dry run, use the following command:

    graphloader authorBookMappingJDBC.groovy -graph testJDBC -address localhost -dryrun true

    For testing purposes, the graph specified does not have to exist prior to running graphloader. However, for production applications, the graph and schema should be created prior to using graphloader.

  • The full loading script is shown.

    /* SAMPLE INPUT
    author:
    name:Julia Child	gender:F
    book:
    name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion	year:1972	ISBN:0-394-40152-2
    authorBook:
    bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion	aname:Simone Beck
     */
    
    // CONFIGURATION
    // Configures the data loader to create the schema
    config create_schema: true, load_new: true, load_vertex_threads: 3
    
    // DATA INPUT
    // Define the data input source (a database connection and SQL statements for data selection)
    inputDatabase = 'localhost/sample'
    db = Database.connection('jdbc:mysql://' + inputDatabase).user('root').password('foo').MySQL()
    
    // Define multiple data inputs from the database source via SQL queries
    authorInput = db.query "select * from author";
    bookInput = db.query "select * from book";
    authorBookInput = db.query "select * from authorbook";
    
    //Specifies what data source to load using which mapper (as defined inline)
    
    load(authorInput).asVertices {
        label "author"
        key "name"
    }
    
    load(bookInput).asVertices {
        label "book"
        key "name"
    }
    
    load(authorBookInput).asEdges {
        label "authored"
        outV "aname", {
            label "author"
            key "name"
        }
        inV "bname", {
            label "book"
            key "name"
        }
    }

Was this helpful?

Give Feedback

How can we improve the documentation?

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.

General Inquiries: +1 (650) 389-6000, info@datastax.com