Loading data from a JDBC compatible database.
How to use the DSE Graph Loader to load data from a JDBC compatible database.
Procedure
- If desired, add configuration to the mapping script.
-
A sample of the data for load looks like the following:
SAMPLE INPUT // For the author data: name:Julia Child gender:F // For the book data: name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion year:1972 ISBN:0-394-40152-2 // For the authorBook data: bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion aname:Simone BeckBecause the property key
nameis used for both vertex labelsauthorandbook, in theauthorBookfile, variablesanameandbnameare used for author name and book name, respectively. These variables are used in the mapping logic used to create the edges betweenauthorandbookvertices. -
Some databases will need a driver installed in the same directory as the
graphloaderscript. For the following example using MySQL, the driver can be downloaded. Unzip the file and copy the mysql-connector-java-5.1.44-bin.jar file to the correct directory. A similar download would be required for the other databases as well. -
Specify the data input database with JDBC information. The variable
inputDatabasespecifies the data input database. This example uses the MySQLdatabase, but any JDBC-compliant database (H2, MySQL, Postgres, Oracle) can be used. Theconnectionto alocalhostand a MySQL databasesampleare specified. In addition ,userandpasswordare defined. TheMySQL()step denotes the data connection to a MySQL database. The connection can alternatively define a remote machine address.// DATA INPUT // Define the data input source (a database connection and SQL statements for data selection) // inputDatabase is the database name inputDatabase = 'localhost/sample' db = Database.connection('jdbc:mysql://' + inputDatabase).user('root').password('foo').MySQL() // Define multiple data inputs from the database source via SQL queries authorInput = db.query "select * from author"; bookInput = db.query "select * from book"; authorBookInput = db.query "select * from authorbook";Note: To load data from H2, the connection line could be:
For Postgres,inputDatabase = '~/test' db = Database.connection("jdbc:h2:" + inputDatabase).H2().user("sa")Postgre()is used, and for Oracle,Oracle(). -
In each line, the database query is specified that will be used to retrieve the
data. A map,
authorInput, is created that will be used to process the data. The map can be manipulated before loading using transforms.authorInput = db.query "SELECT * FROM AUTHOR";Important: DSE Graph Loader will retrieve all column names from the database with lower-cased names. Create the graph schema with corresponding lower-cased names to avoid read errors. - Create the main body of the mapping script. This part of the mapping script is the same regardless of the file format.
-
To run DSE Graph Loader for text loading as a dry run, use the following
command:
graphloader authorBookMappingJDBC.groovy -graph testJDBC -address localhost -dryrun true
For testing purposes, the graph specified does not have to exist prior to running
graphloader. However, for production applications, the graph and schema should be created prior to usinggraphloader. -
The full loading script is shown.
/* SAMPLE INPUT author: name:Julia Child gender:F book: name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion year:1972 ISBN:0-394-40152-2 authorBook: bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion aname:Simone Beck */ // CONFIGURATION // Configures the data loader to create the schema config create_schema: true, load_new: true, load_vertex_threads: 3 // DATA INPUT // Define the data input source (a database connection and SQL statements for data selection) inputDatabase = 'localhost/sample' db = Database.connection('jdbc:mysql://' + inputDatabase).user('root').password('foo').MySQL() // Define multiple data inputs from the database source via SQL queries authorInput = db.query "select * from author"; bookInput = db.query "select * from book"; authorBookInput = db.query "select * from authorbook"; //Specifies what data source to load using which mapper (as defined inline) load(authorInput).asVertices { label "author" key "name" } load(bookInput).asVertices { label "book" key "name" } load(authorBookInput).asEdges { label "authored" outV "aname", { label "author" key "name" } inV "bname", { label "book" key "name" } }
