Loading data from a JDBC compatible database.
The data mapping script for loading from a JDBC compatible database is shown with explanation. The full script is found at the bottom of the page.
Using DSE Graph Loader to load directly from a JDBC compatible database is convenient, but very slow for a large database. Test a small dataset first, to see if the time required to move a larger dataset makes this method efficient. |
-
If desired, add configuration to the mapping script.
-
A sample of the data for load looks like the following:
SAMPLE INPUT // For the author data: name:Julia Child gender:F // For the book data: name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion year:1972 ISBN:0-394-40152-2 // For the authorBook data: bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion aname:Simone Beck
Because the property key
name
is used for both vertex labelsauthor
andbook
, in theauthorBook
file, variablesaname
andbname
are used for author name and book name, respectively. These variables are used in the mapping logic used to create the edges betweenauthor
andbook
vertices. -
Some databases will need a driver installed in the same directory as the
graphloader
script. For the following example using MySQL, the driver can be downloaded. Unzip the file and copy themysql-connector-java-5.1.44-bin.jar
file to the correct directory. A similar download would be required for the other databases as well. -
Specify the data input database with JDBC information. The variable
inputDatabase
specifies the data input database. This example uses the MySQLdatabase, but any JDBC-compliant database (H2, MySQL, Postgres, Oracle) can be used. Theconnection
to alocalhost
and a MySQL databasesample
are specified. In addition ,user
andpassword
are defined. TheMySQL()
step denotes the data connection to a MySQL database. The connection can alternatively define a remote machine address.// DATA INPUT // Define the data input source (a database connection and SQL statements for data selection) // inputDatabase is the database name inputDatabase = 'localhost/sample' db = Database.connection('jdbc:mysql://' + inputDatabase).user('root').password('foo').MySQL() // Define multiple data inputs from the database source via SQL queries authorInput = db.query "select * from author"; bookInput = db.query "select * from book"; authorBookInput = db.query "select * from authorbook";
To load data from H2, the connection line could be:
inputDatabase = '~/test' db = Database.connection("jdbc:h2:" + inputDatabase).H2().user("sa")
For Postgres,
Postgre()
is used, and for Oracle,Oracle()
. -
In each line, the database query is specified that will be used to retrieve the data. A map,
authorInput
, is created that will be used to process the data. The map can be manipulated before loading using transforms.authorInput = db.query "SELECT * FROM AUTHOR";
DSE Graph Loader will retrieve all column names from the database with lower-cased names. Create the graph schema with corresponding lower-cased names to avoid read errors.
-
Create the main body of the mapping script. This part of the mapping script is the same regardless of the file format.
-
To run DSE Graph Loader for text loading as a dry run, use the following command:
graphloader authorBookMappingJDBC.groovy -graph testJDBC -address localhost -dryrun true
For testing purposes, the graph specified does not have to exist prior to running
graphloader
. However, for production applications, the graph and schema should be created prior to usinggraphloader
. -
The full loading script is shown.
/* SAMPLE INPUT author: name:Julia Child gender:F book: name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion year:1972 ISBN:0-394-40152-2 authorBook: bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion aname:Simone Beck */ // CONFIGURATION // Configures the data loader to create the schema config create_schema: true, load_new: true, load_vertex_threads: 3 // DATA INPUT // Define the data input source (a database connection and SQL statements for data selection) inputDatabase = 'localhost/sample' db = Database.connection('jdbc:mysql://' + inputDatabase).user('root').password('foo').MySQL() // Define multiple data inputs from the database source via SQL queries authorInput = db.query "select * from author"; bookInput = db.query "select * from book"; authorBookInput = db.query "select * from authorbook"; //Specifies what data source to load using which mapper (as defined inline) load(authorInput).asVertices { label "author" key "name" } load(bookInput).asVertices { label "book" key "name" } load(authorBookInput).asEdges { label "authored" outV "aname", { label "author" key "name" } inV "bname", { label "book" key "name" } }