Loading data from a JDBC compatible database.
How to use the DSE Graph Loader to load data from a JDBC compatible database.
Procedure
- If desired, add configuration to the mapping script.
-
A sample of the data for load looks like the following:
SAMPLE INPUT // For the author data: name:Julia Child gender:F // For the book data: name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion year:1972 ISBN:0-394-40152-2 // For the authorBook data: bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion aname:Simone Beck
Because the property key
name
is used for both vertex labelsauthor
andbook
, in theauthorBook
file, variablesaname
andbname
are used for author name and book name, respectively. These variables are used in the mapping logic used to create the edges betweenauthor
andbook
vertices. -
Some databases will need a driver installed in the same directory as the
graphloader
script. For the following example using MySQL, the driver can be downloaded. Unzip the file and copy the mysql-connector-java-5.1.44-bin.jar file to the correct directory. A similar download would be required for the other databases as well. -
Specify the data input database with JDBC information. The variable
inputDatabase
specifies the data input database. This example uses the MySQLdatabase, but any JDBC-compliant database (H2, MySQL, Postgres, Oracle) can be used. Theconnection
to alocalhost
and a MySQL databasesample
are specified. In addition ,user
andpassword
are defined. TheMySQL()
step denotes the data connection to a MySQL database. The connection can alternatively define a remote machine address.// DATA INPUT // Define the data input source (a database connection and SQL statements for data selection) // inputDatabase is the database name inputDatabase = 'localhost/sample' db = Database.connection('jdbc:mysql://' + inputDatabase).user('root').password('foo').MySQL() // Define multiple data inputs from the database source via SQL queries authorInput = db.query "select * from author"; bookInput = db.query "select * from book"; authorBookInput = db.query "select * from authorbook";
Note: To load data from H2, the connection line could be:
For Postgres,inputDatabase = '~/test' db = Database.connection("jdbc:h2:" + inputDatabase).H2().user("sa")
Postgre()
is used, and for Oracle,Oracle()
. -
In each line, the database query is specified that will be used to retrieve the
data. A map,
authorInput
, is created that will be used to process the data. The map can be manipulated before loading using transforms.authorInput = db.query "SELECT * FROM AUTHOR";
Important: DSE Graph Loader will retrieve all column names from the database with lower-cased names. Create the graph schema with corresponding lower-cased names to avoid read errors. - Create the main body of the mapping script. This part of the mapping script is the same regardless of the file format.
-
To run DSE Graph Loader for text loading as a dry run, use the following
command:
graphloader authorBookMappingJDBC.groovy -graph testJDBC -address localhost -dryrun true
For testing purposes, the graph specified does not have to exist prior to running
graphloader
. However, for production applications, the graph and schema should be created prior to usinggraphloader
. -
The full loading script is shown.
/* SAMPLE INPUT author: name:Julia Child gender:F book: name:Simca's Cuisine: 100 Classic French Recipes for Every Occasion year:1972 ISBN:0-394-40152-2 authorBook: bname:Simca's Cuisine: 100 Classic French Recipes for Every Occasion aname:Simone Beck */ // CONFIGURATION // Configures the data loader to create the schema config create_schema: true, load_new: true, load_vertex_threads: 3 // DATA INPUT // Define the data input source (a database connection and SQL statements for data selection) inputDatabase = 'localhost/sample' db = Database.connection('jdbc:mysql://' + inputDatabase).user('root').password('foo').MySQL() // Define multiple data inputs from the database source via SQL queries authorInput = db.query "select * from author"; bookInput = db.query "select * from book"; authorBookInput = db.query "select * from authorbook"; //Specifies what data source to load using which mapper (as defined inline) load(authorInput).asVertices { label "author" key "name" } load(bookInput).asVertices { label "book" key "name" } load(authorBookInput).asEdges { label "authored" outV "aname", { label "author" key "name" } inV "bname", { label "book" key "name" } }