Loading TEXT data
The data mapping script for delimited text data is explained step by step. The full script is shown at the end of the steps.
-
If desired, add configuration to the mapping script.
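For instance, the full script on this page uses the configuration line below. The comment in that script describes it as configuring the loader to create the schema. What each individual setting does is not explained on this page, so check the DSE Graph Loader reference before changing the values.

```groovy
// CONFIGURATION
// Configures the data loader to create the schema
config create_schema: true, load_new: true, load_vertex_threads: 3
```

This line is a fragment of a mapping script and only has meaning when the script is run by graphloader.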
-
A sample of the data for load looks like the following:
SAMPLE INPUT
// For the author.dat file:
Julia Child|F
// For the book.dat file:
Simca's Cuisine: 100 Classic French Recipes for Every Occasion|1972|0-394-40152-2
// For the authorBook.dat file:
Simca's Cuisine: 100 Classic French Recipes for Every Occasion|Simone Beck
-
Specify the data input files. The variable inputfiledir specifies the directory name for the input files. Each of the identified files will be used for loading.
// DATA INPUT
// Define the data input source (a file which can be specified via command line arguments)
// inputfiledir is the directory for the input files
inputfiledir = '/tmp/TEXT/'
authorInput = File.text(inputfiledir + "author.dat").
    delimiter("|").
    header('name', 'gender')
bookInput = File.text(inputfiledir + "book.dat").
    delimiter("|").
    header('name', 'year', 'ISBN')
authorBookInput = File.text(inputfiledir + "authorBook.dat").
    delimiter("|").
    header('bname', 'aname')
Because the property key name is used for both vertex labels author and book, in the authorBook file, variables aname and bname are used for author name and book name, respectively. These variables are used in the mapping logic used to create the edges between author and book vertices.
-
In each line, the file is specified as a text file, the file name is specified, a delimiter is set, and a header can be specified to identify the fields that will be read. The header can alternatively be specified on the first line of the data file. A map, authorInput, is created that will be used to process the data. The map can be manipulated before loading using transforms.
authorInput = File.text(inputfiledir + "author.dat").delimiter("|").header('name', 'gender')
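To make the idea of a delimited line becoming a map more concrete, here is a minimal sketch in plain Groovy that runs without graphloader. It is an illustration only and not how DSE Graph Loader is implemented: parseRecords and lowerGender are hypothetical names, and the transform-style closure simply shows the kind of change that could be made to a record before loading.

```groovy
import java.util.regex.Pattern

// Hypothetical helper, not part of DSE Graph Loader: turns delimited lines
// into one map per record. If no header is passed, the first line of the
// data is treated as the header instead.
def parseRecords(List<String> lines, String delimiter, List<String> header = null) {
    // Pattern.quote stops "|" from being read as regex alternation
    def splitter = Pattern.quote(delimiter)
    def names = header ?: (lines.first().split(splitter, -1) as List)
    def dataLines = header ? lines : lines.drop(1)
    dataLines.collect { line ->
        def values = line.split(splitter, -1) as List
        // Pair each field name with the value at the same position
        [names, values].transpose().collectEntries { pair -> [(pair[0]): pair[1]] }
    }
}

// Header declared in the script, as with header('name', 'gender')
def declared = parseRecords(['Julia Child|F'], '|', ['name', 'gender'])
assert declared == [[name: 'Julia Child', gender: 'F']]

// Header taken from the first line of the data file
def inFile = parseRecords(['name|gender', 'Julia Child|F'], '|')
assert inFile == declared

// A transform-style closure that changes a record before it is loaded
def lowerGender = { record -> record.gender = record.gender.toLowerCase(); record }
assert lowerGender(declared[0]).gender == 'f'
```

Both ways of supplying the header yield the same record, which is why the field names used later in the mapping logic are the names given in the header.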
If a header() is used in the mapping script and a header line is used in the data file, then both must match. Either a header line in the data file or a header() is required.
-
Create the main body of the mapping script. This part of the mapping script is the same regardless of the file format.
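In the full script on this page, the main body consists of the load statements below. They are a fragment of a mapping script, so they run only under graphloader and rely on the authorInput, bookInput, and authorBookInput variables defined in the data input step.

```groovy
// Specifies what data source to load using which mapper (as defined inline)
load(authorInput).asVertices {
    label "author"
    key "name"
}

load(bookInput).asVertices {
    label "book"
    key "name"
}

load(authorBookInput).asEdges {
    label "authored"
    outV "aname", {
        label "author"
        key "name"
    }
    inV "bname", {
        label "book"
        key "name"
    }
}
```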
-
To run DSE Graph Loader for text loading as a dry run, use the following command:
$ graphloader authorBookMappingTEXT.groovy -graph testTEXT -address localhost -dryrun true
For testing purposes, the graph specified does not have to exist prior to running graphloader. However, for production applications, the graph and schema should be created prior to using graphloader.
-
The full loading script is shown.
/** SAMPLE INPUT
author: Julia Child|F
book : Simca's Cuisine: 100 Classic French Recipes for Every Occasion|1972|0-394-40152-2
authorBook: Simca's Cuisine: 100 Classic French Recipes for Every Occasion|Simone Beck
 */

// CONFIGURATION
// Configures the data loader to create the schema
config create_schema: true, load_new: true, load_vertex_threads: 3

// DATA INPUT
// Define the data input source (a file which can be specified via command line arguments)
// inputfiledir is the directory for the input files that is given in the commandline
// as the "-filename" option
inputfiledir = '/tmp/TEXT/'
authorInput = File.text(inputfiledir + "author.dat").
    delimiter("|").
    header('name', 'gender')
bookInput = File.text(inputfiledir + "book.dat").
    delimiter("|").
    header('name', 'year', 'ISBN')
authorBookInput = File.text(inputfiledir + "authorBook.dat").
    delimiter("|").
    header('bname', 'aname')

//Specifies what data source to load using which mapper (as defined inline)
load(authorInput).asVertices {
    label "author"
    key "name"
}

load(bookInput).asVertices {
    label "book"
    key "name"
}

load(authorBookInput).asEdges {
    label "authored"
    outV "aname", {
        label "author"
        key "name"
    }
    inV "bname", {
        label "book"
        key "name"
    }
}
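One way to read the asEdges mapping in this script: for each authorBook record, aname is matched against the name key of an author vertex and bname against the name key of a book vertex. The plain Groovy sketch below imitates that lookup with in-memory maps. It illustrates the intent of the mapping only, not how DSE Graph Loader works internally, and the names graph and resolveEdge are hypothetical.

```groovy
def title = "Simca's Cuisine: 100 Classic French Recipes for Every Occasion"

// Stand-in for the graph: vertices indexed by label, then by the `name` key
def graph = [
    author: ['Simone Beck': [label: 'author', name: 'Simone Beck']],
    book  : [(title): [label: 'book', name: title]]
]

// Hypothetical helper: aname selects the outgoing author vertex,
// bname selects the incoming book vertex
def resolveEdge(Map graph, Map record) {
    [label: 'authored',
     outV : graph.author[record.aname],
     inV  : graph.book[record.bname]]
}

// One record from authorBook.dat, with fields named by header('bname', 'aname')
def edge = resolveEdge(graph, [bname: title, aname: 'Simone Beck'])
assert edge.outV.name == 'Simone Beck'
assert edge.inV.label == 'book'
```

Because both vertex labels use name as their key, the separate field names aname and bname are what keep the two lookups apart.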
-
Mapping several files with the same format from a directory
-
A sample of the data for load looks like the following:
SAMPLE INPUT
// For the author.text file:
name|gender
Julia Child|F
Simone Beck|F
// For the knows.text file:
aname|bname
Julia Child|James Beard
A number of files with the same format exist in a directory. If the files differ, the graphloader will issue an error and stop:
java.lang.IllegalArgumentException: /tmp/dirSource/data has more than 1 input type.
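This page does not say how graphloader decides that the files in a directory differ. As a simple check before running the loader, the plain Groovy sketch below compares the number of delimited fields on the first line of each file. The fieldCountsByFile helper is hypothetical, and a temporary directory stands in for /tmp/dirSource/data.

```groovy
import java.nio.file.Files
import java.util.regex.Pattern

// Hypothetical pre-check, not a graphloader feature: maps each file name in
// a directory to the number of delimited fields on its first line.
def fieldCountsByFile(File dir, String delimiter) {
    dir.listFiles().findAll { it.isFile() }.sort { it.name }.collectEntries { f ->
        def firstLine = f.withReader { it.readLine() } ?: ''
        [(f.name): firstLine.split(Pattern.quote(delimiter), -1).length]
    }
}

// Throwaway files standing in for the real input directory
def dir = Files.createTempDirectory('dirSource').toFile()
new File(dir, 'author1.text').text = 'name|gender\nJulia Child|F\n'
new File(dir, 'author2.text').text = 'name|gender\nSimone Beck|F\n'

def counts = fieldCountsByFile(dir, '|')
// All files share one field count, so they plausibly share one format
assert counts.values().toSet().size() == 1

dir.deleteDir()
```

If the set of counts has more than one entry, the files are worth inspecting before they are handed to the loader.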
-
Specify the data input directory. The variable inputfiledir specifies the directory for the input files. Each of the identified files will be used for loading.
// DATA INPUT
// Define the data input source (a file which can be specified via command line arguments)
// inputfiledir is the directory for the input files
inputfiledir = '/tmp/dirSource/data'
personInput = File.directory(inputfiledir).delimiter('|').header('name','gender')

//Specifies what data source to load using which mapper (as defined inline)
load(personInput).asVertices {
    label "author"
    key "name"
}
The important element is File.directory(); this defines the directory where the files are stored.
-
Note that two directories could be used to load vertices and edges:
// DATA INPUT
// Define the data input source (a file which can be specified via command line arguments)
// inputfiledir is the directory for the input files
inputfiledir = '/tmp/dirSource/data'
vertexfiledir = inputfiledir+'/vertices'
edgefiledir = inputfiledir+'/edges'
personInput = File.directory(vertexfiledir).delimiter('|').header('name','gender')
personEdgeInput = File.directory(edgefiledir).delimiter('|').header('aname','bname')

//Specifies what data source to load using which mapper (as defined inline)
load(personInput).asVertices {
    label "author"
    key "name"
}

load(personEdgeInput).asEdges {
    label "knows"
    outV "aname", {
        label "author"
        key "name"
    }
    // Both ends of a knows edge are author vertices in the sample data
    inV "bname", {
        label "author"
        key "name"
    }
}
-
To run DSE Graph Loader for text file loading from a directory, use the following command:
$ graphloader dirSourceMapping.groovy -graph testdirSource -address localhost
For testing purposes, the graph specified does not have to exist prior to running graphloader. However, for production applications, the graph and schema should be created prior to using graphloader.