Loading JSON data
How to use the DSE Graph Loader to load JSON data.
A common file format for loading graph data is JSON. An input JSON file holds all key and value information in a nested structure.
Mapping several different JSON files
Mapping several different JSON files with DSE Graph Loader.
DSE Graph Loader can load several different JSON files that exist in a directory using
the following steps. Sample input data:
SAMPLE INPUT
// For the author.json file:
{"name":"Julia Child","gender":"F"}
// For the book.json file:
{"name":"The Art of French Cooking, Vol. 1","year":"1961","ISBN":"none"}
// For the authorBook.json file:
{"bname":"The Art of French Cooking, Vol. 1","aname":"Julia Child"}
Because the property key name is used for both vertex labels, author and book, the authorBook file uses the variables aname and bname for the author name and book name, respectively. These variables are used in the mapping logic that creates the edges between author and book vertices.
Procedure
Example
/* SAMPLE INPUT
author: {"name":"Julia Child","gender":"F"}
book : {"name":"The Art of French Cooking, Vol. 1","year":"1961","ISBN":"none"}
authorBook: {"bname":"The Art of French Cooking, Vol. 1","aname":"Julia Child"}
*/
// CONFIGURATION
// Configures the data loader to create the schema
config create_schema: true, load_new: true, load_vertex_threads: 3
// DATA INPUT
// Define the data input sources (files that can be specified via command-line arguments)
// inputfiledir is the directory for the input files, given on the command line
// as the "-filename" option
inputfiledir = '/tmp/JSON/'
authorInput = File.json(inputfiledir + 'author.json')
bookInput = File.json(inputfiledir + 'book.json')
authorBookInput = File.json(inputfiledir + 'authorBook.json')
// Specifies what data source to load using which mapper (as defined inline)
load(authorInput).asVertices {
    label "author"
    key "name"
}
load(bookInput).asVertices {
    label "book"
    key "name"
}
load(authorBookInput).asEdges {
    label "authored"
    outV "aname", {
        label "author"
        key "name"
    }
    inV "bname", {
        label "book"
        key "name"
    }
}
Mapping several files with same format from a directory
Mapping several same-format JSON files with DSE Graph Loader.
DSE Graph Loader can load several JSON files with the same format that exist in a
directory using the following steps. Sample input data:
SAMPLE INPUT
// For the author.json file:
{"author_name":"Julia Child","gender":"F"}
// For the book.json file:
{"name":"The Art of French Cooking, Vol. 1","year":"1961","ISBN":"none"}
// For the authorBook.json file:
{"name":"The Art of French Cooking, Vol. 1","author":"Julia Child"}
A number of files with the same format exist in a directory. If the files differ, the
graphloader will issue an error and stop:
java.lang.IllegalArgumentException: /tmp/dirSource/data has more than 1 input type.
Procedure
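As a rough sketch of what such a mapping script could look like, the fragment below reads every file in one directory as a single input and loads the records as author vertices. It assumes the loader's File.directory input source; the directory path is borrowed from the error message above, and the label and key are carried over from the earlier author example rather than prescribed by this section.

```groovy
// CONFIGURATION
config create_schema: true, load_new: true

// DATA INPUT
// Assumed directory holding only same-format JSON files (illustrative path)
inputfiledir = '/tmp/dirSource/data'
// File.directory treats all files in the directory as one input source
authorInput = File.directory(inputfiledir)

// Load every record found in the directory as an "author" vertex keyed on "name"
load(authorInput).asVertices {
    label "author"
    key "name"
}
```

Keeping one format per directory avoids the "more than 1 input type" error, because the loader has only one input type to infer.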
Mapping files from a directory using a file pattern
Mapping several files that match a file pattern with DSE Graph Loader.
DSE Graph Loader can load several files from a directory using file pattern matching.
Sample input files:
$ ls data
badOne.csv person1.csv person2.csv
A number of files with the same format exist in a directory. If the files differ, DSE Graph Loader loads only the files that match the pattern in the mapping script.
Several file patterns are defined for use:
Pattern | Description | Example |
---|---|---|
* | Matches zero or more characters. While matching, it does not cross directory boundaries. | *.csv matches all files whose names end in .csv. |
** | Same as *, but crosses directory boundaries. | Matches CSV files in more than one directory. |
? | Matches exactly one character. | person?.csv matches person1.csv or personA.csv, but not person11.csv. |
\ | Escapes characters that would otherwise be interpreted as special characters, e.g. \\ to match a single \ . | |
[ ] | Matches a set of designated characters, though only a single character is matched. | [efg] matches "e", "f", or "g", so person[efg] matches persone, personf, or persong. [1-9] matches any one digit from 1 to 9; person[1-9].csv matches files person1.csv through person9.csv. |
{ } | Matches a group of sub-patterns. | *.{csv,json} matches all CSV and JSON files in the directory. |
Procedure
Mapping using *
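A minimal sketch of a * mapping for the sample files, assuming the loader's File.directory source accepts a fileMatches pattern. The directory path, the '|' delimiter, the header fields, and the person label are illustrative assumptions, since the contents of the sample files are not shown here.

```groovy
// Assumed directory containing badOne.csv, person1.csv and person2.csv
inputfiledir = '/tmp/filePattern/data'
// "person*.csv" selects person1.csv and person2.csv and skips badOne.csv
personInput = File.directory(inputfiledir)
    .fileMatches("person*.csv")
    .delimiter('|')             // assumed field delimiter
    .header('name', 'gender')   // assumed column names

load(personInput).asVertices {
    label "person"   // hypothetical vertex label
    key "name"
}
```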
Mapping using [ ]
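A bracket pattern can narrow the match to a single digit after the person prefix. The fragment below is a sketch under the same kind of assumptions: File.directory with fileMatches is taken to be available, and the path, delimiter, header fields, and label are illustrative only.

```groovy
// Assumed directory containing the sample CSV files
inputfiledir = '/tmp/filePattern/data'
// "person[1-9].csv" selects person1.csv through person9.csv,
// so person1.csv and person2.csv are loaded and badOne.csv is ignored
personInput = File.directory(inputfiledir)
    .fileMatches("person[1-9].csv")
    .delimiter('|')             // assumed field delimiter
    .header('name', 'gender')   // assumed column names

load(personInput).asVertices {
    label "person"   // hypothetical vertex label
    key "name"
}
```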
Mapping using { } with multiple patterns
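Braces group several sub-patterns into one expression. In this sketch the pattern would pick up all three sample files, which only makes sense if badOne.csv shares the format of the person files; that, along with the path, delimiter, header fields, and label, is an assumption made for illustration.

```groovy
// Assumed directory containing the sample CSV files
inputfiledir = '/tmp/filePattern/data'
// "{person*,badOne}.csv" matches person1.csv, person2.csv and badOne.csv
personInput = File.directory(inputfiledir)
    .fileMatches("{person*,badOne}.csv")
    .delimiter('|')             // assumed field delimiter
    .header('name', 'gender')   // assumed column names

load(personInput).asVertices {
    label "person"   // hypothetical vertex label
    key "name"
}
```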