Mapping script
Explains the main body of the mapping script.
Regardless of the file format selected, the main body of the mapping script is the same. After setting configuration and adding a data input source, the mapping commands are specified.
Procedure
Ignoring a field in the input
Using compressed files to load edge properties
Using labelField to parse input into different vertex labels
Inserting meta-properties
Inserting multiple meta-properties
Inserting data with a composite primary key
Loading Gryo binary data generated with TinkerGraph
Loading Gryo data generated from DSE Graph
Loading GraphSON binary data
Loading GraphML binary data
Loading multi-cardinality edges
Ignoring a field in input file
Mapping data while ignoring a field with DSE Graph Loader.
If the input file includes a field that should be ignored for a particular vertex load, use ignore.
Procedure
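A minimal sketch of such a mapping, assuming a hypothetical pipe-delimited file authors.csv with the header name|gender|restaurant, where restaurant should not become a property of the author vertex. The file name, directory, and field names are illustrative assumptions. The ignore usage mirrors the multi-cardinality edge example on this page.

```groovy
// DATA INPUT
// Hypothetical input file (an assumption, not part of the original page):
// name|gender|restaurant
inputfiledir = '/tmp/'
authorInput = File.csv(inputfiledir + "authors.csv").delimiter('|')

// Load author vertices keyed on name
load(authorInput).asVertices {
    label "author"
    key "name"
    // restaurant is present in the file but is skipped for this vertex load
    ignore "restaurant"
}
```

Each ignore line names one input field. Fields that are not ignored are loaded as properties of the vertex.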
Using labelField to parse input into different vertex labels
Mapping data using labelField to parse input into different vertex labels with DSE Graph Loader.
Often, an input file includes a field that identifies the vertex label. To load the file and create different vertex labels on the fly, use labelField to identify that field.
Procedure
Use labelField in the mapping script to name the field that holds the vertex label:
// personInput includes type of person, name, gender
// type can be either author or reviewer
// sample data in file people.dat:
// type::name::gender
// author::Julia Child::F
// reviewer::Jane Doe::F
personInput = File.text('people.dat').
    delimiter("::").
    header('type','name','gender')
load(personInput).asVertices{
    labelField "type"
    key "name"
}
g.V().hasLabel('author').valueMap()
{gender=[F], name=[Julia Child]}
g.V().hasLabel('reviewer').valueMap()
{gender=[F], name=[Jane Doe]}
Using compressed files to load data
Mapping compressed data with DSE Graph Loader.
Compressed files can be loaded using DSE Graph Loader to load both vertices and edges. This example loads vertices and edges, as well as edge properties, using gzipped files.
Procedure
/* SAMPLE INPUT
reviewer: John Doe
recipe: Beef Bourguignon
reviewerRating:
rev_name|recipe_name|timestamp|stars|comment
John Doe|Beef Bourguignon|2014-01-01|5|comment
*/
// CONFIGURATION
// Configures the data loader to use an existing schema rather than create one
config create_schema: false, load_new: false
// DATA INPUT
// Define the data input source (a file which can be specified via command line arguments)
// inputfiledir is the directory for the input files that is given on the command line
// as the "-filename" option
inputfiledir = '/tmp/CSV/'
// This next file is not required if the reviewers already exist
reviewerInput = File.csv(inputfiledir + "reviewers.csv.gz").
    gzip().
    delimiter('|')
// This next file is not required if the recipes already exist
recipeInput = File.csv(inputfiledir + "recipes.csv.gz").
    gzip().
    delimiter('|')
// This is the file that is used to create the edges with edge properties
reviewerRatingInput = File.csv(inputfiledir + "reviewerRatings.csv.gz").
    gzip().
    delimiter('|')
//Specifies what data source to load using which mapper (as defined inline)
load(reviewerInput).asVertices {
    label "reviewer"
    key "name"
}
load(recipeInput).asVertices {
    label "recipe"
    key "name"
}
load(reviewerRatingInput).asEdges {
    label "rated"
    outV "rev_name", {
        label "reviewer"
        key "name"
    }
    inV "recipe_name", {
        label "recipe"
        key "name"
    }
    // properties are automatically added from the file, using the header line as property keys
    // from previously created schema
}
The compressed files are designated as .gz files, followed by a gzip() step for processing. Edge properties are loaded from one of the input files based on the header identifying the property keys to use for the values listed in each line of the CSV file. The edge properties populate a rated edge between a reviewer vertex and a recipe vertex with the properties timestamp, stars, and comment.
Mapping data with a composite custom id
Mapping data with a composite custom id with DSE Graph Loader.
Data with a composite primary key requires additional definition when specifying the key for loading. If the custom id is defined with multiple keys (partition keys, clustering keys, or both), the key specification must identify each of them.
Procedure
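A sketch of what the key specification might look like, assuming a hypothetical vertex label FridgeSensor whose custom id is made up of a partition key city_id and a clustering key sensor_id, and a hypothetical input file fridgeSensors.csv with those two fields in its header. All of these names are assumptions for illustration. The map form of key follows the pattern used for the outV and inV keys in the DSE Graph Gryo example on this page, and should be verified against the loader version in use.

```groovy
// Assumed schema, created before loading (hypothetical names):
// schema.vertexLabel('FridgeSensor').partitionKey('city_id').clusteringKey('sensor_id').create()

// DATA INPUT (hypothetical file and directory)
inputfiledir = '/tmp/'
fridgeSensorInput = File.csv(inputfiledir + "fridgeSensors.csv").delimiter('|')

load(fridgeSensorInput).asVertices {
    label "FridgeSensor"
    // Name every field that makes up the composite custom id,
    // as a map of id key to input field
    key city_id: "city_id", sensor_id: "sensor_id"
}
```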
Mapping multi-cardinality edges
Mapping multi-cardinality edges data with DSE Graph Loader.
Multiple cardinality edges are a common type of data that is inserted into graphs. Often, the input file has both vertex and edge information for loading.
Procedure
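Because the same input file supplies both vertices and edges, use ignore to skip the fields that do not belong to each vertex while loading vertices: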
/* SAMPLE INPUT
authorCity:
author|city|dateStart|dateEnd
Julia Child|Paris|1961-01-01|1967-02-10
*/
// CONFIGURATION
// Configures the data loader to use an existing schema rather than create one
config dryrun: false, preparation: true, create_schema: false, load_new: true, schema_output: 'loader_output.txt'
// DATA INPUT
// Define the data input source (a file which can be specified via command line arguments)
// inputfiledir is the directory for the input files
inputfiledir = '/tmp/multiCard/'
authorCityInput = File.csv(inputfiledir + "authorCity.csv").delimiter('|')
//Specifies what data source to load using which mapper (as defined inline)
// Ignore city, dateStart, and dateEnd when creating author vertices
load(authorCityInput).asVertices {
    label "author"
    key "author"
    ignore "city"
    ignore "dateStart"
    ignore "dateEnd"
}
// Ignore author, dateStart, and dateEnd when creating city vertices
load(authorCityInput).asVertices {
    label "city"
    key "city"
    ignore "author"
    ignore "dateStart"
    ignore "dateEnd"
}
// create edges from author -> city and include the edge properties dateStart and dateEnd
load(authorCityInput).asEdges {
    label "livedIn"
    outV "author", {
        label "author"
        key "author"
    }
    inV "city", {
        label "city"
        key "city"
    }
}
Mapping meta-properties
Mapping meta-property data with DSE Graph Loader.
If the input file includes meta-properties, or properties that have properties, use vertexProperty.
Create the schema for this data load before running graphloader:
// PROPERTY KEYS
schema.propertyKey('name').Text().single().create()
schema.propertyKey('gender').Text().single().create()
schema.propertyKey('badge').Text().single().create()
schema.propertyKey('since').Int().single().create()
// Create the meta-property since on the property badge
schema.propertyKey('badge').properties('since').add()
// VERTEX LABELS
schema.vertexLabel('reviewer').properties('name','gender','badge').create()
// INDEXES
schema.vertexLabel('reviewer').index('byname').materialized().by('name').add()
Procedure
Use vertexProperty to identify badge as a vertex property. Note the structure of the nested fields for badge in the JSON file.
/* SAMPLE INPUT
reviewer: { "name":"Jon Doe", "gender":"M", "badge" : { "value": "Gold Badge","since" : 2012 } }
*/
// CONFIGURATION
// Configures the data loader to create the schema
config dryrun: false, preparation: true, create_schema: true, load_new: true, load_vertex_threads: 3, schema_output: 'loader_output.txt'
// DATA INPUT
// Define the data input source (a file which can be specified via command line arguments)
// inputfiledir is the directory for the input files
inputfiledir = '/tmp/'
reviewerInput = File.json(inputfiledir + "reviewer.json")
//Specifies what data source to load using which mapper (as defined inline)
load(reviewerInput).asVertices{
    label "reviewer"
    key "name"
    vertexProperty "badge", {
        value "value"
    }
}
The result is a reviewer vertex where the property badge has a meta-property since.
g.V().valueMap()
{badge=[Gold Badge], gender=[M], name=[Jon Doe]}
g.V().properties('badge').valueMap()
{since=2012}
Mapping multiple meta-properties
Mapping multiple meta-property data with DSE Graph Loader.
If the input file includes multiple meta-properties, or properties that have multiple properties, use vertexProperty.
Create the schema for this data load before running graphloader:
// PROPERTY KEYS
schema.propertyKey('badge').Text().multiple().create()
schema.propertyKey('gender').Text().single().create()
schema.propertyKey('name').Text().single().create()
schema.propertyKey('since').Int().single().create()
// VERTEX LABELS
schema.vertexLabel('reviewer').properties('name', 'gender', 'badge').create()
schema.propertyKey('badge').properties('since').add()
// INDEXES
schema.vertexLabel('reviewer').index('byname').materialized().by('name').add()
Procedure
Use vertexProperty to identify badge as a vertex property. Note the structure of the nested fields for badge in the JSON file.
/* SAMPLE INPUT
reviewer: { "name":"Jane Doe", "gender":"M",
"badge" : [{ "value": "Gold Badge", "since" : 2012 },
{ "value": "Silver Badge", "since" : 2005 }] }
*/
// CONFIGURATION
// Configures the data loader to create the schema
config dryrun: false, preparation: true, create_schema: true, load_new: true, load_vertex_threads: 3, schema_output: 'loader_output.txt'
// DATA INPUT
// Define the data input source (a file which can be specified via command line arguments)
// inputfiledir is the directory for the input files
inputfiledir = '/tmp/'
reviewerInput = File.json(inputfiledir + "reviewerMultiMeta.json")
//Specifies what data source to load using which mapper (as defined inline)
load(reviewerInput).asVertices{
    label "reviewer"
    key "name"
    vertexProperty "badge", {
        value "value"
    }
}
The result is a reviewer vertex where the property badge has multiple values. Choosing the pop-up link for badge reveals the meta-property values.
Mapping geospatial and Cartesian data
Mapping geospatial and Cartesian data with DSE Graph Loader.
Geospatial and Cartesian data can be loaded with DSE Graph Loader. The DSE Graph Loader is not capable of creating schema for geospatial and Cartesian data, so schema must be created before loading and the create_schema configuration must be set to false.
//SCHEMA
schema.propertyKey('name').Text().create()
schema.propertyKey('point').Point().withGeoBounds().create()
schema.vertexLabel('location').properties('name','point').create()
schema.propertyKey('line').Linestring().withGeoBounds().create()
schema.vertexLabel('lineLocation').properties('name','line').create()
schema.propertyKey('polygon').Polygon().withGeoBounds().create()
schema.vertexLabel('polyLocation').properties('name','polygon').create()
schema.vertexLabel('location').index('byname').materialized().by('name').add()
schema.vertexLabel('lineLocation').index('byname').materialized().by('name').add()
schema.vertexLabel('polyLocation').index('byname').materialized().by('name').add()
schema.vertexLabel('location').index('search').search().by('point').add()
schema.vertexLabel('lineLocation').index('search').search().by('line').add()
schema.vertexLabel('polyLocation').index('search').search().by('polygon').add()
Search indexes must be used for geospatial and Cartesian points, linestrings or polygons in graph queries. DSE Graph uses one index per query, and because geospatial data consists of latitude and longitude (two parameters), only search indexes can be used to optimize query performance.
Procedure
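A sketch of the loading step for the location vertices, assuming a hypothetical pipe-delimited file points.csv with the header name|point, whose point column is written in a text form that the loader can convert to the Point type (well-known text is assumed here and should be verified against the loader version in use). The file and directory names are illustrative. The label, key, and property names come from the schema above.

```groovy
// CONFIGURATION
// The schema already exists, so the loader must not try to create it
config create_schema: false, load_new: true

// DATA INPUT
// Hypothetical input file (an assumption, not part of the original page):
// name|point
inputfiledir = '/tmp/geo/'
pointInput = File.csv(inputfiledir + "points.csv").delimiter('|')

// The header fields name and point map to the property keys of the same names
load(pointInput).asVertices {
    label "location"
    key "name"
}
```

The lineLocation and polyLocation vertex labels would presumably follow the same pattern, each with its own input file and label.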
Mapping Gryo data generated from DSE Graph
Inserting Gryo binary data requires a slightly modified map script. To load Gryo data, allow DSE Graph Loader to create schema and load new data. Loading requires the graph schema_mode to be set to Development.
Procedure
inputfiledir = '/tmp/Gryo/'
recipeInput = com.datastax.dsegraphloader.api.Graph.file(inputfiledir + 'recipesDSEG.gryo').gryo().dse()
load(recipeInput.vertices()).asVertices {
    labelField '~label'
    key 'name'
}
load(recipeInput.edges()).asEdges {
    labelField '~label'
    outV 'outV', {
        labelField '~label'
        key 'name' : 'name', 'personId' : 'personId'
    }
    inV 'inV', {
        labelField '~label'
        key 'name' : 'name', 'bookId' : 'bookId'
    }
}
The Gryo data format will include ~label and name field values that must be used to create the vertices. For instance, a record that is an author will have a ~label of person and property name. For the edges, notice that a custom vertex ID consisting of both name and bookId is used to identify the vertex to use as the incoming vertex for the edge.
Mapping Gryo data generated with TinkerGraph
Inserting Gryo binary data requires a slightly modified map script. To load Gryo data, allow DSE Graph Loader to create schema and load new data. Loading requires the graph schema_mode to be set to Development.
Procedure
//Specifies what data source to load using which mapper (as defined inline)
load(recipeInput.vertices()).asVertices {
    labelField "~label"
    key "~id", "id"
}
load(recipeInput.edges()).asEdges {
    labelField "~label"
    outV "outV", {
        labelField "~label"
        key "~id", "id"
    }
    inV "inV", {
        labelField "~label"
        key "~id", "id"
    }
}
The Gryo data format will include ~label and name field values that must be used to create the vertices and edges. For instance, a record that is an author will have a ~label of author and property name. The vertexKeyMap creates a map of each vertex label to a unique property. This map is used to create unique keys used while loading vertices from the binary file.
Mapping GraphML binary data
Mapping GraphML data with DSE Graph Loader.
Inserting GraphML data requires a slightly modified map script. To load GraphML data, allow DSE Graph Loader to create schema and load new data. Loading requires the graph schema_mode to be set to Development.
Procedure
//Specifies what data source to load using which mapper (as defined inline)
load(recipeInput.vertices()).asVertices {
    labelField "~label"
    key "~id", "id"
}
load(recipeInput.edges()).asEdges {
    labelField "~label"
    outV "outV", {
        labelField "~label"
        key "~id", "id"
    }
    inV "inV", {
        labelField "~label"
        key "~id", "id"
    }
}
The GraphML data format will include ~label and ~id field values that must be used to create the label and key for each record loaded. For instance, a record that is an author will have a ~label of author. The ~id will similarly be set in the record, a difference from other data. The difference can be seen by looking at a record and noting the presence of the id field, based on the second item in each key setting in the mapping script:
g.V().hasLabel('author').valueMap()
{gender=[F], name=[Julia Child], id=[0]}
{gender=[F], name=[Simone Beck], id=[3]}
Mapping GraphSON binary data
Mapping GraphSON data with DSE Graph Loader.
Inserting GraphSON data requires a slightly modified map script. To load GraphSON data, allow DSE Graph Loader to create schema and load new data. Loading requires the graph schema_mode to be set to Development.
Procedure
//Specifies what data source to load using which mapper (as defined inline)
load(recipeInput.vertices()).asVertices {
    labelField "~label"
    key "~id", "id"
}
load(recipeInput.edges()).asEdges {
    labelField "~label"
    outV "outV", {
        labelField "~label"
        key "~id", "id"
    }
    inV "inV", {
        labelField "~label"
        key "~id", "id"
    }
}
The GraphSON data format will include ~label and ~id field values that must be used to create the label and key for each record loaded. For instance, a record that is an author will have a ~label of author. The ~id will similarly be set in the record, a difference from other data. The difference can be seen by looking at a record and noting the presence of the id field, based on the second item in each key setting in the mapping script:
g.V().hasLabel('author').valueMap()
{gender=[F], name=[Julia Child], id=[0]}
{gender=[F], name=[Simone Beck], id=[3]}