Loading data
How to load data using DSE Graph Loader.
DSE Graph Loader can load data from many different input formats. Choose the option that
most closely matches your data source:
Type | Description | Instructions |
---|---|---|
CSV | Strict format, with the first line of the file identifying the property keys used in the graph. | Loading CSV data |
Text | Delimited text data of any format. | Loading TEXT data |
Text with regular expressions | Delimited text data parsed using regular expressions (regex). | Loading TEXT data using regular expressions (regex) |
JSON | Data stored in JSON (JavaScript Object Notation) format. | Loading JSON data |
JDBC-compatible database | Data stored in a JDBC-compatible database. | Loading data from a JDBC-compatible database |
HDFS file | Data file of any format stored in the Hadoop Distributed File System (HDFS). | Loading data from Hadoop (HDFS) |
AWS S3 file | Data file of any format stored in AWS S3. | Loading data from AWS S3 |
Gryo | Data stored in the binary Gryo format. | Loading Gryo data |
GraphSON | Data stored in GraphSON format. | Loading GraphSON data |
GraphML | Data stored in GraphML format. | Loading GraphML data |
Note: Fields in text and CSV files that contain NULL or null, or that are empty, are pruned
by DSE Graph Loader. Use a transform if a different behavior is desired.

Warning: When loading user-defined vertex IDs, DSE Graph Loader bypasses its vertex cache to
achieve faster write throughput. No logic validates whether a vertex with a custom ID already
exists, so the client must ensure that vertices are unique. For the fastest performance, set
the DSE Graph configuration option external_vertex_verify to false.
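Because pruning happens silently and custom vertex IDs are not checked for uniqueness, it can help to inspect a file before loading it. The following is a minimal, hypothetical pre-flight sketch in Python, not part of DSE Graph Loader: it reports which fields in a CSV input would be pruned under the Note above and which custom vertex IDs repeat. The helper names, the ID column name `id`, and the comma delimiter are all assumptions.

```python
import csv
import io
from collections import Counter

# Values the Note says DSE Graph Loader prunes: NULL, null, or empty.
NULL_LIKE = {"NULL", "null", ""}


def pruned_fields(row):
    """Return the names of fields in one parsed row that would be pruned."""
    # csv.DictReader uses None for missing trailing values (treated as empty)
    # and a None key for surplus values (skipped here).
    return [key for key, value in row.items()
            if key is not None and (value is None or value in NULL_LIKE)]


def duplicate_ids(rows, id_field):
    """Return custom vertex IDs that appear more than once, sorted."""
    counts = Counter(row[id_field] for row in rows)
    return sorted(vid for vid, n in counts.items() if n > 1)


def preflight(csv_text, id_field, delimiter=","):
    """Hypothetical check: map data-row index (0-based, header excluded)
    to the fields that would be pruned, and list duplicated IDs."""
    rows = list(csv.DictReader(io.StringIO(csv_text), delimiter=delimiter))
    pruned = {}
    for i, row in enumerate(rows):
        fields = pruned_fields(row)
        if fields:
            pruned[i] = fields
    return pruned, duplicate_ids(rows, id_field)


if __name__ == "__main__":
    sample = "id,name,age\n1,Julia,NULL\n2,,41\n1,Simone,38\n"
    pruned, dups = preflight(sample, id_field="id")
    print("fields that would be pruned:", pruned)
    print("duplicate custom IDs:", dups)
```

For the sample, the `age` field of the first row and the `name` field of the second row would be pruned, and ID `1` is duplicated. A duplicated ID would need to be resolved on the client side before loading, since the loader will not detect it.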
DSE Graph Loader also supports loading multiple files of the same format from a single directory. Example mapping scripts are shown for CSV and JSON, but the same approach works for all formats.
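This section does not say how the loader treats files of a different format placed in the same directory, so a cautious client might check the directory first. The sketch below is a hypothetical Python helper, not part of DSE Graph Loader: it splits a directory's files by extension and, assuming every CSV file is mapped with the same header, checks that their first lines agree. The function names and the extension-based notion of "format" are assumptions.

```python
from pathlib import Path


def files_of_format(directory, extension):
    """Split regular files in `directory` into those with `extension`
    (case-insensitive, e.g. ".csv") and all others, both sorted by path."""
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    wanted = extension.lower()
    matching = [p for p in files if p.suffix.lower() == wanted]
    others = [p for p in files if p.suffix.lower() != wanted]
    return matching, others


def csv_headers_agree(paths):
    """Return True if every file's first line (the CSV header) is identical."""
    headers = set()
    for p in paths:
        with p.open(newline="") as f:
            headers.add(f.readline().rstrip("\r\n"))
    return len(headers) <= 1


if __name__ == "__main__":
    import sys

    # Usage: python check_dir.py /path/to/dir .csv  (directory is an assumption)
    matching, others = files_of_format(sys.argv[1], sys.argv[2])
    print("files to load:", [p.name for p in matching])
    print("files of another format:", [p.name for p in others])
    if sys.argv[2].lower() == ".csv":
        print("headers agree:", csv_headers_agree(matching))
```

Checking by extension is only a heuristic; a file's contents are what actually determine its format.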