Loading data

How to load data using DSE Graph Loader.

DSE Graph Loader can load data from many different input data formats. Pick the option that most resembles your data source:


Type	Description	Instructions
CSV	Strict format, with the first line of the file identifying the property keys used in the graph.	Loading CSV data
Text	Delimited text data of any format.	Loading TEXT data
Text with regular expressions	Delimited text data parsed using regular expressions (regex).	Loading TEXT data using regular expressions (regex)
JSON	Data stored in JSON (JavaScript Object Notation) format.	Loading JSON data
JDBC-compatible database	Data stored in a JDBC-compatible database.	Loading data from a JDBC-compatible database
HDFS file	Data file of any format stored in a Hadoop Distributed File System (HDFS).	Loading data from Hadoop (HDFS)
AWS S3 file	Data file of any format stored in AWS S3 storage.	Loading data from AWS S3
Gryo	Data stored in binary Gryo format.	Loading Gryo data
GraphSON	Data stored in GraphSON format.	Loading GraphSON data
GraphML	Data stored in GraphML format.	Loading GraphML data

Note: DSE Graph Loader prunes fields in text and CSV files that contain NULL, null, or an empty value. Use a transform if different behavior is desired.
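
To make the pruning behavior concrete, the following Python sketch mimics it on a small CSV sample. It is an illustration only, not DSE Graph Loader code: the helper names (prune_record, read_pruned), the sample data, and the exact matching rule (an exact string match on NULL, null, or the empty string) are assumptions made for this example.

```python
import csv
import io

# Values treated as "missing" in this sketch (assumed from the note above).
MISSING = {"", "NULL", "null"}

def prune_record(record):
    """Return a copy of the record that keeps only string values not in MISSING."""
    return {
        key: value
        for key, value in record.items()
        if isinstance(value, str) and value not in MISSING
    }

def read_pruned(csv_text, delimiter=","):
    """Parse CSV text whose first line names the fields, pruning each row."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    return [prune_record(row) for row in reader]

# Hypothetical sample: one empty field and one NULL field.
sample = "name,gender,age\nJulia,F,\nSimone,NULL,45\n"
rows = read_pruned(sample)
print(rows)
```

In this sketch the first row loses its age field and the second row loses its gender field, so the resulting records carry only the properties that have values.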

Warning: When loading vertices with user-defined vertex ids, DSE Graph Loader bypasses its vertex cache to allow faster write throughput. No logic validates whether a vertex with a given custom id already exists, so the client must ensure that vertices are unique. For the fastest performance, set the DSE Graph configuration option external_vertex_verify to false.
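
Because uniqueness is the client's responsibility in this case, it can help to check the input for repeated ids before loading. The following Python sketch shows one possible pre-load check; it is not part of DSE Graph Loader, and the function name (find_duplicate_ids), the field name (person_id), and the sample records are hypothetical.

```python
from collections import Counter

def find_duplicate_ids(records, id_field):
    """Return a sorted list of id values that appear in more than one record."""
    counts = Counter(record[id_field] for record in records)
    return sorted(value for value, count in counts.items() if count > 1)

# Hypothetical input records keyed by a user-defined vertex id.
people = [
    {"person_id": "p1", "name": "Julia"},
    {"person_id": "p2", "name": "Simone"},
    {"person_id": "p2", "name": "Simone B."},
]

duplicates = find_duplicate_ids(people, "person_id")
if duplicates:
    print("Duplicate vertex ids found:", duplicates)
```

A check like this holds every id in memory, so for very large inputs a different approach, such as sorting the file by id first, may be more practical.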

DSE Graph Loader also supports loading several files of the same format from a single directory. Example mapping scripts are shown for CSV and JSON, but the same approach works for all formats.