Loading data examples

Examples of loading data using dsbulk

Load CSV data read from stdin to table table1 in keyspace ks1:
dsbulk load -k ks1 -t table1
Use the configuration file /tmp/dsbulk_load.conf to load the file export.csv to table table1 in keyspace ks1:
dsbulk load -f /tmp/dsbulk_load.conf --connector.csv.url export.csv -k ks1 -t table1
Load the file export.csv to table table1 in keyspace ks1 using the short form option for url:
dsbulk load -url export.csv -k ks1 -t table1
Load the file export.csv to table table1 in keyspace ks1 using the short form option for url and the tab character as a field delimiter:
dsbulk load -k ks1 -t table1 -url export.csv -delim '\t'
Load the file /tmp/export.csv to table table1 in keyspace ks1 using the long form option for url:
dsbulk load --connector.csv.url file:///tmp/export.csv -k ks1 -t table1
Note: file:///tmp/export.csv loads from localhost, hence the empty host in the file:// designation.
Load table table1 in keyspace ks1 from a gzipped CSV file by unzipping it to stdout and piping to stdin of dsbulk:
gzcat table1.csv.gz | dsbulk load -k ks1 -t table1
Specify a few hosts (initial contact points) that belong to the desired cluster and load from a local file, without headers. Map field indices of the input to table columns with -m:
dsbulk load -url ~/export.csv -k ks1 -t table1 -h '10.200.1.3, 10.200.1.4' -header false -m '0=col1,1=col3'
Specify port 9876 for the cluster hosts and load from an external source URL:
dsbulk load -url https://192.168.1.100/data/export.csv -k ks1 -t table1 -h '10.200.1.3,10.200.1.4' -port 9876
Load all CSV files from a directory. The files do not have a header row, so pass -header false. Map field indices of the input to table columns with -m:
dsbulk load -url ~/export-dir -k ks1 -t table1 -header false -m '0=col1,1=col3'
Specify a file that contains a list of multiple, well-formed URLs for the CSV or JSON data files to load:
dsbulk load --connector.json.urlfile "my/local/multiple-input-data-urls.txt" -k ks1 -t table1 -h '10.200.1.3'
Load data using a CQL query and include TTL values from the input data:
dsbulk load -query \
  "INSERT INTO dsbulkblog.iris_with_id
    (sepal_length,sepal_width,petal_length,petal_width,species,id)
    VALUES (:sepal_length,:sepal_width,:petal_length,:petal_width,:species,:id)
    USING TTL :ttl_to_use"
Load data using a field-to-column mapping. The example also shows how a function can be applied to one or more fields, with the result mapped to a single column named odd_column:
dsbulk load \
  -url /tmp/dsbulkblog/iris.csv \
  -k dsbulkblog \
  -t iris_with_id \
  --schema.mapping "sepal_length = sepal_length, sepal_width = sepal_width, petal_length = petal_length,
    petal_width = petal_width, species = species, id = id, max(sepal_length, petal_length) = odd_column"
Load with the default port for the cluster hosts, and with the keyspace, table, and mapping set in conf/application.conf:
dsbulk load -url https://192.168.1.100/data/export.csv -h '10.200.1.3,10.200.1.4'
Load table table1 in keyspace ks1 from a CSV file, where double-quote characters in fields are escaped with a double-quote; for example, "f1","value with ""quotes"" and more" is a line in the CSV file:
dsbulk load -url ~/export.csv -k ks1 -t table1 -escape '\"'
Loading collections to a table supports a helpful simplification: collections inside a CSV file can contain either valid JSON or a simpler, non-compliant form. For example, consider the following table:
CREATE TABLE t1 (col1 set<int> PRIMARY KEY, col2 list<int>, col3 map<text,int>);
This pipe-delimited CSV file contains valid JSON and could be loaded:
col1|col2|col3
[1,2,3]|[1,2,3]|{"key1":1,"key2":2}
or this similar CSV file could also be loaded:
col1|col2|col3
1,2,3|1,2,3|"key1":1,"key2":2
Notice that in the second file the surrounding brackets and braces of the valid JSON are omitted.
Important: Simplification of the JSON data will not work for nested collections. If your table has a column col4 of type list<list<int>>, only the outermost structure can omit the surrounding characters. For example:
col1|col2|col3|col4
[1,2,3]|[1,2,3]|{"key1":1,"key2":2}|[[1,2,3],[4,5,6]]
can be simplified to:
col1|col2|col3|col4
1,2,3|1,2,3|"key1":1,"key2":2|[1,2,3],[4,5,6]
but the inner list items cannot be simplified further.
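Because the example files above are pipe-delimited, the field delimiter must be overridden when loading them. A minimal sketch, assuming the data is saved as /tmp/collections.csv and the target is the table t1 defined above (the dsbulk invocation is commented out because it requires a running cluster):

```shell
# Write the pipe-delimited sample from the example above to a file.
cat > /tmp/collections.csv <<'EOF'
col1|col2|col3
[1,2,3]|[1,2,3]|{"key1":1,"key2":2}
EOF

# Pass -delim '|' so dsbulk splits fields on the pipe character instead of
# the default comma; path and table name are assumptions for illustration:
# dsbulk load -url /tmp/collections.csv -k ks1 -t t1 -delim '|'

# Sanity check: every row should split into exactly three pipe-delimited fields.
awk -F'|' '{ print NF }' /tmp/collections.csv
```

The same -delim override applies to the four-column nested-collection files as well.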