Getting Started

This guide walks through the key features of dsbulk to help a new user get started.

Prerequisites

Obtain the following information and resources:

Key features

Procedure

Simple loading without configuration file

  1. Loading CSV data with a dsbulk load command:

    Specify two hosts (initial contact points) that belong to the desired cluster, and load from a local file export.csv, which has a header row, into keyspace ks1 and table table1:

    dsbulk load -url export.csv -k ks1 -t table1 -h '10.200.1.3, 10.200.1.4' -header true

    The -url option can designate the path to a resource, such as a local file, or a web URL from which to read or write data.

    Specify an external source of data, as well as a port for the cluster hosts:

    dsbulk load -url https://svr/data/export.csv -k ks1 -t table1 -h '10.200.1.3, 10.200.1.4' -port 9876

    Load CSV data from stdin as it is generated by a script, generate_data. The data is loaded into keyspace ks1 and table table1 in a cluster with a localhost contact point (the default if no hosts are defined). Unless specified otherwise, the field names are read from a header row in the input.

    generate_data | dsbulk load -url stdin:/ -k ks1 -t table1
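The text does not show generate_data itself. As a minimal sketch, assuming a table with hypothetical columns id, name, and email, a stand-in could be a shell function that writes a header row followed by data rows to stdout:

```shell
#!/bin/sh
# Hypothetical stand-in for the generate_data script named in the text.
# Column names (id, name, email) are illustrative assumptions, not part
# of the original guide.
generate_data() {
    rows="${1:-3}"                      # number of data rows, default 3
    printf 'id,name,email\n'            # header row, read by dsbulk by default
    i=1
    while [ "$i" -le "$rows" ]; do
        printf '%d,user%d,user%d@example.com\n' "$i" "$i" "$i"
        i=$((i + 1))
    done
}

# Preview the stream before sending it anywhere:
generate_data 2

# Then pipe it into the load command from the text:
#   generate_data 1000 | dsbulk load -url stdin:/ -k ks1 -t table1
```

Previewing the stream first makes it easy to confirm that the header row matches the target table's column names before any data reaches the cluster.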

Simple unloading without configuration file

  1. Unloading CSV data with a dsbulk unload command:
    Specify the file to which the data from keyspace ks1 and table table1 is written:
    dsbulk unload -url myData.csv -k ks1 -t table1
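After an unload, a quick sanity check is to count the data rows that were written. The helper below is a hypothetical sketch, not part of dsbulk. It assumes the output is a single CSV file whose first line is a header row and whose fields contain no embedded newlines:

```shell
#!/bin/sh
# Hypothetical helper: count data rows in a CSV file, skipping the
# first line (assumed to be a header row).
csv_data_rows() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Usage, with the file name from the example above:
#   csv_data_rows myData.csv
```

The count can then be compared with the number of rows expected in table1.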

Creating a configuration file

  1. The configuration file that sets values for dsbulk is written in a simple format, one option per line:
    ############ myConfFile.conf ############
    
    dsbulk {
       # The name of the connector to use
       connector.name = "csv"
       # CSV field delimiter
       connector.csv.delimiter = "|"
       # The keyspace to connect to
       schema.keyspace = "myKeyspace"
       # The table to connect to
       schema.table = "myTable"
       # The field-to-column mapping
       schema.mapping = "0=name, 1=age, 2=email" 
    }

    Tip: Settings in the configuration file always start with the dsbulk prefix, whereas on the command line this prefix must be omitted. To avoid repeating the prefix on every line, configuration files can use the equivalent HOCON block syntax shown above: dsbulk { connector.name = "csv" ... }.

    To use the configuration file, specify -f filename, where filename is the configuration file. Options given on the command line, such as -k and -t below, take precedence over the values in the file:

    dsbulk load -f myConfFile.conf -url export.csv -k ks1 -t table1
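The schema.mapping value in the configuration file above pairs zero-based field positions with column names. To see how a pipe-delimited record lines up with that mapping, the following sketch uses awk to label each field. The sample records and the preview_mapping helper are made up for illustration and are not part of dsbulk:

```shell
#!/bin/sh
# Hypothetical preview helper: reads pipe-delimited records on stdin and
# labels the fields the way "0=name, 1=age, 2=email" would map them.
# awk fields are 1-based, so dsbulk index 0 corresponds to awk's $1.
preview_mapping() {
    awk -F'|' '{ printf "name=%s age=%s email=%s\n", $1, $2, $3 }'
}

# Two invented sample records:
printf '%s\n' 'Alice|30|alice@example.com' 'Bob|41|bob@example.com' | preview_mapping
# prints:
#   name=Alice age=30 email=alice@example.com
#   name=Bob age=41 email=bob@example.com
```

A mismatch between the delimiter in the data and connector.csv.delimiter shows up immediately in a preview like this, because the whole record lands in the first field.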

Using SSL with dsbulk

  1. To use SSL with dsbulk, first set up SSL as described in the DSE Security documentation. The SSL options can be specified on the command line, but because the option names are long, a configuration file is usually more convenient:
    dsbulk {
       driver.ssl.keystore.password = cassandra
       driver.ssl.keystore.path = "/Users/johndoe/tmp/ssl/keystore.node0"
       driver.ssl.provider = OpenSSL
       driver.ssl.truststore.password = dserocks
       driver.ssl.truststore.path = "/Users/johndoe/tmp/ssl/truststore.node0"
    }
    Save these settings in a file, for example mySSLFile.conf, and pass it to the load command:
    dsbulk load -f mySSLFile.conf -url file1.csv -k ks1 -t table1
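Because the SSL settings point at keystore and truststore files on disk, it can help to confirm that those files are readable before starting a load. The check_ssl_paths function below is a hypothetical pre-flight helper, not a dsbulk feature. It assumes each path setting sits on its own line with the value in double quotes, as in the example above:

```shell
#!/bin/sh
# Hypothetical pre-flight check: print every keystore/truststore path in
# the given config file that is not a readable file. Returns 1 if any
# path is unreadable, 0 otherwise.
check_ssl_paths() {
    missing=$(
        # Extract the quoted value from lines such as:
        #   driver.ssl.keystore.path = "/some/file"
        sed -n 's/^[[:space:]]*[A-Za-z.]*ssl\.[a-z]*store\.path[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$1" |
        while IFS= read -r p; do
            [ -r "$p" ] || printf '%s\n' "$p"
        done
    )
    [ -z "$missing" ] && return 0
    printf '%s\n' "$missing"
    return 1
}

# Usage, with the file and command from the example above:
#   check_ssl_paths mySSLFile.conf &&
#       dsbulk load -f mySSLFile.conf -url file1.csv -k ks1 -t table1
```

The pattern matches the setting names with or without a leading dsbulk prefix, so the check works for either way of writing the file.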