Loading data with DSBulk

Introduction

This guide is a quick overview of getting started with DSBulk and Astra. It walks through the steps to load CSV data into an Astra database from the command line.

Install

  1. In a terminal, from your desktop directory, download the DSBulk archive:

    curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.8.0.tar.gz

    Results

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 30.0M  100 30.0M    0     0  7545k      0  0:00:04  0:00:04 --:--:-- 7454k
  2. Extract the archive:

    tar -xzvf dsbulk-1.8.0.tar.gz

    The extracted dsbulk-1.8.0 folder is created on your desktop. The dsbulk executable is in its bin folder.
  3. Verify the installation by checking the version from the command line:

    dsbulk-1.8.0/bin/dsbulk --version

    Result

    DataStax Bulk Loader v1.8.0

DSBulk version 1.8.0 is installed and ready to use.
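
To avoid retyping the relative path in later steps, the launcher can be located once and reused. The following is a minimal sketch, assuming the archive was extracted into the current directory. The function name dsbulk_bin is made up for this example and is not part of DSBulk.

```shell
# Hypothetical helper (not part of DSBulk): print the launcher path inside
# an extracted DSBulk directory, or fail if it is missing or not executable.
dsbulk_bin() {
  # $1 = extracted directory; defaults to the folder name used in this guide
  dsbulk_dir="${1:-dsbulk-1.8.0}"
  if [ -x "$dsbulk_dir/bin/dsbulk" ]; then
    printf '%s\n' "$dsbulk_dir/bin/dsbulk"
  else
    printf 'No executable launcher at %s/bin/dsbulk\n' "$dsbulk_dir" >&2
    return 1
  fi
}
```

With this in place, "$(dsbulk_bin)" --version runs the same version check as step 3.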

Astra Credentials

Before you can run DSBulk, get the credentials needed to connect to Astra: the Client ID, the Client Secret, and the Secure Connect Bundle.

  1. Navigate to your Organization Settings.

  2. Select Token Management.

  3. From the dropdown menu, select Admin User.

  4. Generate a token for your Admin User role by selecting Generate Token.

    You will be provided with the Client ID, Client Secret, and Token. For your use case, you will need only the Client ID and Client Secret.

  5. Select Download CSV to store these credentials locally.

  6. Navigate to your Dashboard Overview.

  7. Select the Connect tab.

  8. Download the Secure Connect Bundle to store locally.
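
Once the credentials and bundle are stored locally, a quick pre-flight check can catch a missing file or an unset value before DSBulk is run. This is only a sketch: the variable names ASTRA_CLIENT_ID and ASTRA_CLIENT_SECRET and the function check_astra_inputs are invented for illustration, and DSBulk does not read them on its own.

```shell
# Hypothetical pre-flight check (names are illustrative, not read by DSBulk).
# Confirms the bundle file exists and both credentials are set, without
# echoing the secret itself.
check_astra_inputs() {
  # $1 = path to the downloaded Secure Connect Bundle
  if [ ! -f "$1" ]; then
    echo "Secure Connect Bundle not found: $1" >&2
    return 1
  fi
  if [ -z "${ASTRA_CLIENT_ID:-}" ] || [ -z "${ASTRA_CLIENT_SECRET:-}" ]; then
    echo "Set ASTRA_CLIENT_ID and ASTRA_CLIENT_SECRET first" >&2
    return 1
  fi
  echo "Bundle and credentials are present"
}
```

The two values would come from the CSV downloaded in step 5, and the argument is the path to the bundle downloaded in step 8.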


Create keyspace and table

To load your data with DSBulk, you need to create a keyspace and table.

  1. Navigate to your Dashboard Overview and select Add Keyspace.

  2. Create a table through the CQL console in your database. The example below uses a keyspace named test:

    CREATE TABLE test.world_happiness_report_2021 (
      country_name text,
      regional_indicator text,
      ladder_score float,
      gdp_per_capita float,
      social_support float,
      healthy_life_expectancy float,
      generosity float,
      PRIMARY KEY (country_name)
    );

    For more, see CREATE TABLE.

  3. Run DESC TABLES; in the CQL console to confirm the new table exists:

    Results

    world_happiness_report_2021

The table has been successfully created.

Load your data

With your keyspace and table set up, you can upload your data.

If you want to use sample data, check out the world_happiness_report_2021.csv.

To run the load, you need the path to the CSV file on your machine.
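
Before loading, it can help to compare the CSV header with the table's column names. As far as its defaults go, DSBulk treats the first line of a CSV file as a header and maps fields to columns by name. Below is a minimal sketch, assuming a comma-delimited file whose header fields contain no quoted commas. The helper name csv_header is hypothetical.

```shell
# Hypothetical helper: list the header fields of a CSV file, one per line.
# Assumes comma-delimited input and no quoted commas inside field names.
csv_header() {
  # $1 = path to the CSV file
  # tr -d '\r' drops a trailing carriage return from Windows-style files.
  head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}
```

Comparing this list with the columns in the CREATE TABLE statement makes any naming mismatch visible before the load starts.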

  1. Load your table using DSBulk:

    dsbulk-1.8.0/bin/dsbulk load -url <path-to-csv-file> -k <keyspace_name> -t <table_name> -b <path-to-secure-connect-bundle> -u <client_id> -p <client_secret>

    Results

    Operation directory: /path/to/directory/log/LOAD ...
    total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
      149 |      0 |    400 | 106.65 | 187.70 | 191.89 |    1.00

Success! All 149 rows were loaded into the table with no failures. This is a small sample, but DSBulk can also load and unload much larger datasets.
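
For repeated loads, the command can be wrapped so that paths and credentials come from variables instead of being retyped. This is a sketch under stated assumptions: the function name load_csv and the variables DSBULK, ASTRA_BUNDLE, ASTRA_CLIENT_ID, and ASTRA_CLIENT_SECRET are invented here. Only the load options themselves are taken from the command above.

```shell
# Sketch: wrap the load command from this guide in a reusable function.
# DSBULK, ASTRA_BUNDLE, ASTRA_CLIENT_ID and ASTRA_CLIENT_SECRET are
# illustrative variable names; DSBulk does not read them itself.
load_csv() {
  # $1 = CSV file, $2 = keyspace name, $3 = table name
  "${DSBULK:-dsbulk-1.8.0/bin/dsbulk}" load \
    -url "$1" -k "$2" -t "$3" \
    -b "$ASTRA_BUNDLE" \
    -u "$ASTRA_CLIENT_ID" -p "$ASTRA_CLIENT_SECRET"
}
```

Setting DSBULK=echo turns the call into a dry run that prints the arguments instead of contacting Astra, which is a simple way to check the values first.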

View your data in Astra

Now, all that is left is to view the data in the Astra console.

  1. Navigate back to the CQL shell in Astra.

  2. Run the following command to see the output:

    SELECT * FROM <keyspace_name>.<table_name>;

    Results

    The console displays the rows that were loaded into the table.
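
The row count can also be cross-checked from the command line. DSBulk provides a count operation alongside load and unload. The sketch below assumes it accepts the same -k, -t, -b, -u, and -p options as the load command in this guide. The function name count_rows and the variable names are hypothetical.

```shell
# Sketch: count the rows in a table with DSBulk's count operation.
# DSBULK, ASTRA_BUNDLE, ASTRA_CLIENT_ID and ASTRA_CLIENT_SECRET are
# illustrative variable names; DSBulk does not read them itself.
count_rows() {
  # $1 = keyspace name, $2 = table name
  "${DSBULK:-dsbulk-1.8.0/bin/dsbulk}" count \
    -k "$1" -t "$2" \
    -b "$ASTRA_BUNDLE" \
    -u "$ASTRA_CLIENT_ID" -p "$ASTRA_CLIENT_SECRET"
}
```

For the sample run in this guide, the count would be expected to match the 149 rows reported by the load, provided every country_name value is unique.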