Loading data with DSBulk

Introduction

This topic shows how to get started with DataStax Bulk Loader (DSBulk) and Astra DB. It walks through the steps to load your CSV data into an Astra DB database with the dsbulk load command.

You can also use the Data Loader in the Astra Portal for existing databases. With this option, you can load up to 40 MB of CSV data.

DSBulk install

  1. In a terminal, download the DSBulk installation archive:

    curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.9.0.tar.gz

    Results

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 31.5M  100 31.5M    0     0  8514k      0  0:00:03  0:00:03 --:--:-- 8512k

  2. Unpack the archive:

    tar -xzvf dsbulk-1.9.0.tar.gz

    The dsbulk executable is in the bin folder of the unpacked directory.

  3. Verify the installation from the command line:

    dsbulk-1.9.0/bin/dsbulk --version

    Result

    DataStax Bulk Loader v1.9.0

DSBulk version 1.9.0 is installed and ready to use.
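If you prefer to call dsbulk without typing the full path each time, one option is to add the bin folder to your PATH. The following is a minimal sketch, assuming the archive was unpacked in the current directory as in step 2; the DSBULK_HOME variable name is only an illustration and is not required by DSBulk.

```shell
# Assumption: dsbulk-1.9.0 was unpacked in the current working directory.
# DSBULK_HOME is a hypothetical variable name used only in this sketch.
DSBULK_HOME="$PWD/dsbulk-1.9.0"

# Put the bin folder first on PATH for the current shell session.
export PATH="$DSBULK_HOME/bin:$PATH"

# Report the version if the executable can be found, otherwise explain why not.
if command -v dsbulk >/dev/null 2>&1; then
  dsbulk --version
else
  echo "dsbulk not found; check that $DSBULK_HOME/bin exists" >&2
fi
```

The change lasts only for the current terminal session. The remaining commands in this topic keep using the explicit dsbulk-1.9.0/bin/dsbulk path, which works either way.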

Astra Credentials

Before you can run DSBulk, get the credentials needed to connect to Astra DB: the Client ID, the Client Secret, and the Secure Connect Bundle.

  1. In Astra Portal, navigate to your Organization Settings.

  2. Select Token Management.

  3. From the Select Role dropdown menu, select Administrator User.

  4. Generate a token for your Administrator User role by selecting Generate Token.

    You will be provided with the Client ID, Client Secret, and a Token. For the dsbulk load command shown later in this topic, you’ll need the Client ID and Client Secret values.

  5. Select Download Token Details to store these credentials locally.

  6. Go to Databases, and create a database if needed. Otherwise, select an active database.

  7. Click the Connect tab.

  8. On the Connect page, to reveal the Download Bundle option, click any of the Driver types.

  9. Download the Secure Connect Bundle to store it locally.
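Once you have the three items from the steps above, it can help to confirm they are all in place before running DSBulk. The following is a minimal sketch, not part of DSBulk itself: the function name check_astra_credentials and the environment variable names ASTRA_CLIENT_ID, ASTRA_CLIENT_SECRET, and ASTRA_BUNDLE are hypothetical names chosen for this example.

```shell
# Hypothetical helper: succeeds only if the Client ID and Client Secret are set
# and the Secure Connect Bundle file exists at the given path.
# The ASTRA_* variable names are assumptions made for this sketch.
check_astra_credentials() {
  missing=0
  if [ -z "${ASTRA_CLIENT_ID:-}" ]; then
    echo "ASTRA_CLIENT_ID is not set" >&2
    missing=1
  fi
  if [ -z "${ASTRA_CLIENT_SECRET:-}" ]; then
    echo "ASTRA_CLIENT_SECRET is not set" >&2
    missing=1
  fi
  if [ ! -f "${ASTRA_BUNDLE:-}" ]; then
    echo "Secure Connect Bundle not found at: ${ASTRA_BUNDLE:-unset}" >&2
    missing=1
  fi
  return "$missing"
}
```

To use it, export the three variables with the values from your downloaded token details and the path to your bundle, then run check_astra_credentials. It prints one message per missing item and returns a non-zero status if anything is missing.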

Create keyspace and table

To load your data with DataStax Bulk Loader, you need to create a keyspace and table.

  1. In the Astra Portal, go to Databases, and then select your database.

  2. Select Add Keyspace and name it test. The examples in this topic use this keyspace name.

  3. Create a table through the CQL Console in your database:

    CREATE TABLE test.world_happiness_report_2021 (
      country_name text,
      regional_indicator text,
      ladder_score float,
      gdp_per_capita float,
      social_support float,
      healthy_life_expectancy float,
      generosity float,
      PRIMARY KEY (country_name)
    );

    For more information, see CREATE TABLE.

  4. In the CQL Console, run desc tables; to confirm that the new table exists.

    The results will include:

    world_happiness_report_2021

The world_happiness_report_2021 table has been successfully created.

Load your data

With your keyspace and table set up, you can load your data.

If you want to use sample data, check out this sample CSV file: World Happiness Report 2021.
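Before loading, it can be worth checking that the first line of your CSV file matches the table's column names, since DSBulk by default reads the first line as a header and uses those names to map fields to columns. The following is a minimal sketch under the assumption that your CSV header uses exactly the column names from the CREATE TABLE statement above, in the same order. The function name csv_header_matches and the EXPECTED_HEADER variable are hypothetical, and the exact-match comparison is stricter than DSBulk itself requires.

```shell
# Column names taken from the CREATE TABLE statement in this topic.
EXPECTED_HEADER="country_name,regional_indicator,ladder_score,gdp_per_capita,social_support,healthy_life_expectancy,generosity"

# Hypothetical helper: csv_header_matches FILE HEADER
# Succeeds if the first line of FILE (with any carriage return removed)
# is exactly equal to HEADER.
csv_header_matches() {
  header=$(head -n 1 "$1" | tr -d '\r')
  [ "$header" = "$2" ]
}
```

For example, csv_header_matches <path-to-csv-file> "$EXPECTED_HEADER" returns success when the header line matches and a non-zero status otherwise. If your file uses different header names, rename them or adjust the expected header accordingly.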

  1. Load your table using DataStax Bulk Loader. Here’s the command format:

    dsbulk-1.9.0/bin/dsbulk load -url <path-to-csv-file> -k <keyspace_name> -t <table_name> -b <path-to-secure-connect-bundle> -u <client_id> -p <client_secret>

    Results:

    Operation directory: /path/to/directory/log/LOAD ...
    total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
      149 |      0 |    400 | 106.65 | 187.70 | 191.89 |    1.00

Success! Your rows were loaded into the table. This is a small sample, but DataStax Bulk Loader can load, unload, and count extremely large files.
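If you run the load command often, a small wrapper can keep the credentials out of each invocation. The following is a minimal sketch that uses only the options shown in the command format above. The function name load_csv and the environment variables ASTRA_BUNDLE, ASTRA_CLIENT_ID, ASTRA_CLIENT_SECRET, and DSBULK are hypothetical names introduced for this example; none of them are read by DSBulk itself.

```shell
# Hypothetical wrapper: load_csv CSV_FILE KEYSPACE TABLE
# Assumes ASTRA_BUNDLE, ASTRA_CLIENT_ID, and ASTRA_CLIENT_SECRET are already
# set in the environment. DSBULK optionally overrides the executable path and
# defaults to the location used in this topic.
load_csv() {
  "${DSBULK:-dsbulk-1.9.0/bin/dsbulk}" load \
    -url "$1" \
    -k "$2" \
    -t "$3" \
    -b "$ASTRA_BUNDLE" \
    -u "$ASTRA_CLIENT_ID" \
    -p "$ASTRA_CLIENT_SECRET"
}
```

With the variables set, the example in this topic becomes load_csv <path-to-csv-file> test world_happiness_report_2021. Quoting each argument keeps paths that contain spaces intact.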

View your data in Astra DB

Now, all that is left is to view the data in Astra Portal.

  1. Navigate back to the CQL Console tab in Astra Portal.

  2. Run the following command to see the output:

    select * from test.world_happiness_report_2021;

    The results show the rows that were loaded into the table.

