Use DSBulk to load data
Introduction
Here is a quick overview of how to get started with DataStax Bulk Loader and Astra DB.
This topic provides the necessary steps to load your CSV data into an Astra DB database via a dsbulk load command.
You can also use the Data Loader in the Astra Portal for existing databases. With this option, you can load up to 40 MB of CSV data.
DSBulk install
-
From your desktop in the terminal, download the dsbulk installation file:
curl -OL https://downloads.datastax.com/dsbulk/dsbulk.tar.gz
Results
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 31.5M  100 31.5M    0     0  8514k      0  0:00:03  0:00:03 --:--:-- 8512k
-
Unpack the folder:
tar -xzvf dsbulk.tar.gz
Then access the dsbulk executable through the bin folder (dsbulk-X.Y.Z/bin).
-
Make sure that everything is running correctly through the command line:
dsbulk-X.Y.Z/bin/dsbulk --version
Result
DataStax Bulk Loader vX.Y.Z
DSBulk version X.Y.Z is installed and ready to use.
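Because the unpacked directory name includes the version number, it can help to locate the executable once and reuse its path. The following is a minimal sketch, assuming the archive was unpacked somewhere under the directory you pass in (the current directory by default). The function name find_dsbulk is hypothetical and is not part of DSBulk.

```shell
# Hypothetical helper (not part of DSBulk): print the path of the first
# dsbulk executable found in a dsbulk-*/bin folder under the given directory.
# Usage: find_dsbulk [search-directory]
find_dsbulk() {
    search_dir="${1:-.}"
    for candidate in "$search_dir"/dsbulk-*/bin/dsbulk; do
        # When the glob matches nothing, sh and bash leave the pattern as-is,
        # so confirm the candidate is a real, executable file.
        if [ -f "$candidate" ] && [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "dsbulk executable not found under $search_dir" >&2
    return 1
}
```

For example, DSBULK="$(find_dsbulk)" stores the path in a variable, and "$DSBULK" --version then runs the same version check as above.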
Astra Credentials
Before you can run DSBulk, get the credentials needed to connect to Astra DB: the Client ID, the Client Secret, and the Secure Connect Bundle.
-
In Astra Portal, click Settings, and then click Tokens.
-
In the Select a Token Role field, select Administrator User.
-
Click Generate Token.
-
Select Download Token Details to store the token credentials locally.
The Astra Portal shows the Client ID, Client Secret, and Token only once. Store these values securely. You can’t access them again. For the dsbulk load command, you need the Client ID and Client Secret values.
-
In the Astra Portal navigation menu, click Databases, select your database, and click the Connect tab.
-
On the Connect page, click any of the driver types to reveal the Download Bundle option.
-
Download the Secure Connect Bundle to store it locally.
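Before running DSBulk, it can be useful to confirm that all three pieces are in place. The sketch below assumes you have exported the values into environment variables. The names ASTRA_CLIENT_ID, ASTRA_CLIENT_SECRET, and ASTRA_BUNDLE, as well as the function name, are hypothetical choices for this example and are not defined by Astra DB or DSBulk.

```shell
# Hypothetical helper: report which of the required connection values are
# missing. The variable names below are assumptions made for this sketch.
#   ASTRA_CLIENT_ID      Client ID from the generated token
#   ASTRA_CLIENT_SECRET  Client Secret from the generated token
#   ASTRA_BUNDLE         local path to the downloaded Secure Connect Bundle
check_astra_credentials() {
    rc=0
    if [ -z "${ASTRA_CLIENT_ID:-}" ]; then
        echo "ASTRA_CLIENT_ID is not set" >&2
        rc=1
    fi
    if [ -z "${ASTRA_CLIENT_SECRET:-}" ]; then
        echo "ASTRA_CLIENT_SECRET is not set" >&2
        rc=1
    fi
    if [ ! -f "${ASTRA_BUNDLE:-}" ]; then
        echo "Secure Connect Bundle not found: ${ASTRA_BUNDLE:-unset}" >&2
        rc=1
    fi
    return "$rc"
}
```

Calling check_astra_credentials returns a non-zero status and prints one message per missing value, so a script can stop before it attempts a load.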
Create keyspace and table
To load your data with DataStax Bulk Loader, you need to create a keyspace and table.
-
In the Astra Portal, go to Databases, and then click your database’s name.
-
Click Add Keyspace, and enter the name test, as used in the following example CQL commands.
-
Use the CQL Console to create a table in your database:
CREATE TABLE test.world_happiness_report_2021 (
    country_name text,
    regional_indicator text,
    ladder_score float,
    gdp_per_capita float,
    social_support float,
    healthy_life_expectancy float,
    generosity float,
    PRIMARY KEY (country_name)
);
-
Run desc tables; to confirm the new table exists. Make sure the result includes world_happiness_report_2021.
Load your data
With your keyspace and table set up, you can upload your data.
If you want to use sample data, check out this sample CSV file: World Happiness Report 2021.
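Before loading, you may want to compare the CSV header with the table's column names. This sketch assumes DSBulk's default behavior of matching header names to column names when no explicit mapping is given, and it assumes a simple header with no quoted fields that contain commas. The function name csv_header_fields is hypothetical.

```shell
# Hypothetical helper: print the header fields of a CSV file, one per line,
# so they can be compared with the table's column names.
# Assumes a comma-separated header without quoted commas.
# Usage: csv_header_fields <path-to-csv-file>
csv_header_fields() {
    # Take the first line, drop any Windows carriage return, split on commas.
    head -n 1 "$1" | tr -d '\r' | tr ',' '\n'
}
```

If the printed names differ from the columns in the CREATE TABLE statement, the file or the table definition likely needs adjusting before the load.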
-
Load your table using DataStax Bulk Loader. Here’s the command format:
dsbulk-X.Y.Z/bin/dsbulk load -url <path-to-csv-file> -k <keyspace_name> -t <table_name> -b <path-to-secure-connect-bundle> -u <client_id> -p <client_secret>
Results:
Operation directory: /path/to/directory/log/LOAD ...
total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
  149 |      0 |    400 | 106.65 | 187.70 | 191.89 |    1.00
Success! Your rows were loaded into the table. This is a small test sample size, but DataStax Bulk Loader can load, unload, and count extremely large files.
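If you run loads often, the command format above can be wrapped in small functions. This is a sketch under stated assumptions: the environment variables DSBULK, ASTRA_BUNDLE, ASTRA_CLIENT_ID, and ASTRA_CLIENT_SECRET and the function names load_csv and count_rows are hypothetical names chosen for this example. The count function uses the dsbulk count operation with the same connection options to check the number of rows after a load.

```shell
# Hypothetical wrappers around the dsbulk command format shown above.
# Assumed environment variables (names chosen for this sketch):
#   DSBULK               path to the dsbulk executable (default: dsbulk on PATH)
#   ASTRA_BUNDLE         path to the Secure Connect Bundle
#   ASTRA_CLIENT_ID      Client ID
#   ASTRA_CLIENT_SECRET  Client Secret

# Usage: load_csv <path-to-csv-file> <keyspace_name> <table_name>
load_csv() {
    "${DSBULK:-dsbulk}" load -url "$1" -k "$2" -t "$3" \
        -b "$ASTRA_BUNDLE" -u "$ASTRA_CLIENT_ID" -p "$ASTRA_CLIENT_SECRET"
}

# Usage: count_rows <keyspace_name> <table_name>
count_rows() {
    "${DSBULK:-dsbulk}" count -k "$1" -t "$2" \
        -b "$ASTRA_BUNDLE" -u "$ASTRA_CLIENT_ID" -p "$ASTRA_CLIENT_SECRET"
}
```

For the example table, load_csv <path-to-csv-file> test world_happiness_report_2021 followed by count_rows test world_happiness_report_2021 should report a count that matches the total shown in the load results.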
View your data in Astra DB
Now, all that is left is to view the data in Astra Portal.
-
Navigate back to the CQL Console tab in Astra Portal.
-
Run the following command to see the output:
select * from test.world_happiness_report_2021;
Results include: