Loading data with DSBulk
This quick overview shows how to get started with DSBulk and Astra. It walks through the steps needed to load your CSV data into an Astra database from the command line.
From your desktop in the terminal, download the DSBulk installation file:
curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.8.0.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 30.0M  100 30.0M    0     0  7545k      0  0:00:04  0:00:04 --:--:-- 7454k
Extract the archive:
tar -xzvf dsbulk-1.8.0.tar.gz
The extracted folder will be on the desktop. You can access the dsbulk executable through the bin folder.
Make sure that everything is running correctly by checking the version from the command line:
dsbulk-1.8.0/bin/dsbulk --version
DataStax Bulk Loader v1.8.0
DSBulk version 1.8.0 is installed and ready to use.
Before you can run DSBulk, get the credentials needed to connect to Astra: the Client ID, Client Secret, and Secure Connect Bundle.
Navigate to your Organization Settings.
Select Token Management.
From the dropdown menu, select Admin User.
Generate a token for your Admin User role by selecting Generate Token.
You will be provided with the Client ID, Client Secret, and Token. For your use case, you will need only the Client ID and Client Secret.
Select Download CSV to store these credentials locally.
Navigate to your Dashboard Overview.
Select the Connect tab.
Download the Secure Connect Bundle to store locally.
To load your data with DSBulk, you need to create a keyspace and table.
Select Add Keyspace.
Create a table through the CQL console in your database:
CREATE TABLE test.world_happiness_report_2021 (
  country_name text,
  regional_indicator text,
  ladder_score float,
  gdp_per_capita float,
  social_support float,
  healthy_life_expectancy float,
  generosity float,
  PRIMARY KEY (country_name)
);
For more, see CREATE TABLE.
Run desc tables; to confirm the new table exists:
The table has been successfully created.
With your keyspace and table set up, you can upload your data.
If you want to use sample data, you can use world_happiness_report_2021.csv.
To run the DSBulk load, you will need the path to the CSV file.
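Before loading, it can help to confirm that the CSV header lines up with the table. The sketch below assumes the file has a header row whose field names are meant to match the column names from the CREATE TABLE statement, and that the header is plain comma-separated text without quoting. The function name missing_columns and the CSV_FILE variable are illustrative, not part of DSBulk.

```shell
# missing_columns FILE
# Prints every column from the CREATE TABLE statement above that does not
# appear in the first line of FILE. Assumes a simple comma-separated header
# with no quoted or padded field names.
missing_columns() {
  # Read the header line and drop a trailing carriage return, if any.
  header="$(head -n 1 "$1" | tr -d '\r')"
  for col in country_name regional_indicator ladder_score gdp_per_capita \
             social_support healthy_life_expectancy generosity; do
    case ",$header," in
      *",$col,"*) ;;          # column present in the header
      *) echo "$col" ;;       # column missing from the header
    esac
  done
}

# CSV_FILE is a hypothetical variable; point it at your own file.
if [ -f "${CSV_FILE:-}" ]; then
  missing_columns "$CSV_FILE"
fi
```

If the function prints nothing, every table column has a matching header field. Any names it does print would need to be renamed in the file or handled with an explicit mapping.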
Load your table using DSBulk:
dsbulk-1.8.0/bin/dsbulk load -url <path-to-csv-file> -k <keyspace_name> -t <table_name> -b <path-to-secure-connect-bundle> -u <client_id> -p <client_secret>
Operation directory: /path/to/directory/log/LOAD ...
total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
  149 |      0 |    400 | 106.65 | 187.70 | 191.89 |    1.00
Success! Your rows were loaded into the table. This is a small test sample, but DSBulk can also load and unload much larger files.
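If you run the load more than once, a small wrapper can keep the Client ID and Client Secret out of the command you type. The following is only a sketch: run_load, DSBULK_BIN, DRY_RUN, ASTRA_CLIENT_ID, and ASTRA_CLIENT_SECRET are assumed names chosen for illustration, while the dsbulk flags are the same ones used in the load command above.

```shell
# run_load CSV KEYSPACE TABLE BUNDLE
# Builds the same dsbulk load command shown above. The Client ID and Client
# Secret are read from ASTRA_CLIENT_ID and ASTRA_CLIENT_SECRET, which are
# assumed variable names, not settings that DSBulk itself knows about.
run_load() {
  csv="$1"; keyspace="$2"; table="$3"; bundle="$4"
  # DSBULK_BIN is a hypothetical override for the launcher location.
  bin="${DSBULK_BIN:-dsbulk-1.8.0/bin/dsbulk}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Dry run: print the command with the secret masked instead of running it.
    echo "$bin load -url $csv -k $keyspace -t $table -b $bundle -u ${ASTRA_CLIENT_ID:-} -p ****"
  else
    # Real run: stop with an error message if either credential is unset.
    "$bin" load -url "$csv" -k "$keyspace" -t "$table" -b "$bundle" \
      -u "${ASTRA_CLIENT_ID:?set ASTRA_CLIENT_ID}" \
      -p "${ASTRA_CLIENT_SECRET:?set ASTRA_CLIENT_SECRET}"
  fi
}
```

With DRY_RUN=1 set, calling run_load with your CSV path, keyspace, table, and bundle path prints the command for review. Without it, the same call runs the load.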
Now, all that is left is to view the data in the Astra console.
Navigate back to the CQL shell in Astra.
Run the following command to see the output:
select * from <keyspace_name>.<table_name>;
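As a cross-check, the total reported by DSBulk after the load should match the number of data rows in the CSV file. The sketch below counts those rows locally, assuming one record per line (no quoted fields containing line breaks) and a single header line. count_data_rows and CSV_FILE are illustrative names.

```shell
# count_data_rows FILE
# Counts non-blank lines after the header line. Assumes one record per line,
# i.e. no quoted fields that contain line breaks.
count_data_rows() {
  awk 'NR > 1 && NF { n++ } END { print n + 0 }' "$1"
}

# CSV_FILE is a hypothetical variable; point it at your own file.
if [ -f "${CSV_FILE:-}" ]; then
  count_data_rows "$CSV_FILE"
fi
```

If the number printed differs from the total column in the DSBulk output, the operation directory logs are the place to look for rejected rows.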