Loading data with DSBulk
Here is a quick overview of how to get started with DataStax Bulk Loader and Astra DB.
This topic provides the steps to load your CSV data into an Astra DB database with a dsbulk load command.
Another option is to use the Load Data feature, which is available for an existing database in Astra Portal. You can load CSV data up to 40 MB through the UI. Start on the Astra DB Dashboard and click the link for your database. Follow the Load Data dialog to select your CSV file and load its data into the keyspace and table that you specify. See Astra Data Loader.
From your desktop, in the terminal, download the DSBulk archive:
curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.9.0.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 31.5M  100 31.5M    0     0  8514k      0  0:00:03  0:00:03 --:--:-- 8512k
Unpack the archive:
tar -xzvf dsbulk-1.9.0.tar.gz
Then access the dsbulk executable through the bin directory of the unpacked folder.
Make sure that everything is running correctly through the command line:
dsbulk-1.9.0/bin/dsbulk --version
DataStax Bulk Loader v1.9.0
DSBulk version 1.9.0 is installed and ready to use.
Before you can run DSBulk, get the credentials needed to connect to Astra DB: the Client ID, the Client Secret, and the Secure Connect Bundle.
In Astra Portal, navigate to your Organization Settings.
Select Token Management.
From the Select Role dropdown menu, select Administrator User.
Generate a token for your Administrator User role by selecting Generate Token.
You will be provided with the Client ID, Client Secret, and a Token. For the dsbulk load command shown later in this topic, you’ll need the Client ID and Client Secret values.
Select Download Token Details to store these credentials locally.
Navigate to your Dashboard Overview by clicking on the "DataStax Astra" icon in the top banner.
Create a database, if you haven’t already.
For one of your active databases, select the Connect tab. On the Connect page, to reveal the Download Bundle option, click any of the Driver types.
Download the Secure Connect Bundle to store it locally.
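One way to keep these three values ready for the dsbulk load command is to hold them in shell variables and check them before loading. The sketch below is an assumption, not part of the Astra tooling: the variable names ASTRA_CLIENT_ID, ASTRA_CLIENT_SECRET, and ASTRA_BUNDLE and the function name are hypothetical, and the function only verifies that the values are set and that the bundle file exists.

```shell
# Hypothetical helper: the ASTRA_* variable names are illustrative only.
# Returns 0 when both credentials are set and the bundle file exists.
check_astra_credentials() {
  rc=0
  [ -n "${ASTRA_CLIENT_ID:-}" ] || { echo "ASTRA_CLIENT_ID is not set" >&2; rc=1; }
  [ -n "${ASTRA_CLIENT_SECRET:-}" ] || { echo "ASTRA_CLIENT_SECRET is not set" >&2; rc=1; }
  if [ -z "${ASTRA_BUNDLE:-}" ]; then
    echo "ASTRA_BUNDLE is not set" >&2
    rc=1
  elif [ ! -f "$ASTRA_BUNDLE" ]; then
    # The bundle is the file downloaded from the Connect tab.
    echo "Secure Connect Bundle not found: $ASTRA_BUNDLE" >&2
    rc=1
  fi
  return "$rc"
}
```

After exporting the three variables with your own values, calling check_astra_credentials before a load gives an early, readable error instead of a failed connection attempt.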
To load your data with DataStax Bulk Loader, you need to create a keyspace and table.
Navigate to your Dashboard Overview for your existing database.
Select Add Keyspace and name it test, as used in the following example cqlsh commands:
Create a table through the CQL Console in your database:
CREATE TABLE test.world_happiness_report_2021 (
  country_name text,
  regional_indicator text,
  ladder_score float,
  gdp_per_capita float,
  social_support float,
  healthy_life_expectancy float,
  generosity float,
  PRIMARY KEY (country_name)
);
For more, see CREATE TABLE.
Run desc tables; to confirm the new table exists. The results will include:
world_happiness_report_2021
The table has been successfully created.
With your keyspace and table set up, you can upload your data.
If you want to use sample data, check out this sample CSV file: World Happiness Report 2021.
Load your table using DataStax Bulk Loader. Here’s the command format:
dsbulk-1.9.0/bin/dsbulk load -url <path-to-csv-file> -k <keyspace_name> -t <table_name> -b <path-to-secure-connect-bundle> -u <client_id> -p <client_secret>
Operation directory: /path/to/directory/log/LOAD ...
total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
  149 |      0 |    400 | 106.65 | 187.70 | 191.89 |    1.00
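As a concrete instance of the command format, the sketch below fills in the keyspace (test) and table (world_happiness_report_2021) created earlier in this topic. The function name, the DSBULK variable, and the ASTRA_CLIENT_ID and ASTRA_CLIENT_SECRET variables are hypothetical; the CSV and bundle paths are passed in as arguments and must point to your own files.

```shell
# Path to the dsbulk executable; override DSBULK if you unpacked it elsewhere.
DSBULK="${DSBULK:-dsbulk-1.9.0/bin/dsbulk}"

# Hypothetical wrapper around the load command used in this topic.
#   $1: path to the CSV file
#   $2: path to the Secure Connect Bundle
# Reads the Client ID and Client Secret from (assumed) environment variables.
load_happiness_report() {
  "$DSBULK" load \
    -url "$1" \
    -k test \
    -t world_happiness_report_2021 \
    -b "$2" \
    -u "$ASTRA_CLIENT_ID" \
    -p "$ASTRA_CLIENT_SECRET"
}
```

A call might look like load_happiness_report <path-to-csv-file> <path-to-secure-connect-bundle>. Quoting the arguments keeps paths that contain spaces intact.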
Success! Your rows were loaded into the table. This is a small test sample size, but DataStax Bulk Loader can load, unload, and count extremely large files.
Now, all that is left is to view the data in Astra Portal.
Navigate back to the CQL Console tab in Astra Portal.
Run the following command to see the output:
select * from test.world_happiness_report_2021;
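Besides the SELECT in the CQL Console, the loaded rows can also be checked from the terminal, since DSBulk provides count and unload operations alongside load. The sketch below assumes the same connection options as the load command; the function names, the DSBULK variable, and the ASTRA_CLIENT_ID and ASTRA_CLIENT_SECRET variables are hypothetical.

```shell
# Path to the dsbulk executable; override DSBULK if you unpacked it elsewhere.
DSBULK="${DSBULK:-dsbulk-1.9.0/bin/dsbulk}"

# Count the rows in the table created in this topic.
#   $1: path to the Secure Connect Bundle
count_happiness_rows() {
  "$DSBULK" count \
    -k test \
    -t world_happiness_report_2021 \
    -b "$1" \
    -u "$ASTRA_CLIENT_ID" \
    -p "$ASTRA_CLIENT_SECRET"
}

# Export the table back to CSV files.
#   $1: path to the Secure Connect Bundle
#   $2: output directory for the unloaded CSV files
unload_happiness_rows() {
  "$DSBULK" unload \
    -k test \
    -t world_happiness_report_2021 \
    -url "$2" \
    -b "$1" \
    -u "$ASTRA_CLIENT_ID" \
    -p "$ASTRA_CLIENT_SECRET"
}
```

If the load succeeded, the count should match the total reported by the load operation.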