Load data with DSBulk

If your CSV file is larger than 40 MB, you can load it with the DataStax Bulk Loader (DSBulk). DSBulk provides commands such as dsbulk load, dsbulk unload, and dsbulk count, along with extensive options. For more information, see the DataStax Bulk Loader reference.

Download the DSBulk installation file. DSBulk 1.11.0 or later is required to support the vector CQL data type. The following command downloads the latest DSBulk version:

```
curl -OL https://downloads.datastax.com/dsbulk/dsbulk.tar.gz
```

Result

```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   242  100   242    0     0    681      0 --:--:-- --:--:-- --:--:--   691
100 40.4M  100 40.4M    0     0  20.7M      0  0:00:01  0:00:01 --:--:-- 31.6M
```

Extract the DSBulk archive:

```
tar -xzvf dsbulk.tar.gz
```

Result

This example uses DSBulk version 1.11.0:

```
x dsbulk-1.11.0/README.md
x dsbulk-1.11.0/LICENSE.txt
x dsbulk-1.11.0/manual/
x dsbulk-1.11.0/manual/driver.template.conf
x dsbulk-1.11.0/manual/settings.md
x dsbulk-1.11.0/manual/application.template.conf
x dsbulk-1.11.0/bin/dsbulk
x dsbulk-1.11.0/bin/dsbulk.cmd
x dsbulk-1.11.0/conf/
x dsbulk-1.11.0/conf/driver.conf
x dsbulk-1.11.0/conf/application.conf
x dsbulk-1.11.0/THIRD-PARTY.txt
x dsbulk-1.11.0/lib/java-driver-core-4.17.0.jar
x dsbulk-1.11.0/lib/native-protocol-1.5.1.jar
x dsbulk-1.11.0/lib/netty-handler-4.1.94.Final.jar
   .
   .
   .
x dsbulk-1.11.0/lib/lz4-java-1.8.0.jar
x dsbulk-1.11.0/lib/snappy-java-1.1.7.3.jar
x dsbulk-1.11.0/lib/jansi-1.18.jar
```
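
The commands on this page use dsbulk-VERSION as a placeholder for the extracted directory. If you would rather not type the version, the following Bash sketch picks the highest-versioned dsbulk-* directory in a folder. The function name latest_dsbulk_dir is hypothetical, and the sketch assumes a sort that supports -V (version sort), such as GNU coreutils sort.

```shell
#!/usr/bin/env bash
# Sketch: find the newest extracted DSBulk directory.
# latest_dsbulk_dir is a hypothetical helper, not part of DSBulk.

# latest_dsbulk_dir [PARENT]
# Prints the highest-versioned dsbulk-* directory directly under PARENT
# (default: the current directory). Returns 1 if none is found.
latest_dsbulk_dir() {
  local parent="${1:-.}" dir
  # sort -V compares version numbers numerically, so 1.11.0 sorts after 1.9.0.
  dir=$(find "$parent" -maxdepth 1 -type d -name 'dsbulk-*' | sort -V | tail -n 1)
  if [ -z "$dir" ]; then
    echo "no dsbulk-* directory found in $parent" >&2
    return 1
  fi
  printf '%s\n' "$dir"
}

# Example usage (DSBULK_BIN is a hypothetical variable name):
#   DSBULK_BIN="$(latest_dsbulk_dir)/bin/dsbulk"
```

Version sorting matters here because a plain lexical sort would place dsbulk-1.9.0 after dsbulk-1.11.0.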

To verify the installation, run the following command from the directory where you extracted DSBulk, replacing VERSION with your DSBulk version (for example, 1.11.0):
```
dsbulk-VERSION/bin/dsbulk --version
```

Result

```
DataStax Bulk Loader v1.11.0
```

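
Because the vector CQL data type requires DSBulk 1.11.0 or later, you might want a script to confirm the version before loading. This Bash sketch parses output shaped like the result above (DataStax Bulk Loader v1.11.0) and compares it with a minimum version. The helper names are hypothetical, the parsing assumes the version line keeps that format, and the comparison assumes a sort that supports -V.

```shell
#!/usr/bin/env bash
# Sketch: confirm the installed DSBulk is at least 1.11.0, the minimum this
# page gives for the vector CQL data type. Helper names are hypothetical.

# version_at_least ACTUAL MINIMUM -> exit status 0 when ACTUAL >= MINIMUM.
version_at_least() {
  local actual="$1" minimum="$2"
  # sort -V lists the smaller version first; if that is MINIMUM,
  # then ACTUAL is greater than or equal to MINIMUM.
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

# dsbulk_version DSBULK_BIN -> prints the bare version number, for example
# 1.11.0, parsed from output such as "DataStax Bulk Loader v1.11.0".
dsbulk_version() {
  "$1" --version | sed -n 's/.*[[:space:]]v\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example usage:
#   v=$(dsbulk_version dsbulk-VERSION/bin/dsbulk)
#   version_at_least "$v" 1.11.0 || echo "DSBulk $v is older than 1.11.0" >&2
```

If parsing fails and the version comes back empty, version_at_least returns a non-zero status, so the check fails safe.
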
Create an application token with the Administrator User role, and then store the token securely.
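
One way to keep the token out of your shell history is to read it without echoing and hold it in an environment variable for the current session only. This is a minimal Bash sketch: the function name read_astra_token and the variable name ASTRA_DB_APPLICATION_TOKEN are hypothetical, and you still pass the value to dsbulk explicitly. It does not replace a secrets manager, and a token passed with -p can still be visible in the process list while dsbulk runs.

```shell
#!/usr/bin/env bash
# Sketch: read an application token without echoing it, then keep it in an
# environment variable for this shell session only.
# read_astra_token and ASTRA_DB_APPLICATION_TOKEN are hypothetical names.
read_astra_token() {
  local token
  # -s: do not echo typed characters (when reading from a terminal)
  # -r: keep backslashes literally; IFS= keeps leading and trailing spaces
  IFS= read -rs -p "Application token: " token
  echo >&2  # end the prompt line, because the typed newline is not echoed
  if [ -z "$token" ]; then
    echo "no token entered" >&2
    return 1
  fi
  export ASTRA_DB_APPLICATION_TOKEN="$token"
}

# Example usage: run in your current shell (for example, after sourcing this
# file), then pass the value explicitly:
#   read_astra_token
#   dsbulk-VERSION/bin/dsbulk load ... -u token -p "$ASTRA_DB_APPLICATION_TOKEN"
```
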

If you haven’t done so already, create a database.

Download the database’s secure connect bundle.

Create a table in your database:
1. In the Astra Portal, click Managed clusters, and then click the name of your database.
2. Click CQL Console.
3. When the token@cqlsh> prompt appears, select the keyspace where you want to create the table:
  use KEYSPACE_NAME;
4. Create a table to load a sample dataset:
  CREATE TABLE KEYSPACE_NAME.world_happiness_report_2021 (
    country_name text,
    regional_indicator text,
    ladder_score float,
    gdp_per_capita float,
    social_support float,
    healthy_life_expectancy float,
    generosity float,
    PRIMARY KEY (country_name)
  );
  If you want to load your own data, replace world_happiness_report_2021 with your table name, and then adjust the column names and data types for your data.
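
If you load your own data, the column names in your CREATE TABLE statement generally need to line up with the CSV header, because by default DSBulk maps header fields to table columns with the same name. This Bash sketch rewrites only the header line of a CSV into lowercase, underscore-separated names (for example, Country name becomes country_name), which matches how unquoted CQL identifiers are stored. The helper normalize_csv_header is hypothetical, it assumes a simple header with no quoted fields or embedded commas, and you still choose the data types yourself.

```shell
#!/usr/bin/env bash
# Sketch: rewrite only the header line of a CSV into lowercase,
# underscore-separated names so it can match unquoted CQL column names.
# Data lines pass through unchanged. normalize_csv_header is hypothetical
# and assumes a header with no quoted fields or embedded commas.
normalize_csv_header() {
  awk 'NR == 1 {
         sub(/\r$/, "")                        # drop a Windows line ending
         n = split(tolower($0), field, ",")
         header = ""
         for (i = 1; i <= n; i++) {
           gsub(/[^a-z0-9]+/, "_", field[i])   # runs of other chars -> "_"
           sub(/^_/, "", field[i])              # trim a leading "_"
           sub(/_$/, "", field[i])              # trim a trailing "_"
           header = header (i > 1 ? "," : "") field[i]
         }
         print header
         next
       }
       { print }'
}

# Example usage (file names are placeholders):
#   normalize_csv_header < my-data.csv > my-data-normalized.csv
#   head -n 1 my-data-normalized.csv   # column names for CREATE TABLE
```

Normalizing the file, rather than only printing suggested names, keeps the header and the table columns consistent so the default name-based mapping can apply.
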

To use the sample data, download the World Happiness Report 2021 sample dataset. This dataset is small, but DSBulk can load, unload, and count extremely large files.
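
Before loading, it can help to know how many records the file contains, so you can compare that number with the total column in the DSBulk load summary. This Bash sketch counts the non-blank lines after the header. The function name csv_data_rows is hypothetical, and the sketch assumes one record per line, so quoted fields that contain line breaks are overcounted.

```shell
#!/usr/bin/env bash
# Sketch: count data records in a CSV file. csv_data_rows is hypothetical.
# Assumes one record per line; quoted fields with line breaks are overcounted.
csv_data_rows() {
  # NR > 1 skips the header; NF > 0 skips blank lines. Unlike wc -l, awk also
  # counts a last line that has no trailing newline.
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }' "$1"
}

# Example usage:
#   csv_data_rows PATH_TO_CSV_FILE
```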

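Use DSBulk to load the data into the table. Replace VERSION, PATH_TO_CSV_FILE, KEYSPACE_NAME, TABLE_NAME, PATH_TO_SECURE_CONNECT_BUNDLE, and APPLICATION_TOKEN with your own values. Keep -u token as written: when you authenticate with an application token, the username is the literal string token.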

```
dsbulk-VERSION/bin/dsbulk load -url PATH_TO_CSV_FILE -k KEYSPACE_NAME \
-t TABLE_NAME -b PATH_TO_SECURE_CONNECT_BUNDLE -u token \
-p APPLICATION_TOKEN
```
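
If you run the load more than once, a small wrapper can keep the placeholders in one place and stop before dsbulk starts when a value is missing. This Bash sketch wraps the dsbulk load command above with the same options. The function name load_csv and the environment variable names (DSBULK_BIN, KEYSPACE_NAME, TABLE_NAME, SECURE_CONNECT_BUNDLE, ASTRA_DB_APPLICATION_TOKEN) are hypothetical choices for this example.

```shell
#!/usr/bin/env bash
# Sketch: wrap the dsbulk load command from this page so every value comes
# from a hypothetical environment variable and missing values are caught
# before dsbulk starts. Function and variable names are assumptions.
load_csv() {
  local csv="${1:?usage: load_csv PATH_TO_CSV_FILE}"
  # ${VAR:?message} prints the message and stops if VAR is unset or empty;
  # in a non-interactive script this exits the whole script.
  "${DSBULK_BIN:?set DSBULK_BIN, for example dsbulk-VERSION/bin/dsbulk}" load \
    -url "$csv" \
    -k "${KEYSPACE_NAME:?set KEYSPACE_NAME}" \
    -t "${TABLE_NAME:?set TABLE_NAME}" \
    -b "${SECURE_CONNECT_BUNDLE:?set SECURE_CONNECT_BUNDLE}" \
    -u token \
    -p "${ASTRA_DB_APPLICATION_TOKEN:?set ASTRA_DB_APPLICATION_TOKEN}"
}

# Example usage, after setting the variables above in your shell:
#   load_csv PATH_TO_CSV_FILE
```

Quoting every expansion keeps paths that contain spaces intact when they reach dsbulk.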

Result

```
Operation directory: /path/to/directory/log/LOAD ...
total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
  149 |      0 |    400 | 106.65 | 187.70 | 191.89 |    1.00
```
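
To check a load from a script rather than by eye, you can parse the summary table. This Bash sketch assumes you have captured the summary as plain text shaped like the result above; it reads the last numeric row after the total | failed header and succeeds only when failed is 0 and total equals the row count you expect. The function name verify_load_summary is hypothetical, and how you capture DSBulk's console output is up to you.

```shell
#!/usr/bin/env bash
# Sketch: check a captured DSBulk summary table such as
#   total | failed | rows/s | ...
#     149 |      0 |    400 | ...
# verify_load_summary is hypothetical; it reads the captured text on stdin.
verify_load_summary() {
  local expected="$1" summary total failed
  # Take the last numeric row after the "total | failed" header and print
  # its first two columns with whitespace removed.
  summary=$(awk -F'|' '
    $1 ~ /total/ && $2 ~ /failed/ { found = 1; next }
    found && $1 ~ /^[ \t]*[0-9]+[ \t]*$/ { t = $1; f = $2 }
    END { if (t != "") { gsub(/[ \t]/, "", t); gsub(/[ \t]/, "", f); print t, f } }
  ')
  read -r total failed <<< "$summary"
  if [ -z "$total" ]; then
    echo "no DSBulk summary table found" >&2
    return 2
  fi
  if [ "$failed" != "0" ] || [ "$total" != "$expected" ]; then
    echo "total=$total failed=$failed, expected total=$expected failed=0" >&2
    return 1
  fi
  echo "ok: $total rows, 0 failed"
}

# Example usage (load-summary.txt and EXPECTED_ROWS are placeholders):
#   verify_load_summary EXPECTED_ROWS < load-summary.txt
```

Distinct exit statuses (2 for a missing table, 1 for a mismatch) let a calling script tell a parsing problem apart from a failed load.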

After the upload completes, you can query the loaded data from the CQL Console:
```
SELECT * FROM KEYSPACE_NAME.world_happiness_report_2021;
```
