Load data with DSBulk
If your CSV file is larger than 40 MB, you can load the data with the DataStax Bulk Loader (DSBulk).
DSBulk provides commands like dsbulk load, dsbulk unload, and dsbulk count, along with extensive options.
For more information, see the DataStax Bulk Loader reference.
-
Download the dsbulk installation file. DSBulk 1.11.0 or later is required to support the vector CQL data type. The following command automatically downloads the latest DSBulk version:
curl -OL https://downloads.datastax.com/dsbulk/dsbulk.tar.gz
Results
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   242  100   242    0     0    681      0 --:--:-- --:--:-- --:--:--   691
100 40.4M  100 40.4M    0     0  20.7M      0  0:00:01  0:00:01 --:--:-- 31.6M
-
Extract the DSBulk archive:
tar -xzvf dsbulk.tar.gz
Results
This example uses DSBulk version 1.11.0:
x dsbulk-1.11.0/README.md
x dsbulk-1.11.0/LICENSE.txt
x dsbulk-1.11.0/manual/
x dsbulk-1.11.0/manual/driver.template.conf
x dsbulk-1.11.0/manual/settings.md
x dsbulk-1.11.0/manual/application.template.conf
x dsbulk-1.11.0/bin/dsbulk
x dsbulk-1.11.0/bin/dsbulk.cmd
x dsbulk-1.11.0/conf/
x dsbulk-1.11.0/conf/driver.conf
x dsbulk-1.11.0/conf/application.conf
x dsbulk-1.11.0/THIRD-PARTY.txt
x dsbulk-1.11.0/lib/java-driver-core-4.17.0.jar
x dsbulk-1.11.0/lib/native-protocol-1.5.1.jar
x dsbulk-1.11.0/lib/netty-handler-4.1.94.Final.jar
. . .
x dsbulk-1.11.0/lib/lz4-java-1.8.0.jar
x dsbulk-1.11.0/lib/snappy-java-1.1.7.3.jar
x dsbulk-1.11.0/lib/jansi-1.18.jar
-
To verify the installation, run the following command from the directory where you extracted DSBulk, replacing VERSION with your DSBulk version (for example, 1.11.0):
dsbulk-VERSION/bin/dsbulk --version
Results
DataStax Bulk Loader v1.11.0
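Because the extracted directory name includes the version number, it can be convenient to resolve it once instead of typing it in every command. The following is a minimal sketch, not part of DSBulk itself: the function name find_dsbulk_dir is hypothetical, and it assumes the archive was extracted into a directory named dsbulk-VERSION that contains bin/dsbulk, as in the tar output above.

```shell
# Hypothetical helper (not part of DSBulk): print the path of the first
# dsbulk-* directory under the given parent directory that contains an
# executable bin/dsbulk. Returns non-zero if none is found.
find_dsbulk_dir() {
  parent="${1:-.}"
  for d in "$parent"/dsbulk-*/; do
    # If the glob matches nothing, $d is the literal pattern and this test fails.
    if [ -x "${d}bin/dsbulk" ]; then
      printf '%s\n' "${d%/}"   # strip the trailing slash
      return 0
    fi
  done
  return 1
}
```

For example, DSBULK_DIR="$(find_dsbulk_dir .)" followed by "$DSBULK_DIR/bin/dsbulk" --version runs the same verification without hard-coding the version. If several versions are extracted side by side, the sketch returns whichever directory the shell lists first.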
-
Create an application token with the Administrator User role, and then store the token securely.
-
If you haven’t done so already, create a database.
-
Download the database’s secure connect bundle.
-
Create a table in your database:
-
In the Astra Portal, go to your database, and click CQL Console.
-
When the token@cqlsh> prompt appears, select the keyspace where you want to create the table:
use KEYSPACE_NAME;
-
Create a table to load a sample dataset:
CREATE TABLE KEYSPACE_NAME.world_happiness_report_2021 (
  country_name text,
  regional_indicator text,
  ladder_score float,
  gdp_per_capita float,
  social_support float,
  healthy_life_expectancy float,
  generosity float,
  PRIMARY KEY (country_name)
);
If you want to load your own data, replace world_happiness_report_2021 with your table name, and then adjust the column names and data types for your data.
-
To load the sample dataset, download the World Happiness Report 2021 sample dataset. This is a small sample dataset, but DSBulk can load, unload, and count extremely large files.
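Before loading a CSV file, it can help to look at its header row and count its data rows, so that you can compare the column names with your table definition and later compare the row count with the total that DSBulk reports. The following is a rough sketch using standard shell tools; the function names are hypothetical, and the row count assumes the file has a single header line and no quoted fields that contain line breaks.

```shell
# Hypothetical helpers (not part of DSBulk) for a quick look at a CSV file.

# Print the first line of the file, which is assumed to be the header row.
csv_header() {
  head -n 1 "$1"
}

# Print the number of lines after the header. This equals the number of
# records only if no quoted field spans multiple lines.
csv_data_rows() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}
```

For example, csv_header PATH_TO_CSV_FILE prints the column names, and csv_data_rows PATH_TO_CSV_FILE prints the number of rows you would expect DSBulk to process.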
-
Use DSBulk to load data into the table:
dsbulk-VERSION/bin/dsbulk load -url PATH_TO_CSV_FILE -k KEYSPACE_NAME \
  -t TABLE_NAME -b PATH_TO_SECURE_CONNECT_BUNDLE -u token \
  -p APPLICATION_TOKEN
Results
Operation directory: /path/to/directory/log/LOAD
...
total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
  149 |      0 |    400 | 106.65 | 187.70 | 191.89 |    1.00
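The load command depends on two local files and a token, so a short pre-flight check can catch a wrong path before DSBulk starts. The following is a minimal sketch under stated assumptions: check_load_inputs is a hypothetical function name, and ASTRA_TOKEN is an assumed environment variable that holds your application token; neither is defined by DSBulk.

```shell
# Hypothetical pre-flight check (not part of DSBulk).
# Arguments: path to the CSV file, path to the secure connect bundle.
# ASTRA_TOKEN is an assumed environment variable holding the application token.
check_load_inputs() {
  csv_file="$1"
  bundle="$2"
  if [ ! -r "$csv_file" ]; then
    echo "CSV file not found or not readable: $csv_file" >&2
    return 1
  fi
  if [ ! -r "$bundle" ]; then
    echo "Secure connect bundle not found or not readable: $bundle" >&2
    return 1
  fi
  if [ -z "${ASTRA_TOKEN:-}" ]; then
    echo "ASTRA_TOKEN is not set" >&2
    return 1
  fi
  return 0
}

# Example use, commented out so that the sketch has no side effects:
# check_load_inputs PATH_TO_CSV_FILE PATH_TO_SECURE_CONNECT_BUNDLE &&
#   dsbulk-VERSION/bin/dsbulk load -url PATH_TO_CSV_FILE -k KEYSPACE_NAME \
#     -t TABLE_NAME -b PATH_TO_SECURE_CONNECT_BUNDLE -u token -p "$ASTRA_TOKEN"
```

Reading the token from an environment variable also keeps it out of scripts that you might share or commit.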
-
After the upload completes, you can query the loaded data from the CQL Console:
SELECT * FROM KEYSPACE_NAME.world_happiness_report_2021;
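As an alternative to querying from the CQL Console, the dsbulk count command mentioned at the start of this page can report how many rows the table holds. The following is a sketch rather than a documented recipe: count_table_rows is a hypothetical wrapper, DSBULK_BIN and ASTRA_TOKEN are assumed environment variables, and the connection options simply mirror the ones used for dsbulk load.

```shell
# Hypothetical wrapper (not part of DSBulk) around "dsbulk count".
# Arguments: keyspace name, table name, path to the secure connect bundle.
# DSBULK_BIN is an assumed variable pointing at the dsbulk executable
# (defaults to "dsbulk" on the PATH); ASTRA_TOKEN is an assumed variable
# holding the application token.
count_table_rows() {
  keyspace="$1"
  table="$2"
  bundle="$3"
  "${DSBULK_BIN:-dsbulk}" count -k "$keyspace" -t "$table" \
    -b "$bundle" -u token -p "$ASTRA_TOKEN"
}
```

For example, with DSBULK_BIN=dsbulk-VERSION/bin/dsbulk, running count_table_rows KEYSPACE_NAME world_happiness_report_2021 PATH_TO_SECURE_CONNECT_BUNDLE should report a count that matches the total from the load.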