Use DSBulk to load data

Introduction

This topic provides a quick overview of how to get started with DataStax Bulk Loader (DSBulk) and Astra DB, covering the steps to load your CSV data into an Astra DB database with a dsbulk load command.

Another option is the Data Loader feature, available for an existing database in Astra Portal, which lets you load up to 40 MB of CSV data through the UI. The Data Loader also includes options to load data from the provided sample datasets, or from an Amazon S3 bucket that contains exported DynamoDB data. See Use Astra DB Data Loader.

Start on the Astra DB Dashboard and click the link for your database. Follow the Load Data dialog to select your CSV file and have its data loaded into the database keyspace and table that you specify. Example:

Astra Data Loader with CSV upload option

DSBulk install

  1. In a terminal, download the DSBulk installation file:

    curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.9.0.tar.gz

    Results

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 31.5M  100 31.5M    0     0  8514k      0  0:00:03  0:00:03 --:--:-- 8512k
  2. Unpack the archive:

    tar -xzvf dsbulk-1.9.0.tar.gz

    The dsbulk executable is in the bin folder of the extracted dsbulk-1.9.0 directory.
  3. Make sure that everything is running correctly through the command line:

    dsbulk-1.9.0/bin/dsbulk --version

    Result

    DataStax Bulk Loader v1.9.0

DSBulk version 1.9.0 is installed and ready to use.
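
Optionally, you can add the extracted bin folder to your PATH so that dsbulk can be invoked without the full path. This is a convenience sketch that assumes the archive was unpacked into dsbulk-1.9.0 in your current directory; the rest of this topic keeps the full dsbulk-1.9.0/bin/dsbulk prefix.

    # Assumes dsbulk-1.9.0 was extracted into the current directory
    export PATH="$PATH:$(pwd)/dsbulk-1.9.0/bin"
    dsbulk --version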

Astra Credentials

Before you can run DSBulk, gather the credentials needed to connect to Astra DB: a Client ID, a Client Secret, and your database's Secure Connect Bundle.

  1. In Astra Portal, select Settings in the left navigation.

  2. Select Token Management.

  3. From the Select Role dropdown menu, select Administrator User.

  4. Generate a token for your Administrator User role by selecting Generate Token.

    You will be provided with the Client ID, Client Secret, and a Token. For the dsbulk load command shown later in this topic, you’ll need the Client ID and Client Secret values.

  5. Select Download Token Details to store these credentials locally.

  6. Navigate to your Dashboard Overview by clicking on the "DataStax Astra" icon in the top banner.

  7. Create a database, if you haven’t already.

  8. For one of your active databases, select the Connect tab. On the Connect page, click any of the driver types to reveal the Download Bundle option. Example:

  9. Download the Secure Connect Bundle and store it locally. (A convenience sketch for keeping these values in your shell follows this list.)
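
As a convenience, you might keep these three values in shell variables and substitute them into the dsbulk commands later in this topic. This is an illustrative sketch only; the variable names are arbitrary, and the placeholder values must be replaced with your own Client ID, Client Secret, and the path where you saved the Secure Connect Bundle.

    # Placeholder values -- replace with your token credentials and bundle path
    export ASTRA_CLIENT_ID="<client_id>"
    export ASTRA_CLIENT_SECRET="<client_secret>"
    export ASTRA_SCB="/path/to/secure-connect-<database_name>.zip"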

Create keyspace and table

To load your data with DataStax Bulk Loader, you need to create a keyspace and table.

  1. Open your Astra account and select a database.

  2. Navigate to your Dashboard Overview for your existing database.

  3. Select Add Keyspace and name it test, as used in the following example cqlsh commands.

  4. Create a table through the CQL Console in your database:

    CREATE TABLE test.world_happiness_report_2021 (
      country_name text,
      regional_indicator text,
      ladder_score float,
      gdp_per_capita float,
      social_support float,
      healthy_life_expectancy float,
      generosity float,
      PRIMARY KEY (country_name)
    );

    For more, see CREATE TABLE.

  5. Run desc tables; to confirm the new table exists:

    The results will include:

    world_happiness_report_2021

The world_happiness_report_2021 table has been successfully created.

Load your data

With your keyspace and table set up, you can upload your data.

If you want to use sample data, check out this sample CSV file: World Happiness Report 2021.
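
The dsbulk load command below does not pass an explicit field-to-column mapping, so DSBulk's default header-based mapping applies: the first line of the CSV is expected to contain column names matching the table you created. An illustrative header row for the table above could look like the following (the sample file's actual headers may differ):

    country_name,regional_indicator,ladder_score,gdp_per_capita,social_support,healthy_life_expectancy,generosity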

  1. Load your table using DataStax Bulk Loader. Here’s the command format:

    dsbulk-1.9.0/bin/dsbulk load -url <path-to-csv-file> -k <keyspace_name> -t <table_name> -b <path-to-secure-connect-bundle> -u <client_id> -p <client_secret>

    Results:

    Operation directory: /path/to/directory/log/LOAD ...
    total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
      149 |      0 |    400 | 106.65 | 187.70 | 191.89 |    1.00

Success! Your rows were loaded into the table. This is a small test sample, but DataStax Bulk Loader can load, unload, and count extremely large datasets.
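
Because the same connection options apply to DSBulk's other modes, you could also count or unload the rows you just loaded. The following sketches reuse the placeholders from the load command above; for unload, -url names the directory where the exported CSV files are written.

    dsbulk-1.9.0/bin/dsbulk count -k test -t world_happiness_report_2021 -b <path-to-secure-connect-bundle> -u <client_id> -p <client_secret>

    dsbulk-1.9.0/bin/dsbulk unload -url unloaded_data -k test -t world_happiness_report_2021 -b <path-to-secure-connect-bundle> -u <client_id> -p <client_secret>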

View your data in Astra DB

Now, all that is left is to view the data in Astra Portal.

  1. Navigate back to the CQL Console tab in Astra Portal.

  2. Run the following command to see the output:

    select * from test.world_happiness_report_2021;

    The rows you loaded from the CSV file are returned.