Upload data to your database

After connecting to your database, you’ll want to upload data to build and test your applications. You can use the built-in Astra DB Data Loader, DataStax Bulk Loader (dsbulk), or the DataStax Apache Kafka Connector to upload data to your Astra DB database.

Astra DB Data Loader

Astra DB has its own data loader built into the user interface. Use the Astra DB Data Loader to load your own data into your database or to try one of our sample datasets.

  1. From your Astra DB Dashboard, select Load Data for the database where you want to load data.

    Astra DB console dashboard with Load Data action shown.

    The Astra DB Data Loader launches.

    Astra DB Data Loader screen with three options displayed.
  2. Load your data using one of the options:

    • Option 1: Upload your own dataset.

      Drag and drop your own .csv file into the Astra DB Data Loader.

      CSV files must be less than 40 MB. A status bar shows upload progress. Ensure the column names in your .csv file do not include spaces; underscores are accepted. For example, ShoeSize, ShirtColor, Shoe_Size, and Shirt_Color are all valid column names.

    • Option 2: Load an example dataset.

      Select one of the two examples given to use as a sample dataset.

    • Option 3: Load data from an Amazon S3 bucket that contains exported DynamoDB data.

      First, export your DynamoDB data to S3 as described here. Then in AWS console, grant read access to the following ARN: arn:aws:iam::445559476293:role/astra-loader. Your bucket policy should use:

      {
        "Statement": [
          {
            "Action": [ "s3:ListBucket", "s3:GetBucketLocation" ],
            "Principal": { "AWS": "arn:aws:iam::445559476293:role/astra-loader" },
            "Effect": "Allow",
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME"
          },
          {
            "Effect": "Allow",
            "Principal": { "AWS": "arn:aws:iam::445559476293:role/astra-loader" },
            "Action": [ "s3:GetObject" ],
            "Resource": "arn:aws:s3:::YOUR_BUCKET_NAME/*"
          }
        ]
      }

      This bucket policy allows Astra DB automation to pull data from your identified shared S3 bucket, and load the data into Astra DB. You can remove the permission after the data load finishes.

      In the Option 3 prompts, enter your S3 Bucket name, and enter the Key value. To find the Key, navigate in AWS console to the S3 subdirectory that contains your exported DynamoDB data. Look for the Key on its Properties tab. Here’s a sample screen with the Key shown near the lower-left corner:

      S3 Properties with Key value for exported DynamoDB data file.
  3. Once you configure your option, select Next.

  4. Give your table for this dataset a name.

    Your dataset is displayed in the Data Preview and Types section.

    Data Loader table configuration screen.
  5. Select the data type for each column.

    The Astra DB Data Loader automatically selects data types for your dataset. If needed, you can change this to your own selection.

  6. Select your partition key and clustering column for your data.

    Data Loader partition key and clustering column selection screen.
  7. Select Next.

  8. Select your database from the dropdown menu.

  9. Select your keyspace from the available keyspaces.

    Data Loader target database and keyspace selection screen.
  10. Select Next.

You will see a confirmation that your data is being imported. Within a few minutes, your dataset will begin uploading to your database.

You will receive an email when the job has started and when the dataset has been loaded. Then you can interact with your data by connecting to your database.
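The upload rules above (a .csv file under 40 MB, column names without spaces) can be checked locally before you upload. Here is a minimal Python sketch; the function name and sample files are illustrative and not part of the Data Loader:

```python
import csv
import os

MAX_SIZE_BYTES = 40 * 1024 * 1024  # Data Loader limit: CSV must be less than 40 MB

def validate_csv_for_upload(path):
    """Return a list of Astra DB Data Loader rule violations for a CSV file."""
    errors = []
    if os.path.getsize(path) >= MAX_SIZE_BYTES:
        errors.append("file is 40 MB or larger")
    # Check the header row: spaces are rejected, underscores are accepted.
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    for name in header:
        if " " in name:
            errors.append("column name %r contains a space" % name)
    return errors
```

For example, a header of Shoe Size,Shirt_Color is flagged for the space in Shoe Size, while Shoe_Size,Shirt_Color passes.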

Loading data with DSBulk

Introduction

This topic provides a quick overview of getting started with DataStax Bulk Loader (dsbulk) and Astra DB, covering the steps to load your CSV data into an Astra DB database via a dsbulk load command.

Another option is to use the Load Data feature, which is available for an existing database in Astra DB console. You can load CSV data up to 40 MB through the UI. Start on the Astra DB Dashboard and click the link for your database. Follow the Load Data dialog to select your CSV file and have its data loaded into the database keyspace and table that you specify. See Astra Data Loader. Example:

Astra Data Loader with CSV upload option

DSBulk install

  1. From your desktop in the terminal, download the dsbulk installation file:

    curl -OL https://downloads.datastax.com/dsbulk/dsbulk-1.9.0.tar.gz

    Results

      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100 31.5M  100 31.5M    0     0  8514k      0  0:00:03  0:00:03 --:--:-- 8512k
  2. Unpack the folder:

    tar -xzvf dsbulk-1.9.0.tar.gz

    Then access the dsbulk executable through the bin folder:

    The dsbulk executable inside the dsbulk-1.9.0/bin folder.
  3. Make sure that everything is running correctly through the command line:

    dsbulk-1.9.0/bin/dsbulk --version

    Result

    DataStax Bulk Loader v1.9.0

DSBulk version 1.9.0 is installed and ready to use.

Astra Credentials

Before you can run DSBulk, get the credentials needed to connect to Astra DB: the Client ID, the Client Secret, and the Secure Connect Bundle.

  1. In Astra DB console, navigate to your Organization Settings.

  2. Select Token Management.

  3. From the Select Role dropdown menu, select Administrator User.

    Token generation screen with the role dropdown.
  4. Generate a token for your Administrator User role by selecting Generate Token.

    You will be provided with the Client ID, Client Secret, and a Token. For the dsbulk load command shown later in this topic, you’ll need the Client ID and Client Secret values.

  5. Select Download Token Details to store these credentials locally.

  6. Navigate to your Dashboard Overview by clicking on the "DataStax Astra" icon in the top banner.

  7. Create a database, if you haven’t already.

  8. For one of your active databases, select the Connect tab. On the Connect page, to reveal the Download Bundle option, click any of the Driver types. Example:

    Astra Secure Connect Bundle option shown on the Connect Drivers page.

  9. Download the Secure Connect Bundle to store it locally.
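If you script your loads, you can read the downloaded token details from the file instead of pasting values by hand. This sketch assumes the download is a one-row CSV whose header includes Client ID, Client Secret, and Token columns; adjust the field names to match the file you actually receive:

```python
import csv

def read_token_details(path):
    """Read Astra DB token details from a downloaded CSV file.

    Assumes a one-row CSV with headers such as "Client Id" and
    "Client Secret"; header names are normalized to snake_case keys.
    """
    with open(path, newline="") as f:
        row = next(csv.DictReader(f))
    return {k.strip().lower().replace(" ", "_"): v.strip() for k, v in row.items()}
```

The resulting client_id and client_secret values are what the dsbulk load command later in this topic expects for its -u and -p options.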

Create keyspace and table

To load your data with DataStax Bulk Loader, you need to create a keyspace and table.

  1. Navigate to your Dashboard Overview for your existing database.

  2. Select Add Keyspace and name it test, as used in the following example CQL commands.

    Add Keyspace dialog.
  3. Create a table through the CQL Console in your database:

    CREATE TABLE test.world_happiness_report_2021 (
      country_name text,
      regional_indicator text,
      ladder_score float,
      gdp_per_capita float,
      social_support float,
      healthy_life_expectancy float,
      generosity float,
      PRIMARY KEY (country_name)
    );

    For more, see CREATE TABLE.

  4. Run desc tables; to confirm the new table exists:

    The results will include:

    world_happiness_report_2021

The world_happiness_report_2021 table has been successfully created.

Load your data

With your keyspace and table set up, you can upload your data.

If you want to use sample data, check out this sample CSV file: World Happiness Report 2021.

  1. Load your table using DataStax Bulk Loader. Here’s the command format:

    dsbulk-1.9.0/bin/dsbulk load \
      -url <path-to-csv-file> \
      -k <keyspace_name> \
      -t <table_name> \
      -b <path-to-secure-connect-bundle> \
      -u <client_id> \
      -p <client_secret>

    Results:

    Operation directory: /path/to/directory/log/LOAD ...
    total | failed | rows/s |  p50ms |  p99ms | p999ms | batches
      149 |      0 |    400 | 106.65 | 187.70 | 191.89 |    1.00

Success! Your rows were loaded into the table. This is a small test sample size, but DataStax Bulk Loader can load, unload, and count extremely large files.
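If you run loads from a script, assembling the dsbulk arguments as a list avoids shell-quoting problems with paths that contain spaces. A sketch, with placeholder paths and credentials:

```python
def build_dsbulk_load_cmd(dsbulk, csv_path, keyspace, table, bundle, client_id, client_secret):
    """Build the argument list for a dsbulk load invocation."""
    return [
        dsbulk, "load",
        "-url", csv_path,
        "-k", keyspace,
        "-t", table,
        "-b", bundle,
        "-u", client_id,
        "-p", client_secret,
    ]

# Placeholder values; execute with subprocess.run(cmd, check=True) when ready.
cmd = build_dsbulk_load_cmd(
    "dsbulk-1.9.0/bin/dsbulk",
    "world_happiness_report_2021.csv",
    "test",
    "world_happiness_report_2021",
    "secure-connect-demo.zip",
    "clientId",
    "clientSecret",
)
```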

View your data in Astra DB

Now, all that is left is to view the data in the Astra DB console.

  1. Navigate back to the CQL Console tab in Astra DB console.

  2. Run the following command to see the output:

    select * from test.world_happiness_report_2021;

    Results include:

    CQL Console output showing rows from world_happiness_report_2021.

DataStax Apache Kafka Connector

Deploy the DataStax Apache Kafka™ Connector to stream records from an Apache Kafka topic to your DataStax Astra DB database.

The DataStax Apache Kafka Connector download package includes a sample JSON properties file, dse-sink-distributed.json.sample, located in the conf directory of the distribution package. Use the sample file as a reference when configuring your deployment.

Prerequisites

  • Download and install the DataStax Apache Kafka Connector.

  • Configure the distributed worker configuration file connect-distributed.properties to fit your needs. Use this example from DataStax as a starting point. Specify the converter for the key.converter and value.converter properties that matches the form of your Kafka data. See Configuring converters in the Confluent documentation for more information on these properties.

Procedure

  1. From the directory where you installed Apache Kafka, start the distributed worker:

bin/connect-distributed.sh config/connect-distributed.properties

The worker startup process outputs a large number of informational messages. The following message displays after the process completes:

[2019-10-13 19:49:25,385] INFO Finished starting connectors and tasks (org.apache.kafka.connect.runtime.distributed.DistributedHerder:852)

Next, configure the JSON configuration file (such as dse-sink.json) to use the Astra DB secure connect bundle:

{
  "name": "dse-sink",
  "config": {
    "connector.class": "com.datastax.kafkaconnector.DseSinkConnector",
    "cloud.secureConnectBundle": "/path/to/secure-connect-database-name.zip",
    "auth.username": "clientId",
    "auth.password": "clientSecret"
    ...
  }
}
  • name: Unique name for the connector. Default: dse-sink

  • connector.class: DataStax connector Java class provided in the kafka-connect-dse-N.N.N.jar. Default: com.datastax.kafkaconnector.DseSinkConnector

  • cloud.secureConnectBundle: The full path to the secure connect bundle for your Astra DB database (secure-connect-database_name.zip).

Download the secure connect bundle from the Astra DB console. If this option is specified, you must also include the auth.username and auth.password for the database user.

  • auth.username: Astra DB database username

When authorization is enabled, the DataStax connector login role must have a minimum of modify privileges on tables receiving data from the DataStax Apache Kafka™ Connector.

  • auth.password: Astra DB database password for the specified username

  2. Register the connector configuration with the distributed worker:

curl -X POST -H "Content-Type: application/json" -d @dse-sink.json "http://ip:port/connectors"

ip and port are the IP address and port number of the Kafka worker. Use the same port as the rest.port parameter set in connect-distributed.properties. The default port is 8083.

You configured the dse-sink.json or dse-sink.properties file when installing the DataStax Apache Kafka Connector.
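The same registration call can be made from code. A minimal sketch using Python’s standard library; the worker address and file name are placeholders:

```python
import json
import urllib.request

def build_register_request(worker_url, config_path):
    """Build the POST request that registers a connector config file
    with the Kafka Connect REST API (the /connectors endpoint)."""
    with open(config_path, "rb") as f:
        body = f.read()
    json.loads(body)  # fail early if the config file is not valid JSON
    return urllib.request.Request(
        worker_url + "/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Send with, for example:
#   urllib.request.urlopen(build_register_request("http://localhost:8083", "dse-sink.json"))
```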

Test loading data with NoSQLBench

Use NoSQLBench to test loading data to Astra with a demo cluster, keyspace, and user.

Prerequisites

  1. Ensure you have a working NoSQLBench (nb or nb.jar). For more, see Getting NoSQLBench.

  2. Make sure your Astra secure connect bundle is available to your NoSQLBench instance. You can download the bundle from Astra and copy it to your /tmp directory.

  3. Get your Client ID and Client Secret by creating your application token for your username and password.

Start Astra Benchmarking with NoSQLBench

nb cql-iot astra \
username=clientId \
password=clientSecret \
keyspace=demo \
secureconnectbundle=/tmp/secure-connect-demo.zip

or

java -jar nb.jar cql-iot astra \
username=clientId \
password=clientSecret \
keyspace=demo \
secureconnectbundle=/tmp/secure-connect-demo.zip

Sample Output

logging to logs/scenario_20201130_132903_455.log
cqliot_astra_schema: 100.00%/Stopped (details: min=0 cycle=1 max=1) (last report)
cqliot_astra_rampup: 0.37%/Stopped (details: min=0 cycle=36756 max=10000000)
cqliot_astra_rampup: 0.88%/Stopped (details: min=0 cycle=88159 max=10000000)
cqliot_astra_rampup: 1.40%/Stopped (details: min=0 cycle=139760 max=10000000)
cqliot_astra_rampup: 1.92%/Stopped (details: min=0 cycle=192445 max=10000000)
cqliot_astra_rampup: 2.44%/Stopped (details: min=0 cycle=243613 max=10000000)
cqliot_astra_rampup: 2.97%/Stopped (details: min=0 cycle=297122 max=10000000)
cqliot_astra_rampup: 3.51%/Stopped (details: min=0 cycle=350615 max=10000000)
cqliot_astra_rampup: 4.02%/Stopped (details: min=0 cycle=402398 max=10000000)
cqliot_astra_rampup: 4.45%/Stopped (details: min=0 cycle=444902 max=10000000)
cqliot_astra_rampup: 4.99%/Stopped (details: min=0 cycle=499381 max=10000000)
cqliot_astra_rampup: 5.53%/Stopped (details: min=0 cycle=552956 max=10000000)
cqliot_astra_rampup: 6.08%/Stopped (details: min=0 cycle=607688 max=10000000)
cqliot_astra_rampup: 6.56%/Stopped (details: min=0 cycle=656076 max=10000000)
cqliot_astra_rampup: 7.04%/Stopped (details: min=0 cycle=704018 max=10000000)
cqliot_astra_rampup: 7.55%/Stopped (details: min=0 cycle=754986 max=10000000)
cqliot_astra_rampup: 8.10%/Stopped (details: min=0 cycle=809898 max=10000000)
cqliot_astra_rampup: 8.89%/Stopped (details: min=0 cycle=888542 max=10000000)
cqliot_astra_rampup: 8.89%/Stopped (details: min=0 cycle=888542 max=10000000)
cqliot_astra_rampup: 9.19%/Stopped (details: min=0 cycle=919438 max=10000000)
cqliot_astra_rampup: 9.71%/Stopped (details: min=0 cycle=970703 max=10000000)
cqliot_astra_rampup: 10.25%/Stopped (details: min=0 cycle=1025088 max=10000000)
cqliot_astra_rampup: 10.71%/Stopped (details: min=0 cycle=1070983 max=10000000)
cqliot_astra_rampup: 11.20%/Stopped (details: min=0 cycle=1120478 max=10000000)
cqliot_astra_rampup: 11.68%/Stopped (details: min=0 cycle=1168035 max=10000000)
cqliot_astra_rampup: 12.19%/Stopped (details: min=0 cycle=1219462 max=10000000)
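Each progress line above follows the pattern name: percent%/state (details: min=… cycle=… max=…). When scripting benchmark runs, you can extract the completion percentage with a small parser; this sketch is illustrative and not part of NoSQLBench:

```python
import re

# Matches lines such as:
#   cqliot_astra_rampup: 1.40%/Stopped (details: min=0 cycle=139760 max=10000000)
PROGRESS = re.compile(
    r"^(?P<name>\S+):\s+(?P<pct>[\d.]+)%/(?P<state>\w+)\s+"
    r"\(details: min=(?P<min>\d+) cycle=(?P<cycle>\d+) max=(?P<max>\d+)\)"
)

def parse_progress(line):
    """Return (activity name, percent complete, current cycle), or None."""
    m = PROGRESS.match(line.strip())
    if m is None:
        return None
    return m.group("name"), float(m.group("pct")), int(m.group("cycle"))
```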

What’s Next?

Be sure to check out other Astra workloads (cql-tabular and cql-keyvalue). You can find more information about other Astra workloads with the following command:

nb --list-scenarios

Look for “astra” in the output.