DataStax Bulk Loader

Architecture

This topic describes the DataStax Bulk Loader workflow engine architecture for load and unload operations.

The DataStax Workflow Engine is the component responsible for the orchestration of loading and unloading operations. The main features are:

  • Configuration: The engine collects user-supplied settings, merges them with default values, and configures the loading or unloading operation to run.

  • Connection: The engine handles the driver connection to:

    • DataStax Astra cloud databases

    • DataStax Enterprise (DSE) 4.7 and later databases

    • Open source Apache Cassandra® 2.1 and later databases

    The engine manages driver-specific settings, and supports authentication and SSL encryption.

  • Conversion: The engine handles data type conversions, for example boolean, number, and date conversions, from connector output (typically strings or raw bytes) to appropriate internal representations (typically Java Temporal or Number objects). It also handles NULL and UNSET values.

  • Mapping: The engine analyzes metadata gathered from the driver and infers the appropriate INSERT or SELECT prepared statement, then checks this information against user-supplied information about the data source, to infer the bound variables to use.

  • Monitoring: The engine reports metrics about all its internal components, mainly the connector and the bulk executor.

  • Error Handling: The engine handles errors from both connectors and the bulk executor, and reports read, parse, and write failures. These are redirected to a configurable "bad file" that contains sources that could not be loaded.
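
To illustrate how the configuration step fits together, the settings above can be collected in a configuration file that dsbulk merges with its built-in defaults and any command-line flags. This is a minimal sketch in HOCON format; the keyspace, table, and file path are hypothetical:

```
# Hypothetical dsbulk configuration file, e.g. passed with: dsbulk load -f my-dsbulk.conf
dsbulk {
  # Connector settings: read a CSV file that has a header row
  connector.name = csv
  connector.csv.url = "/path/to/export.csv"
  connector.csv.header = true

  # Schema settings: target keyspace and table (hypothetical names)
  schema.keyspace = myks
  schema.table = mytable

  # Directory where operation logs and "bad files" are written
  log.directory = "./logs"
}
```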

Loading Workflow

Workflow to load JSON or CSV data into a DSE database
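
A typical loading invocation might look like the following sketch; the host, keyspace, table, and file name are hypothetical:

```shell
# Sketch of a load operation (host, keyspace, table, and file are hypothetical).
# The CSV connector reads the file, the engine converts and maps each record to
# a bound INSERT statement, and records that fail to parse or write are
# redirected to a "bad file" under the log directory.
dsbulk load -h 10.0.0.1 -k myks -t mytable -url /path/to/export.csv -header true
```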

Unloading Workflow

Workflow to unload data from a DSE database
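
Unloading mirrors the loading workflow in reverse: the engine issues SELECT statements via the driver and the connector writes the results out. A hypothetical sketch:

```shell
# Sketch of an unload operation (host, keyspace, table, and directory are
# hypothetical). The engine reads rows via the driver and the CSV connector
# writes them as files into the given output directory.
dsbulk unload -h 10.0.0.1 -k myks -t mytable -url /path/to/unload_dir
```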

