Architecture

The DataStax Bulk Loader workflow engine orchestrates load and unload operations. This topic describes its architecture.

The DataStax Workflow Engine is the component that orchestrates loading and unloading operations. Its main features are:
  • Configuration: The engine collects user-supplied settings, merges them with default values and configures the loading/unloading operation to run.
  • Connection: The engine handles the driver connection to DataStax Enterprise (DSE) or DataStax Distribution of Apache Cassandra™ (DDAC), manages driver-specific settings, and supports authentication and SSL encryption.
  • Conversion: The engine handles data type conversions, such as boolean, number, and date conversions, from input values (typically strings or raw bytes emitted by a connector) to the appropriate internal representations (typically Java Temporal or Number objects). It also handles NULL and UNSET values.
  • Mapping: The engine analyzes metadata gathered from the driver and infers the appropriate INSERT or SELECT prepared statement. It then checks that statement against user-supplied information about the data source to infer which bound variables to use.
  • Monitoring: The engine reports metrics about all its internal components, mainly the connector and the bulk executor.
  • Error Handling: The engine handles errors from both the connectors and the bulk executor, and reports read, parse, and write failures. Failed records are redirected to a configurable "bad file" that contains the source data that could not be loaded.
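
The features above can be pictured with a small sketch. The following Python code is only an illustration of the ideas (merging settings with defaults, inferring an INSERT statement with bound variables, converting raw strings, and setting aside records that fail). It is not the engine's implementation, which runs on Java, and every name in it (DEFAULTS, CONVERTERS, merge_settings, infer_insert, convert_record, load, and the setting keys) is hypothetical.

```python
from datetime import date

# Hypothetical setting names and defaults, for illustration only.
DEFAULTS = {"null_strings": [""], "max_errors": 100}

# Hypothetical converters keyed by CQL type name.
CONVERTERS = {
    "int": int,
    "boolean": lambda s: {"true": True, "false": False}[s.strip().lower()],
    "date": date.fromisoformat,
    "text": str,
}


def merge_settings(user_settings):
    """Configuration: user-supplied settings override the defaults."""
    return {**DEFAULTS, **user_settings}


def infer_insert(keyspace, table, columns):
    """Mapping: build an INSERT with one named bound variable per column."""
    names = ", ".join(columns)
    markers = ", ".join(f":{c}" for c in columns)
    return f"INSERT INTO {keyspace}.{table} ({names}) VALUES ({markers})"


def convert_record(record, columns, settings):
    """Conversion: turn raw strings into typed values; null strings become None."""
    row = {}
    for name, cql_type in columns.items():
        raw = record[name]
        if raw in settings["null_strings"]:
            row[name] = None
        else:
            row[name] = CONVERTERS[cql_type](raw)
    return row


def load(records, columns, settings):
    """Error handling: collect records that fail conversion instead of aborting."""
    rows, bad = [], []
    for record in records:
        try:
            rows.append(convert_record(record, columns, settings))
        except (KeyError, ValueError) as err:
            bad.append((record, repr(err)))
            if len(bad) > settings["max_errors"]:
                raise RuntimeError("too many errors") from err
    return rows, bad


# Example input: three raw records as a connector might emit them.
settings = merge_settings({"max_errors": 10})
columns = {"id": "int", "active": "boolean", "joined": "date"}
records = [
    {"id": "1", "active": "true", "joined": "2020-01-31"},
    {"id": "x", "active": "false", "joined": "2020-02-01"},  # "x" is not an int
    {"id": "3", "active": "", "joined": "2020-03-01"},       # "" maps to None
]
statement = infer_insert("ks1", "table1", columns)
rows, bad = load(records, columns, settings)
```

In this sketch, the second record ends up in the bad list because its id cannot be parsed, while the other two are converted and would be bound to the inferred statement. Writing the bad list to a file and executing the statement through a driver are left out on purpose.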
Figure: Loading Workflow
Workflow to load JSON or CSV data into a DSE or DDAC database.
Figure: Unloading Workflow
Workflow to unload data from a DSE or DDAC database to a JSON or CSV file.