Architecture

This section describes the architecture of the DataStax Bulk Loader workflow engine, which runs load and unload operations.

The DataStax Workflow Engine orchestrates loading and unloading operations. Its main features are:
  • Configuration: The engine collects user-supplied settings, merges them with default values and configures the loading/unloading operation to run.
  • Connection: The engine handles the driver connection to DSE, manages driver-specific settings, and supports authentication and SSL encryption.
  • Conversion: The engine handles data type conversions (for example, boolean, number, and date conversions) from any input, typically strings or raw bytes as emitted by a connector, to the appropriate internal representations, typically Java Temporal or Number objects. It also handles NULL and UNSET values.
  • Mapping: The engine analyzes metadata gathered from the driver and infers the appropriate INSERT or SELECT prepared statement, then checks it against the user-supplied description of the data source to determine which bound variables to use.
  • Monitoring: The engine reports metrics about all its internal components, mainly the connector and the bulk executor.
  • Error Handling: The engine handles errors from both connectors and the bulk executor, and reports read, parse, and write failures. Failures are redirected to a configurable "bad file", which contains the sources that could not be loaded.
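
The Configuration, Conversion, and Error Handling features above can be illustrated with a small sketch. This is not the engine's code: the engine itself works with Java objects, and every name here (DEFAULTS, merge_settings, convert, load, the null_strings and max_errors keys, the UNSET sentinel) is hypothetical and chosen for illustration only. The sketch assumes that a field missing from a record is treated as UNSET and that a configured null string is treated as NULL; the real engine's rules may differ.

```python
from datetime import date
from decimal import Decimal

# Hypothetical defaults; these keys are illustrative, not real engine settings.
DEFAULTS = {"null_strings": [""], "max_errors": 100}

# Sentinel standing in for an UNSET value (assumption: field absent from the record).
UNSET = object()


def merge_settings(user_settings):
    """Overlay user-supplied settings on top of the default values."""
    merged = dict(DEFAULTS)
    merged.update(user_settings)
    return merged


def to_bool(text):
    """Parse a boolean from a string, rejecting anything unrecognized."""
    lowered = text.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Maps a column type name to a function that parses a raw string.
CONVERTERS = {
    "boolean": to_bool,
    "decimal": Decimal,
    "date": date.fromisoformat,
}


def convert(raw, target_type, settings):
    """Convert one raw field to its internal representation."""
    if raw is UNSET:
        return UNSET
    if raw in settings["null_strings"]:
        return None  # NULL
    return CONVERTERS[target_type](raw)


def load(records, schema, settings):
    """Convert all records; return (converted rows, bad records)."""
    good, bad = [], []
    for record in records:
        try:
            row = {
                column: convert(record.get(column, UNSET), type_name, settings)
                for column, type_name in schema.items()
            }
            good.append(row)
        except (ValueError, ArithmeticError) as exc:
            # Keep the original source next to the failure reason,
            # in the spirit of the "bad file" described above.
            bad.append((record, str(exc)))
            if len(bad) > settings["max_errors"]:
                raise RuntimeError("too many conversion errors") from exc
    return good, bad
```

Calling load with a schema such as {"id": "decimal", "active": "boolean"} returns the successfully converted rows together with the records that failed, so that the failed sources can be written out separately instead of aborting the whole operation.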

Figure: Loading Workflow

Figure: Unloading Workflow