Architecture
The DataStax workflow engine is the key architectural component that orchestrates DSBulk Loader operations.
Its main features are:
- Configuration: The workflow engine collects user-supplied settings, merges them with default values, and configures the loading or unloading operation to run.
- Connection: The workflow engine handles the driver connection to:
  - Hyper-Converged Database (HCD) 1.0 databases
  - DataStax Enterprise (DSE) 5.1, 6.8, and 6.9 databases
  - Open source Apache Cassandra® 2.1 and later databases

  The workflow engine also manages the driver-specific settings and supports authentication and SSL encryption.
- Conversion: The engine handles data type conversions (for example, boolean, number, and date conversions) from whatever a connector emits (typically strings or raw bytes) to appropriate internal representations (typically Java Temporal or Number objects). It also handles NULL and UNSET values.
- Mapping: The engine analyzes metadata gathered from the driver and infers the appropriate INSERT or SELECT prepared statement, then checks this statement against user-supplied information about the data source to infer the bound variables to use.
- Monitoring: The engine reports metrics about all its internal components, mainly the connector and the bulk executor.
- Error Handling: The engine handles errors from both connectors and the bulk executor, and reports read, parse, and write failures. These failures are redirected to a configurable "bad file" that contains the source records that could not be loaded.
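
The configuration, conversion, mapping, and error-handling responsibilities listed above can be illustrated with a small sketch. This is not DSBulk's implementation (DSBulk itself is written in Java), and every name here is hypothetical: the setting keys `null_strings` and `null_to_unset`, the `UNSET` sentinel, and the helper functions are assumptions made for illustration only. The sketch assumes string input from a connector and collects failed records in an in-memory list rather than a real "bad file".

```python
from datetime import date
from decimal import Decimal

# Hypothetical setting names for illustration; not actual DSBulk options.
DEFAULTS = {"null_strings": [""], "null_to_unset": True}

# Sentinel for "leave this column untouched", as opposed to writing NULL (None).
UNSET = object()

# Converters from connector-style strings to typed internal values.
CONVERTERS = {
    "boolean": lambda s: {"true": True, "false": False}[s.strip().lower()],
    "int": int,
    "decimal": Decimal,
    "date": date.fromisoformat,
}


def merge_settings(user_settings):
    """Configuration: user-supplied values override the defaults."""
    return {**DEFAULTS, **user_settings}


def convert(raw, type_name, settings):
    """Conversion: turn a raw string into a typed value, NULL, or UNSET."""
    if raw is None or raw in settings["null_strings"]:
        return UNSET if settings["null_to_unset"] else None
    return CONVERTERS[type_name](raw)


def infer_insert(keyspace, table, table_columns, source_fields):
    """Mapping: bind only the table columns that the data source provides."""
    bound = [c for c in table_columns if c in source_fields]
    columns = ", ".join(bound)
    markers = ", ".join(f":{c}" for c in bound)
    return f"INSERT INTO {keyspace}.{table} ({columns}) VALUES ({markers})", bound


def convert_all(records, column_types, settings):
    """Error handling: keep converted rows and set failed records aside."""
    rows, bad = [], []
    for record in records:
        try:
            rows.append(
                {c: convert(record.get(c), t, settings) for c, t in column_types.items()}
            )
        except (KeyError, ValueError, ArithmeticError) as error:
            bad.append((record, repr(error)))
    return rows, bad


settings = merge_settings({"null_to_unset": False})
statement, bound = infer_insert("ks1", "users", ["id", "active", "born"], {"id", "born"})
rows, bad = convert_all(
    [{"id": "1", "born": "2020-01-02"}, {"id": "oops", "born": "2020-01-02"}],
    {"id": "int", "born": "date"},
    settings,
)
print(statement)
print(rows)
print(len(bad))
```

In this sketch the second record fails integer conversion and is set aside with its error instead of aborting the run, which mirrors the idea of redirecting unloadable sources to a separate location while the rest of the operation continues.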