Engine options
Specify engine options for the dsbulk command.
The options can be used in short form (-k keyspace_name) or long form (--schema.keyspace keyspace_name).
-dryRun, --engine.dryRun, --dsbulk.engine.dryRun { true | false }
Enable or disable dry-run mode, a test mode that runs the command but does not load data. Not applicable for unloading.
Default: false
--engine.executionId, --dsbulk.engine.executionId string
A unique identifier to attribute to each execution. When unspecified or empty, the engine automatically generates identifiers of the form workflow_timestamp, where:
- workflow stands for the workflow type (LOAD, UNLOAD, etc.);
- timestamp is the current timestamp formatted as uuuuMMdd-HHmmss-SSSSSS (see Patterns for Formatting and Parsing in Oracle Java documentation) in UTC, with microsecond precision if available, and millisecond precision otherwise.
When this identifier is user-supplied, it is important to guarantee its uniqueness; failing to do so may result in execution failures.
It is also possible to provide templates here. Any format compliant with the formatting rules of String.format() is accepted, and can contain the following parameters:
- %1$s: the workflow type (LOAD, UNLOAD, etc.);
- %2$t: the current time (with microsecond precision if available, and millisecond precision otherwise);
- %3$s: the JVM process PID (this parameter might not be available on some operating systems; if its value cannot be determined, a random integer will be inserted instead).
Default: null
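As a rough illustration of the default identifier format described above, the following Python sketch builds a string of the form workflow_timestamp. It is not DataStax Bulk Loader's own code: the function name default_execution_id is hypothetical, and the strftime pattern %Y%m%d-%H%M%S-%f is assumed here to be the Python counterpart of the Java pattern uuuuMMdd-HHmmss-SSSSSS.

```python
from datetime import datetime, timezone


def default_execution_id(workflow, now=None):
    """Build an ID shaped like WORKFLOW_uuuuMMdd-HHmmss-SSSSSS (hypothetical helper)."""
    # Use the current UTC time unless a timestamp is supplied (handy for testing).
    if now is None:
        now = datetime.now(timezone.utc)
    # Normalize to UTC, since the documented format is expressed in UTC.
    now = now.astimezone(timezone.utc)
    # %f yields six fractional digits (microseconds), matching SSSSSS.
    return f"{workflow}_{now.strftime('%Y%m%d-%H%M%S-%f')}"
```

For a LOAD workflow at 2024-01-02 03:04:05.678901 UTC, this sketch produces LOAD_20240102-030405-678901. A user-supplied identifier or template should still be chosen so that each execution gets a unique value.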
-maxConcurrentQueries, --engine.maxConcurrentQueries, --dsbulk.engine.maxConcurrentQueries string
The maximum number of queries that can be executed in parallel.
This option acts as a safeguard to prevent more queries executing in parallel than the cluster can handle, or to regulate throughput when latencies get too high. Batch statements count as one query.
When using continuous paging, also make sure to set this number to a value less than or equal to the number of nodes in the local datacenter multiplied by the value configured server-side for continuous_paging.max_concurrent_sessions in the cassandra.yaml configuration file (60 by default); otherwise some requests might be rejected.
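The continuous paging ceiling described above is a simple product, and can be sketched in Python as follows. The function name continuous_paging_ceiling is hypothetical; the default of 60 is the server-side default quoted above, and the node count must be supplied by the operator.

```python
def continuous_paging_ceiling(local_nodes, max_concurrent_sessions=60):
    """Largest concurrency value suggested for continuous paging (illustrative only)."""
    # Nodes in the local datacenter times the sessions each node accepts.
    return local_nodes * max_concurrent_sessions
```

For example, with 3 nodes in the local datacenter and the default of 60 sessions per node, the ceiling is 3 * 60 = 180 concurrent queries.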
The special syntax NC can be used to specify a number that is a multiple of the number of available cores.
For example, if the number of cores is 8, then 0.5C = 0.5 * 8 = 4 concurrent queries.
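The NC arithmetic above can be sketched in Python as shown below. This is an illustration, not the loader's actual parsing logic: the function name resolve_concurrency is hypothetical, and both the truncation of fractional results and the minimum of 1 are assumptions, since the text above only covers the case where the product is a whole number. The special value AUTO is deliberately not handled.

```python
def resolve_concurrency(spec, cores):
    """Turn a value such as '0.5C' or '16' into a query count (hypothetical helper)."""
    spec = spec.strip()
    if spec.upper().endswith("C"):
        # 'NC' form: N is a multiplier applied to the number of available cores.
        multiplier = float(spec[:-1])
        # Assumption: truncate fractional results and never go below 1.
        return max(1, int(multiplier * cores))
    # Plain integer form: use the number as given.
    return int(spec)
```

With 8 cores, resolve_concurrency("0.5C", 8) returns 4, matching the example above.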
The default value is AUTO.
With this special value, DataStax Bulk Loader® optimizes the number of concurrent queries according to the number of available cores, and the operation being executed.
The actual value usually ranges from the number of cores to eight times that number.
Starting in 1.6.0, using --dsbulk.engine.maxConcurrentQueries is preferred over the following settings:
- dsbulk.executor.maxInFlight
- dsbulk.executor.maxPerSecond
These two settings are still supported.
However, their default values changed in 1.6.0 to -1 (disabled).
The settings create semaphores and thus block the driver under high contention.
The new setting, --dsbulk.engine.maxConcurrentQueries, achieves the same effect without blocking the driver.
Also, the setting executor.continuousPaging.maxConcurrentQueries is deprecated.
Instead, use engine.maxConcurrentQueries.
If executor.continuousPaging.maxConcurrentQueries is provided, DataStax Bulk Loader 1.6.0 and later ignores it and logs a warning.
To check the current engine.maxConcurrentQueries setting, set logging -verbosity 2. See -verbosity, --log.verbosity, --dsbulk.log.verbosity { 0 | 1 | 2 } for option details. Then look in the operation.log file for a line starting with Using read concurrency: or Using write concurrency:.
Default: AUTO
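The log check described above could be automated with a small Python helper such as the one below. The function name find_concurrency_lines is hypothetical. The sketch matches the two phrases anywhere in a line, which is deliberately looser than "starting with", on the assumption that a log layout may prepend a timestamp or level.

```python
CONCURRENCY_MARKERS = ("Using read concurrency:", "Using write concurrency:")


def find_concurrency_lines(lines):
    """Return the lines that report the effective read or write concurrency."""
    matches = []
    for line in lines:
        # Substring match (assumption): tolerates any prefix the log layout adds.
        if any(marker in line for marker in CONCURRENCY_MARKERS):
            matches.append(line.rstrip("\n"))
    return matches
```

The helper accepts any iterable of lines, so an open file handle for operation.log can be passed directly; the location of that file depends on the configured log directory.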