Driver options
The Cassandra Java driver is a dependency of DSBulk and it is included with your DSBulk installation. DSBulk version 1.11.0 uses version 4.17.0 of the Java driver.
This topic describes common and notable Java driver options that you can specify on the DSBulk command line or in configuration files. Many more options exist. For more information, see the DataStax Java driver version 4.17 documentation.
If you are using an earlier version of DSBulk, refer to the documentation for the driver version included with that release, as specified in the DSBulk pom.xml.
Synopsis
The driver options in DSBulk are derived from the Java driver itself.
When passing driver options to DSBulk on the command line, the standard form is --driver. followed by one or more segments, such as --driver.basic.contact-points or --driver.basic.load-balancing-policy.evaluator.allow.
For the values of DSBulk options, HOCON syntax rules apply unless otherwise noted. For more information, see Escape and quote DSBulk command line arguments.
Short and long forms
On the command line, you can specify options in short form (if available), standard form, or long form.
For all driver options, the long form is prefixed by datastax-java-driver instead of driver, such as --datastax-java-driver.basic.contact-points.
The following examples show the same command with different forms of the driver.basic.contact-points option:
# Short form
dsbulk load -h '[ "192.168.0.1", "192.168.0.2" ]' -url filename.csv -k ks1 -t table1
# Standard form
dsbulk load --driver.basic.contact-points '[ "192.168.0.1", "192.168.0.2" ]' -url filename.csv -k ks1 -t table1
# Long form
dsbulk load --datastax-java-driver.basic.contact-points '[ "192.168.0.1", "192.168.0.2" ]' -url filename.csv -k ks1 -t table1
In configuration files, you must use the long form with the datastax-java-driver prefix.
For example:
datastax-java-driver.basic.contact-points = [ "192.168.0.1", "192.168.0.2" ]
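As a minimal sketch of the configuration file form, the following shell snippet writes a small HOCON file that sets contact points under the datastax-java-driver prefix. The file name dsbulk-driver.conf is hypothetical, and the nested block is assumed to be equivalent to the dotted path under standard HOCON rules.

```shell
# Write a hypothetical DSBulk configuration file (the file name is an assumption).
cat > dsbulk-driver.conf <<'EOF'
# Driver options use the long-form prefix in configuration files.
datastax-java-driver {
  basic.contact-points = [ "192.168.0.1", "192.168.0.2" ]
}
EOF

# Print the file to confirm what was written.
cat dsbulk-driver.conf
```

Assuming DSBulk's -f option for loading a configuration file, you could then run something like dsbulk load -f dsbulk-driver.conf -url filename.csv -k ks1 -t table1.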
Executor options
The executor options are used to configure the Java driver.
However, these aren’t datastax-java-driver options.
The standard form for these options is --executor.*, and the long form is --dsbulk.executor.*.
The only active executor option is --executor.continuousPaging.enabled.
Other executor options are deprecated.
Basic connection options
These are the core driver connection options. These options specify the type of cluster (self-managed or Astra DB), contact points, and ports.
See also Authentication options and SSL options.
--driver.basic.cloud.secure-connect-bundle (-b)
This option is for Astra DB only.
For self-managed clusters, use --driver.basic.contact-points and --driver.basic.default-port.
When connecting to an Astra DB database, provide the location of your database’s Secure Connect Bundle (SCB). The SCB contains the necessary contact points, certificates, and keys to establish an SSL-encrypted connection to your database. The driver extracts this information automatically.
The specified location must be a path on the local filesystem or a valid URL. For example:
-b "/path/to/SCB.zip" # Absolute path
-b "./path/to/SCB.zip" # Relative path from the current working directory
-b "~/path/to/SCB.zip" # Shorthand path from the user's $HOME directory
-b "C:\\path\\to\\SCB.zip" # Microsoft Windows path with escaped backslashes
-b "file:/path/to/SCB.zip" # URL with file protocol
-b "http://host.com/SCB.zip" # URL with HTTP protocol
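When connecting to Astra DB, some options are ignored or unsupported because they are inferred from the SCB or aren’t applicable to Astra DB. For example, contact points are inferred from the SCB, and the SSL options are for self-managed clusters only.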
Default: null
--driver.basic.contact-points (-h)
When connecting to a self-managed cluster (DSE, HCD, or open-source Apache Cassandra®), specify the contact points to use for the initial connection to the cluster.
Contact points are addresses of cluster nodes that the driver uses to discover the cluster topology. Only one contact point is required because the driver can retrieve the address of the other nodes automatically. However, DataStax recommends that you provide multiple contact points in case a node is unavailable. If there is only one contact point, and it is unavailable, then the driver cannot initialize correctly.
The default value depends on the type of cluster:
- For self-managed clusters, the default is ["127.0.0.1:9042"].
- For Astra DB, DSBulk automatically sets --driver.basic.contact-points to an empty list ([]) because the contact points are inferred from --driver.basic.cloud.secure-connect-bundle.
Contact points syntax
Provide contact points as a list of strings in the format of host or host:port:
# IPv4 addresses with or without ports
-h '["192.168.0.1:9042","192.168.0.2:9042"]'
-h '["192.168.0.1","192.168.0.2"]'
# IPv6 addresses with or without ports
-h '["fe80:0:0:0:f861:3eff:fe1d:9d7b:9042","fe80:0:0:f861:3eff:fe1d:9d7b:9044:9042"]'
-h '["fe80:0:0:0:f861:3eff:fe1d:9d7b","fe80:0:0:f861:3eff:fe1d:9d7b:9044"]'
# Host names with or without ports
-h '["host1.com:9042","host2.com:9042"]'
-h '["host1.com","host2.com"]'
If a host is specified without a port, then the driver uses the default port (--driver.basic.default-port).
Address strings that contain special characters must be wrapped in quotes, such as -h '["fe80::f861:3eff:fe1d:879e%en0"]'.
Avoid ambiguous address resolution
If the host is a DNS name that resolves to multiple A-records, then all the corresponding addresses are used.
Don’t use localhost as a host name because it resolves to both IPv4 and IPv6 addresses on some platforms.
The heuristic to determine whether a contact point is in the form host or host:port isn’t guaranteed to be accurate for some IPv6 addresses.
Avoid ambiguous IPv6 address strings such as fe80::f861:3eff:fe1d:1234 that could be interpreted as a combination of IP fe80::f861:3eff:fe1d with port 1234, or as IP fe80::f861:3eff:fe1d:1234 without a port.
In such cases, DSBulk doesn’t change the contact point.
To avoid this issue, provide IPv6 addresses in full form.
For example, instead of fe80::f861:3eff:fe1d:1234, use fe80:0:0:0:0:f861:3eff:fe1d:1234 so the string is parsed as IP fe80:0:0:0:0:f861:3eff:fe1d with port 1234.
--driver.basic.default-port (-port)
The port to use for any host without a specified port in --driver.basic.contact-points.
Cassandra 3.0 and earlier and DSE 6.7 and earlier require all nodes in a cluster to share the same port.
Default: 9042
Advanced connection options
Use the following options to configure connection pool settings, including sizes, limits, and timeouts.
--driver.advanced.address-translator.class
Use this option to set the address translator class.
If not qualified, the driver assumes that the class resides in the com.datastax.oss.driver.internal.core.addresstranslation package.
- PassThroughAddressTranslator (default): Use the driver’s built-in translator implementation that returns all addresses unchanged.
- Custom class: Specify a custom class that implements AddressTranslator and has a public constructor with a DriverContext argument.
--driver.advanced.connection.connect-timeout
Set the timeout to establish a channel connection to the server.
This timeout controls how long the driver waits for the underlying channel to actually connect to the server. This timeout doesn’t apply to protocol negotiations, which can continue after the channel is established.
Default: "30 seconds"
--driver.advanced.connection.init-query-timeout
Set the timeout for internal queries that run as part of the driver initialization process, immediately after opening a connection.
The timeout applies to each node connection. The connection initialization process fails if this timeout is reached.
If the driver’s first connection fails, the entire driver initialization fails. For subsequent connections that fail, the driver retries the connection later.
Default: "30 seconds"
--driver.advanced.connection.max-requests-per-connection
Set the maximum number of requests that can be executed concurrently on a connection (local or remote).
Must be a positive integer between 1 and 32768.
Default: 32768
--driver.advanced.connection.pool.local.size
Set the number of connections in the pool for nodes that are considered local.
Default: 8
--driver.advanced.connection.pool.remote.size
Set the number of connections in the pool for nodes that are considered remote.
This option has no effect when using the default DSBulk load balancing policy because this policy doesn’t consider remote nodes.
Default: 8
--driver.advanced.heartbeat.interval
Set the heartbeat interval to keep connections alive.
If a connection is idle (no reads) for the duration of the heartbeat interval, then the driver sends a simulated request (heartbeat) over the connection to keep it alive.
If there is no activity, no heartbeat, or the heartbeat expires, then the connection is closed and replaced.
Default: "1 minute"
--driver.advanced.heartbeat.timeout
Set how long the driver waits for a response to a heartbeat before considering the connection expired.
If the driver doesn’t receive a response within the timeout limit, then the connection is closed and replaced.
Default: "1 minute"
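The connection, pool, and heartbeat options above can also be set together in a configuration file. The following is a sketch only: the file name and every value are illustrative assumptions, not tuning recommendations.

```shell
# Hypothetical file name; all values are illustrative, not recommendations.
cat > dsbulk-connection.conf <<'EOF'
datastax-java-driver {
  advanced {
    connection {
      connect-timeout = "10 seconds"     # time allowed to open the channel
      init-query-timeout = "10 seconds"  # time allowed for initialization queries
      pool.local.size = 4                # connections per local node
    }
    heartbeat {
      interval = "30 seconds"            # idle time before a heartbeat is sent
      timeout = "30 seconds"             # wait for a heartbeat response
    }
  }
}
EOF
```

On the command line, the same settings would use the standard form, such as --driver.advanced.connection.pool.local.size 4.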
--driver.advanced.protocol.compression
The name of the algorithm used to compress Cassandra Native Protocol frames:
- none (default)
- lz4
- snappy
--driver.advanced.protocol.version
Set the Cassandra Native Protocol version to use:
- Unset or null (default): If this option isn’t set, the driver gets the node versions at startup (by default, it checks system.peers.release_version), and then it uses the highest common protocol version.
  For example, if you have a mixed cluster with Cassandra 2.1 nodes (protocol v3) and Cassandra 3.0 nodes (protocol v3 and v4), the driver chooses protocol v3.
  If the nodes don’t have a common protocol version, initialization fails.
- Protocol version: If this option is set to a specific protocol version, the driver uses the given version for all connections without any negotiation or downgrading.
  If a contact point doesn’t support the specified protocol version, that contact point is skipped.
  Once the protocol version is set, it cannot change for the duration of the driver’s session. If an incompatible node joins the cluster after initialization, the connection fails, and the driver doesn’t try to reconnect to the node.
--driver.advanced.resolve-contact-points
How to resolve the addresses passed to --driver.basic.contact-points:
- true (default): Addresses are created with InetSocketAddress(String, int). The host name is resolved at the initial connection, and then the driver uses the resolved IP address for all subsequent connection attempts.
- false: Addresses are created with InetSocketAddress.createUnresolved(). The host name is resolved each time the driver opens a new connection.
  This is useful for containerized environments where DNS records are more likely to change over time.
  Because the JVM and OS have their own DNS caching mechanisms, you might need additional configuration beyond the driver.
This option has no effect on dynamically discovered peers. The driver relies on Cassandra system tables that expose raw IP addresses.
To convert them to unresolved addresses, use a custom address translator.
Authentication options
Use these options to provide cluster authentication credentials for the Java driver used by DSBulk.
--driver.advanced.auth-provider.class
Use this option to set the authentication provider class:
- null (default): Disable authentication. Only use this value if the cluster doesn’t require authentication.
- PlainTextAuthProvider: This authentication provider implementation is included with the Java driver. If not qualified, the Java driver assumes that the class resides in either the com.datastax.oss.driver.internal.core.auth or com.datastax.dse.driver.internal.core.auth package.
  Use this class to authenticate with credentials set in --driver.advanced.auth-provider.username and --driver.advanced.auth-provider.password.
  For DSE clusters only, proxy authentication is supported with --driver.advanced.auth-provider.authorization-id.
- DseGssApiAuthProvider: This authentication provider implementation is included with the Java driver. If not qualified, the Java driver assumes that the class resides in either the com.datastax.oss.driver.internal.core.auth or com.datastax.dse.driver.internal.core.auth package.
  Use this class for GSSAPI authentication to DSE clusters secured with DseAuthenticator. For more information, see the javadocs for this authenticator.
- Custom class: Specify a custom class that implements AuthProvider and has a public constructor that takes a DriverContext argument. If not qualified, the Java driver assumes that the class resides in either the com.datastax.oss.driver.internal.core.auth or com.datastax.dse.driver.internal.core.auth package.
  To simplify customization, the Java driver provides two abstract classes that can be extended: PlainTextAuthProviderBase and DseGssApiAuthProviderBase.
--driver.advanced.auth-provider.username (-u)
This option is required if --driver.advanced.auth-provider.class is set to PlainTextAuthProvider.
For self-managed clusters with authentication enabled, provide a username for cluster authentication.
For Astra DB, use the literal string "token" as the username.
Strings that contain special characters must be quoted, and strings that contain double-quotes must be escaped. For more information, see Escape and quote command line arguments.
DataStax recommends specifying credentials in a configuration file instead of on the command line.
Default: null
--driver.advanced.auth-provider.password (-p)
This option is required if --driver.advanced.auth-provider.class is set to PlainTextAuthProvider.
For self-managed clusters with authentication enabled, provide a password for cluster authentication. For Astra DB, use an application token as the password.
Strings that contain special characters must be quoted, and strings that contain double-quotes must be escaped. For more information, see Escape and quote command line arguments.
DataStax recommends specifying credentials in a configuration file instead of on the command line.
Default: null
--driver.advanced.auth-provider.authorization-id
For DSE clusters, use this option for proxy authentication.
Set the authorization ID of the user to impersonate.
Default: null
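Because credentials are better kept off the command line, one possible approach is a configuration file that only its owner can read. This sketch assumes a self-managed cluster with plain-text authentication. The file name, user name, and password are placeholders.

```shell
# Hypothetical file name. Restrict permissions before writing any secrets.
touch dsbulk-auth.conf
chmod 600 dsbulk-auth.conf

cat > dsbulk-auth.conf <<'EOF'
datastax-java-driver {
  advanced.auth-provider {
    class = PlainTextAuthProvider
    username = "my_user"      # placeholder
    password = "my_password"  # placeholder
  }
}
EOF
```

Assuming DSBulk's -f option for loading a configuration file, a command such as dsbulk unload -f dsbulk-auth.conf -k ks1 -t table1 would then pick up the credentials without exposing them in the shell history.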
SSL options
Use these options to enable SSL encrypted connections between self-managed clusters and the driver used by DSBulk. For more information, see Use SSL with DSBulk and Use SSL with Cassandra drivers.
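For Astra DB connections, don’t use these options. The SCB (--driver.basic.cloud.secure-connect-bundle) contains the certificates and keys needed to establish an SSL-encrypted connection.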
--driver.advanced.ssl-engine-factory.class
This option is for self-managed clusters only. Don’t use this option when connecting to Astra DB.
Use this option to set the SSL engine factory class:
- null (default): Disable SSL encryption.
- DefaultSslEngineFactory: Enable SSL encryption with the driver-provided implementation that uses the JDK’s built-in SSL implementation. If not qualified, the driver assumes that the class resides in the com.datastax.oss.driver.internal.core.ssl package.
- Custom class: Enable SSL encryption with a custom class that implements SslEngineFactory and has a public constructor with a DriverContext argument. If not qualified, the driver assumes that the class resides in the com.datastax.oss.driver.internal.core.ssl package.
--driver.advanced.ssl-engine-factory.cipher-suites
This option is for self-managed clusters only. Don’t use this option when connecting to Astra DB.
For the DefaultSslEngineFactory class only, provide a list of strings containing the cipher suites to enable when creating an SSLEngine for a connection.
For example, --driver.advanced.ssl-engine-factory.cipher-suites '["TLS_RSA_WITH_AES_128_CBC_SHA","TLS_RSA_WITH_AES_256_CBC_SHA"]'
If omitted, the driver won’t explicitly enable cipher suites on the engine.
For more information, see the javadocs for SSLEngine.setEnabledCipherSuites().
Default: null
--driver.advanced.ssl-engine-factory.hostname-validation
This option is for self-managed clusters only. Don’t use this option when connecting to Astra DB.
For the DefaultSslEngineFactory class only, specify whether to validate that the server certificate’s hostname matches the server being connected to:
- true (default): Enable hostname validation.
- false: Disable hostname validation.
--driver.advanced.ssl-engine-factory.keystore-path
This option is for self-managed clusters only. Don’t use this option when connecting to Astra DB.
For the DefaultSslEngineFactory class only, set the location used to access keystore contents.
If you set either --driver.advanced.ssl-engine-factory.keystore-path or --driver.advanced.ssl-engine-factory.truststore-path, then the driver builds an SSLContext from these files.
If neither of these options is specified, then the default SSLContext is used, which is based on system property configuration.
Default: null
--driver.advanced.ssl-engine-factory.keystore-password
This option is for self-managed clusters only. Don’t use this option when connecting to Astra DB.
If you set --driver.advanced.ssl-engine-factory.keystore-path, then set the password to access the keystore.
Strings that contain special characters must be quoted, and strings that contain double-quotes must be escaped. For more information, see Escape and quote command line arguments.
DataStax recommends specifying credentials in a configuration file instead of on the command line.
Default: null
--driver.advanced.ssl-engine-factory.truststore-path
This option is for self-managed clusters only. Don’t use this option when connecting to Astra DB.
For the DefaultSslEngineFactory class only, set the location used to access truststore contents.
If you set either --driver.advanced.ssl-engine-factory.keystore-path or --driver.advanced.ssl-engine-factory.truststore-path, then the driver builds an SSLContext from these files.
If neither of these options is specified, then the default SSLContext is used, which is based on system property configuration.
Default: null
--driver.advanced.ssl-engine-factory.truststore-password
This option is for self-managed clusters only. Don’t use this option when connecting to Astra DB.
If you set --driver.advanced.ssl-engine-factory.truststore-path, then set the password to access the truststore.
Strings that contain special characters must be quoted, and strings that contain double-quotes must be escaped. For more information, see Escape and quote command line arguments.
DataStax recommends specifying credentials in a configuration file instead of on the command line.
Default: null
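The SSL options above fit naturally in a configuration file, which also keeps the store passwords off the command line. This is a sketch for a self-managed cluster only. The file name, paths, and passwords are placeholders.

```shell
# Hypothetical file name. Restrict permissions because the file holds passwords.
touch dsbulk-ssl.conf
chmod 600 dsbulk-ssl.conf

cat > dsbulk-ssl.conf <<'EOF'
datastax-java-driver {
  advanced.ssl-engine-factory {
    class = DefaultSslEngineFactory
    hostname-validation = true
    truststore-path = "/path/to/client.truststore"  # placeholder path
    truststore-password = "truststore_password"     # placeholder
    keystore-path = "/path/to/client.keystore"      # placeholder path
    keystore-password = "keystore_password"         # placeholder
  }
}
EOF
```

The cipher-suites option is omitted here, so the driver doesn’t explicitly enable any cipher suites on the engine.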
Request options
Although all driver options inherently relate to query execution, these options are for specific aspects of query handling, such as consistency levels, idempotence, and retry policies.
--driver.basic.request.consistency (-cl)
Set the consistency level to use for all queries executed by DSBulk.
Stricter consistency levels reduce throughput because they take longer to process and verify. Typically, the strictest levels are used for write operations that must be written to multiple replicas before being acknowledged.
All levels other than LOCAL_ONE and ONE disable continuous paging, which can reduce throughput for read requests (dsbulk unload and dsbulk count).
To use continuous paging, the consistency level must be ONE or LOCAL_ONE.
The accepted values, from less strict (faster) to more strict (slower), are as follows:
- ANY
- LOCAL_ONE
- ONE
- TWO
- THREE
- LOCAL_QUORUM
- QUORUM
- EACH_QUORUM
- ALL
The default value depends on the type of cluster:
- For self-managed clusters, the default value is LOCAL_ONE.
- For Astra DB, the default value is LOCAL_QUORUM for dsbulk load, and LOCAL_ONE for dsbulk unload and dsbulk count.
--driver.basic.request.default-idempotence
Set the default idempotence flag for all queries executed by DSBulk load operations:
- true (default): All queries are considered idempotent by default. Write failures are retried for all queries according to the configured retry policy.
- false: All queries are considered non-idempotent by default. Write failures aren’t retried.
--driver.basic.request.page-size
Set the page size to control how many rows the driver retrieves simultaneously in a single network roundtrip.
This helps manage the amount of results stored in memory concurrently.
If there are more results, the driver sends additional requests to retrieve them.
This happens automatically when iterating with the sync API, or it happens explicitly with the async API’s fetchNextPage method.
If you encounter out-of-memory errors with continuous paging, consider lowering this value.
To disable results paging, set this option to a negative integer or 0.
Default: 5000
To configure page size for continuous paging, see Continuous paging options.
--driver.basic.request.serial-consistency
Set the serial consistency level:
- LOCAL_SERIAL (default)
- SERIAL
--driver.basic.request.timeout
Set how long the driver waits for a request to complete.
This is a global limit on the duration of a session.execute() call from the driver, including any internal retries the driver handles.
Default: "5 minutes"
The default value of 5 minutes is intentional. Unlike a web application where requests must be processed as fast as possible, DSBulk is a data loading and unloading tool that prioritizes throughput over latency. If you lower this value, you might encounter timeouts due to the large batch writes and long-running read queries that DSBulk performs.
--driver.advanced.retry-policy.class
Set the retry policy class:
- MultipleRetryPolicy (default): The default retry policy used by DSBulk is "com.datastax.oss.dsbulk.workflow.commons.policies.retry.MultipleRetryPolicy". This is a special retry policy implementation that uses opinionated rules to retry most errors up to the limit set in --driver.advanced.retry-policy.max-retries.
- Custom class: Specify a custom class that implements RetryPolicy and has a public constructor that takes two arguments: The DriverContext and a String representing the profile name. If not qualified, the driver assumes that the class resides in the com.datastax.oss.driver.internal.core.retry package.
--driver.advanced.retry-policy.max-retries (-maxRetries)
For the MultipleRetryPolicy class only, set the number of times to retry a failed query.
Default: 10
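To illustrate how the request and retry options combine, the following sketch writes a configuration file for a load operation that favors stronger consistency. The file name and the chosen values are assumptions for illustration only.

```shell
# Hypothetical file name; values are illustrative, not recommendations.
cat > dsbulk-request.conf <<'EOF'
datastax-java-driver {
  basic.request {
    consistency = LOCAL_QUORUM   # stricter than LOCAL_ONE, so expect lower throughput
    default-idempotence = true   # allow failed writes to be retried
    timeout = "5 minutes"        # keep the long default used by DSBulk
  }
  # Applies to the default MultipleRetryPolicy only.
  advanced.retry-policy.max-retries = 5
}
EOF
```

Because LOCAL_QUORUM disables continuous paging, a file like this is better suited to dsbulk load than to dsbulk unload or dsbulk count.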
--driver.advanced.timestamp-generator.class
Set the microsecond timestamp generator class.
If it is not qualified, the driver assumes that the class resides in the com.datastax.oss.driver.internal.core.time package.
- AtomicTimestampGenerator (default): This timestamp generator implementation is included with the Java driver. This class generates timestamps that are unique across all client threads.
- ThreadLocalTimestampGenerator: This timestamp generator implementation is included with the Java driver. This class generates timestamps that are unique within each thread only. The same timestamp can be generated by different threads.
- ServerSideTimestampGenerator: This timestamp generator implementation is included with the Java driver. This class lets the server assign timestamps rather than the client threads.
- Custom class: Specify a custom class that implements TimestampGenerator and has a public constructor that takes two arguments: The DriverContext and a String representing the profile name.
Load balancing policy options
The Java driver’s load balancing policy controls the distribution of requests across a cluster. For a given query execution, the load balancing policy determines the node that coordinates the query execution and the nodes that can be used as failover hosts, if any.
Typically, the default load balancing policy configuration is sufficient.
When connecting to Astra DB, always use the default values for the load balancing policy options.
For Astra DB, the driver infers contact points, including the local datacenter, from the SCB.
--driver.basic.load-balancing-policy.class
Set the load balancing policy class.
If not qualified, the driver assumes that the class resides in the com.datastax.oss.driver.internal.core.loadbalancing package.
- DcInferringLoadBalancingPolicy (default): The default load balancing policy used by DSBulk is "com.datastax.oss.driver.internal.core.loadbalancing.DcInferringLoadBalancingPolicy". This is a special policy that infers the local datacenter from the contact points.
- Custom class: Specify a custom class that implements LoadBalancingPolicy and has a public constructor that takes two arguments: The DriverContext and a String representing the profile name.
--driver.basic.load-balancing-policy.evaluator.class
Use this option if you want to provide an optional custom filter to include or exclude nodes from the load balancing policy. DSBulk has a default node evaluator implementation that you can use without specifying a custom class.
If you choose to use this option, you must provide the fully qualified name of a class that implements java.util.function.Predicate<Node> and has a public constructor that takes two arguments: The DriverContext instance and a String representing the current execution profile name.
The predicate’s test(Node) method is invoked each time the policy processes a topology or state change.
If the method returns false, the node is set at distance IGNORED, which means the Java driver never connects to it, and the node is never included in query plans.
With the default DSBulk node evaluator implementation, you can further limit the available nodes with the --driver.basic.load-balancing-policy.evaluator.allow and --driver.basic.load-balancing-policy.evaluator.deny options.
Default: "com.datastax.oss.dsbulk.workflow.commons.policies.lbp.SimpleNodeDistanceEvaluator"
--driver.basic.load-balancing-policy.evaluator.allow (-allow)
This option is for self-managed clusters only. Don’t use this option when connecting to Astra DB.
To use this option, you must use the default DSBulk node evaluator implementation (see --driver.basic.load-balancing-policy.evaluator.class).
An optional list of node host names or addresses that are allowed to connect.
Host names and addresses can use any of the formats allowed in --driver.basic.contact-points.
Default: []
--driver.basic.load-balancing-policy.evaluator.deny (-deny)
This option is for self-managed clusters only. Don’t use this option when connecting to Astra DB.
To use this option, you must use the default DSBulk node evaluator implementation (see --driver.basic.load-balancing-policy.evaluator.class).
An optional list of node host names or addresses that aren’t allowed to connect.
Host names and addresses can use any of the formats allowed in --driver.basic.contact-points.
Default: []
--driver.basic.load-balancing-policy.local-datacenter (-dc)
Use this option to explicitly set the datacenter that is considered local for load balancing purposes.
The default load balancing policy for DSBulk (DcInferringLoadBalancingPolicy) infers the local datacenter from the contact points, and its query plans only include nodes from that datacenter.
When using the default policy, don’t set this option unless inference fails or you want to override the inferred value.
Default: null
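For a self-managed cluster where datacenter inference fails or needs overriding, the load balancing options might be set as in the following sketch. The file name, the datacenter name dc1, and the node addresses are hypothetical.

```shell
# Hypothetical file name, datacenter name, and addresses.
cat > dsbulk-lbp.conf <<'EOF'
datastax-java-driver {
  basic.load-balancing-policy {
    # Override the datacenter inferred from the contact points.
    local-datacenter = "dc1"
    # Restrict connections to these nodes (default node evaluator only).
    evaluator.allow = [ "192.168.0.1", "192.168.0.2" ]
  }
}
EOF
```

The command line equivalent would use the short forms, such as -dc dc1 -allow '["192.168.0.1","192.168.0.2"]'. Don’t use a file like this with Astra DB, where the default load balancing values apply.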
Continuous paging options
Use these options to configure the Java driver’s continuous paging settings.
Continuous paging can improve processing of results returned by read requests (dsbulk unload and dsbulk count), particularly for large datasets.
This feature isn’t relevant to write requests (dsbulk load).
--executor.continuousPaging.enabled
Globally enable or disable continuous paging for read request results (dsbulk unload and dsbulk count):
- true (default): Enable continuous paging for eligible read requests.
  To use continuous paging, the target cluster must support continuous paging, and the request consistency level (--driver.basic.request.consistency) must be either ONE or LOCAL_ONE. If either condition isn’t met, then the driver falls back to traditional paging.
  You can use the --driver.advanced.continuous-paging.* options to tune the continuous paging behavior.
  Make sure --engine.maxConcurrentQueries doesn’t exceed the cluster’s resources. If --engine.maxConcurrentQueries is too high, requests can be rejected due to overloaded server resources.
- false: Disable continuous paging. The driver ignores all --driver.advanced.continuous-paging.* options, and it uses traditional paging for all read requests.
--driver.advanced.continuous-paging.max-enqueued-pages
To use this option, you must enable continuous paging with --dsbulk.executor.continuousPaging.enabled true.
Set the maximum number of pages that can be stored in the local queue.
This option must be set to a positive integer.
Default: 4
--driver.advanced.continuous-paging.max-pages
To use this option, you must enable continuous paging with --dsbulk.executor.continuousPaging.enabled true.
Set the maximum number of pages to return in total:
- 0 (default): Retrieve all pages.
- Positive integer: Retrieve no more than this number of pages. All pages beyond this limit are ignored.
--driver.advanced.continuous-paging.max-pages-per-second
To use this option, you must enable continuous paging with --dsbulk.executor.continuousPaging.enabled true.
Set the maximum number of pages to retrieve per second:
- 0 (default): Retrieve an unlimited number of pages per second.
- Positive integer: Retrieve no more than this number of pages per second.
--driver.advanced.continuous-paging.page-size
To use this option, you must enable continuous paging with --dsbulk.executor.continuousPaging.enabled true.
Set the page size for continuous paging as a number of rows or bytes.
The unit (rows or bytes) is set by --driver.advanced.continuous-paging.page-size-in-bytes.
For example, if you use the default values for both options, then the driver retrieves 5,000 rows at a time when using continuous paging.
Specifically, this is the number of rows or bytes that the driver retrieves simultaneously in a single network roundtrip.
These options help manage the amount of results stored in memory concurrently during continuous paging.
If there are more results, the driver sends additional requests to retrieve them.
This happens automatically when iterating with the sync API, or it happens explicitly with the async API’s fetchNextPage method.
If you encounter out-of-memory errors with continuous paging, consider lowering this value.
Default: 5000
--driver.advanced.continuous-paging.page-size-in-bytes
To use this option, you must enable continuous paging with --dsbulk.executor.continuousPaging.enabled true.
Set the unit, rows or bytes, for the --driver.advanced.continuous-paging.page-size option.
For example, if you use the default values for both options, then the driver retrieves 5,000 rows at a time.
If you want the driver to retrieve 5,000 bytes at a time instead, set this option to true.
- false (default): Page size is interpreted as a number of rows.
- true: Page size is interpreted as a number of bytes.
The ideal page size and unit depend on your use case, including the available memory and the dataset’s characteristics.
--driver.advanced.continuous-paging.timeout.first-page
To use this option, you must enable continuous paging with --dsbulk.executor.continuousPaging.enabled true.
Specify how long the driver should wait for the coordinator node to send the first page.
Default: "5 minutes"
--driver.advanced.continuous-paging.timeout.other-pages
To use this option, you must enable continuous paging with --dsbulk.executor.continuousPaging.enabled true.
Specify how long the driver should wait for the coordinator node to send subsequent pages (after the first page).
Default: "5 minutes"
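The continuous paging options can be gathered into one configuration file for read operations. This sketch assumes that the executor option goes under the dsbulk prefix in configuration files, matching its long form. The file name and the reduced page size are illustrative assumptions.

```shell
# Hypothetical file name; the page size is illustrative, not a recommendation.
cat > dsbulk-paging.conf <<'EOF'
dsbulk {
  executor.continuousPaging.enabled = true
}
datastax-java-driver {
  # Continuous paging requires ONE or LOCAL_ONE.
  basic.request.consistency = LOCAL_ONE
  advanced.continuous-paging {
    page-size = 2000            # lower than the default of 5000 to reduce memory use
    page-size-in-bytes = false  # interpret page-size as a number of rows
    max-enqueued-pages = 4
    timeout.first-page = "5 minutes"
    timeout.other-pages = "5 minutes"
  }
}
EOF
```

If the cluster doesn’t support continuous paging, the driver falls back to traditional paging, and the advanced.continuous-paging settings are not used.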
Metrics options
Use these options to enable driver metrics collection by DSBulk. For more information, see Monitoring options.
--driver.advanced.metrics.node.enabled
If you enable JMX reporting, provide a list of node-level metrics to monitor:
- bytes-received
- bytes-sent
- cql-messages
- errors.connection.auth
- errors.connection.init
- errors.request.aborted
- errors.request.others
- errors.request.read-timeouts
- errors.request.unavailables
- errors.request.unsent
- errors.request.write-timeouts
- ignores.aborted
- ignores.other
- ignores.read-timeout
- ignores.total
- ignores.unavailable
- ignores.write-timeout
- pool.in-flight
- pool.open-connections
- retries.aborted
- retries.other
- retries.read-timeout
- retries.total
- retries.unavailable
- retries.write-timeout
Default: [] (no node-level metrics monitored)
--driver.advanced.metrics.session.enabled
If you enable JMX reporting, provide a list of session-level metrics to monitor:
- bytes-received
- bytes-sent
- connected-nodes
- cql-client-timeouts
- cql-requests
Default: [] (no session-level metrics monitored)
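As a sketch, a configuration file might enable a small subset of node-level and session-level metrics as follows. The selection of metrics is an arbitrary assumption for illustration. The names come from the lists above, and JMX reporting must be enabled separately for the metrics to be exposed.

```hocon
# Node-level metrics to monitor (assumed example selection).
datastax-java-driver.advanced.metrics.node.enabled = [
  "bytes-sent",
  "bytes-received",
  "pool.in-flight",
  "errors.request.write-timeouts"
]

# Session-level metrics to monitor (assumed example selection).
datastax-java-driver.advanced.metrics.session.enabled = [
  "cql-requests",
  "cql-client-timeouts",
  "connected-nodes"
]
```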
Deprecated driver options
- --driver.addressTranslator: Deprecated. Use --driver.advanced.address-translator.class.
- --driver.auth.authorizationId: Deprecated. Use --driver.advanced.auth-provider.authorization-id.
- --driver.auth.keyTab: Deprecated. Use --driver.advanced.auth-provider.class and the related --driver.advanced.auth-provider.* settings.
- --driver.auth.password (-p): Deprecated. Use --driver.advanced.auth-provider.password.
- --driver.auth.principal: Deprecated. Use --driver.advanced.auth-provider.class and the related --driver.advanced.auth-provider.* settings.
- --driver.auth.provider: Deprecated. Use --driver.advanced.auth-provider.class.
- --driver.auth.saslService: Deprecated. Use --driver.advanced.auth-provider.class and the related --driver.advanced.auth-provider.* settings.
- --driver.auth.username (-u): Deprecated. Use --driver.advanced.auth-provider.username.
- --driver.basic.load-balancing-policy.filter.class: Deprecated. Use --driver.basic.load-balancing-policy.evaluator.class.
- --driver.basic.load-balancing-policy.filter.allow (-allow): Deprecated. Use --driver.basic.load-balancing-policy.evaluator.allow.
- --driver.basic.load-balancing-policy.filter.deny: Deprecated. Use --driver.basic.load-balancing-policy.evaluator.deny.
- --driver.policy.lbp.dcAwareRoundRobin.allowRemoteDCsForLocalConsistencyLevel: Deprecated. See --driver.basic.load-balancing-policy.class.
- --driver.policy.lbp.dcAwareRoundRobin.localDc: Deprecated. Use --driver.basic.load-balancing-policy.local-datacenter.
- --driver.policy.lbp.dcAwareRoundRobin.usedHostsPerRemoteDc: Deprecated. See --driver.basic.load-balancing-policy.class.
- --driver.policy.lbp.dse.childPolicy: Deprecated. See --driver.basic.load-balancing-policy.class.
- --driver.policy.lbp.name (-lbp): Deprecated. Use --driver.basic.load-balancing-policy.class.
- --driver.policy.lbp.tokenAware.childPolicy: Deprecated. See --driver.basic.load-balancing-policy.class.
- --driver.policy.lbp.tokenAware.shuffleReplicas: Deprecated. See --driver.basic.load-balancing-policy.class.
- --driver.policy.lbp.whiteList.childPolicy: Deprecated. See --driver.basic.load-balancing-policy.class.
- --driver.policy.lbp.whiteList.hosts: Deprecated. Use --driver.basic.load-balancing-policy.evaluator.class and --driver.basic.load-balancing-policy.evaluator.allow.
- --driver.policy.maxRetries (-maxRetries): Deprecated. Use --driver.advanced.retry-policy.max-retries.
- --driver.pooling.heartbeat: Deprecated. Use --driver.advanced.heartbeat.interval.
- --driver.pooling.local.connections: Deprecated. Use --driver.advanced.connection.pool.local.size.
- --driver.pooling.local.requests: Deprecated. Use --driver.advanced.connection.max-requests-per-connection.
- --driver.pooling.remote.connections: Deprecated. Use --driver.advanced.connection.pool.remote.size.
- --driver.pooling.remote.requests: Deprecated. Use --driver.advanced.connection.max-requests-per-connection.
- --driver.protocol.compression: Deprecated. Use --driver.advanced.protocol.compression.
- --driver.query.consistency (-cl): Deprecated. Use --driver.basic.request.consistency.
- --driver.query.fetchSize: Deprecated. Use --driver.basic.request.page-size.
- --driver.query.idempotence: Deprecated. Use --driver.basic.request.default-idempotence.
- --driver.query.serialConsistency: Deprecated. Use --driver.basic.request.serial-consistency.
- --driver.socket.readTimeout: Deprecated. Use --driver.basic.request.timeout.
- --driver.ssl.cipherSuites: Deprecated. Use --driver.advanced.ssl-engine-factory.class and the related datastax-java-driver.advanced.ssl-engine-factory.* options.
- --driver.ssl.keystore.algorithm: Deprecated. Use --driver.advanced.ssl-engine-factory.class and the related datastax-java-driver.advanced.ssl-engine-factory.* options.
- --driver.ssl.keystore.password: Deprecated. Use --driver.advanced.ssl-engine-factory.keystore-password.
- --driver.ssl.keystore.path: Deprecated. Use --driver.advanced.ssl-engine-factory.keystore-path.
- --driver.ssl.openssl.keyCertChain: Deprecated. Use --driver.advanced.ssl-engine-factory.class and the related datastax-java-driver.advanced.ssl-engine-factory.* options.
- --driver.ssl.openssl.privateKey: Deprecated. Use --driver.advanced.ssl-engine-factory.class and the related datastax-java-driver.advanced.ssl-engine-factory.* options.
- --driver.ssl.provider: Deprecated. Use --driver.advanced.ssl-engine-factory.class and the related datastax-java-driver.advanced.ssl-engine-factory.* options.
- --driver.ssl.truststore.algorithm: Deprecated. Use --driver.advanced.ssl-engine-factory.class and the related datastax-java-driver.advanced.ssl-engine-factory.* options.
- --driver.ssl.truststore.password: Deprecated. Use --driver.advanced.ssl-engine-factory.truststore-password.
- --driver.ssl.truststore.path: Deprecated. Use --driver.advanced.ssl-engine-factory.truststore-path.
- --driver.timestampGenerator: Deprecated. Use --driver.advanced.timestamp-generator.class.
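To illustrate the migration, the following configuration file fragment sketches how a few of the deprecated options map to their replacements. The values (consistency level, datacenter name, timeout, and pool size) are assumptions for the example only. The option names are the replacements listed above, written in the long form that configuration files require.

```hocon
# Replaces the deprecated --driver.query.consistency (-cl).
datastax-java-driver.basic.request.consistency = "LOCAL_QUORUM"

# Replaces the deprecated --driver.policy.lbp.dcAwareRoundRobin.localDc.
# "dc1" is a hypothetical datacenter name.
datastax-java-driver.basic.load-balancing-policy.local-datacenter = "dc1"

# Replaces the deprecated --driver.socket.readTimeout.
datastax-java-driver.basic.request.timeout = "5 minutes"

# Replaces the deprecated --driver.pooling.local.connections.
datastax-java-driver.advanced.connection.pool.local.size = 8
```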
Deprecated executor options
The following dsbulk.executor options were used to configure driver concurrency and throughput in earlier versions of DSBulk.
These options are deprecated in favor of built-in datastax-java-driver options or dsbulk.engine options that provide equal or better functionality.
If your DSBulk configuration uses these options, replace them with their suggested alternatives.
Some deprecated options are still supported but disabled by default. Others no longer have any effect. If you use a non-functional deprecated option, DSBulk ignores it and logs a warning.
- --executor.continuousPaging.maxConcurrentQueries: Deprecated. Use --dsbulk.engine.maxConcurrentQueries.
- --executor.continuousPaging.maxPages: Deprecated. Use --driver.advanced.continuous-paging.max-pages.
- --executor.continuousPaging.maxPagesPerSecond: Deprecated. Use --driver.advanced.continuous-paging.max-pages-per-second.
- --executor.continuousPaging.pageSize: Deprecated. Use --driver.advanced.continuous-paging.page-size.
- --executor.continuousPaging.pageUnit: Deprecated. Use --driver.advanced.continuous-paging.page-size-in-bytes.
- --executor.maxInFlight
  --executor.maxInFlight creates a semaphore that blocks the driver under high contention. Therefore, DataStax recommends that you use --dsbulk.engine.maxConcurrentQueries, which achieves the same effect without blocking the driver.

  About --executor.maxInFlight

  This option sets a soft global throughput limit on the maximum number of concurrent operations in flight, which are concurrent requests waiting for a response from the server:

  - Positive number: Set a soft limit on the maximum number of operations in flight. This acts as a safeguard to prevent more requests than the cluster can handle. Make sure this value is within the bounds of the cluster’s throughput capacity. If this value is too high, the cluster can get overloaded, leading to out-of-memory errors, increased latency, and timeouts.
    This is considered a soft limit due to the way pending requests are counted. For load operations, a batch is considered one request, which means that the actual number of pending requests can exceed this value if you are using batching. For unload and count operations, each request for the next page of results is considered one request.
    To set a fixed throughput limit, use --dsbulk.engine.maxConcurrentQueries (recommended) or --dsbulk.executor.maxPerSecond.
  - Zero or negative number (default): Disables this option. Throughput can still be limited through other options, such as --dsbulk.engine.maxConcurrentQueries.

  Astra DB rate limits are always enforced on the Astra DB server side, regardless of the DSBulk configuration. If your dsbulk commands connect to an Astra DB database, make sure this limit doesn’t exceed the Astra DB rate limit.
- --executor.maxBytesPerSecond
  --executor.maxBytesPerSecond creates a semaphore that blocks the driver under high contention. Therefore, DataStax recommends that you use --dsbulk.engine.maxConcurrentQueries, which achieves a similar effect without blocking the driver.

  About --executor.maxBytesPerSecond

  This option sets a fixed global throughput limit on the maximum number of bytes processed per second:

  - Enabled: Set a fixed maximum number of bytes per second as a valid long integer or in HOCON size-in-bytes format. For example, 1234, 1K, or 5 kibibytes.
    This acts as a safeguard to prevent more requests than the cluster can handle. Make sure this value is within the bounds of the cluster’s throughput capacity. If this value is too high, the cluster can get overloaded, leading to out-of-memory errors, increased latency, and timeouts.
    This setting applies to all operations. For load operations, this includes bytes written to the database, and for unload and count operations, this includes bytes read from the database.
    To set a soft or variable throughput limit, use --dsbulk.engine.maxConcurrentQueries (recommended) or --dsbulk.executor.maxInFlight.
  - Disabled (default): Set to 0 or a negative number to disable this option. Throughput can still be limited through other options, such as --dsbulk.engine.maxConcurrentQueries.

  Astra DB rate limits are always enforced on the Astra DB server side, regardless of the DSBulk configuration. If your dsbulk commands connect to an Astra DB database, make sure this limit doesn’t exceed the Astra DB rate limit.
- --executor.maxPerSecond
  --executor.maxPerSecond creates a semaphore that blocks the driver under high contention. Therefore, DataStax recommends that you use --dsbulk.engine.maxConcurrentQueries, which achieves the same effect without blocking the driver.

  About --executor.maxPerSecond

  This option sets a fixed global throughput limit on the maximum number of concurrent operations per second:

  - Positive number: Set a fixed maximum number of operations per second. This acts as a safeguard to prevent more requests than the cluster can handle. Make sure this value is within the bounds of the cluster’s throughput capacity. If this value is too high, the cluster can get overloaded, leading to increased latency and timeouts.
    This setting applies to all operations. For load operations, it limits the maximum number of writes per second, which includes nested statements inside a batch. For unload and count operations, it limits the number of rows retrieved per second.
    To set a soft or variable throughput limit, use --dsbulk.engine.maxConcurrentQueries (recommended) or --dsbulk.executor.maxInFlight.
  - Zero or negative number (default): Disables this option. Throughput can still be limited through other options, such as --dsbulk.engine.maxConcurrentQueries.

  Astra DB rate limits are always enforced on the Astra DB server side, regardless of the DSBulk configuration. If your dsbulk commands connect to an Astra DB database, make sure this limit doesn’t exceed the Astra DB rate limit.
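As a rough sketch of the replacements described in this section, a configuration file that previously relied on the deprecated executor options might instead contain settings like the following. The values shown are assumptions for illustration and should be sized to your cluster's capacity.

```hocon
# Recommended replacement for executor.maxInFlight, executor.maxPerSecond,
# and executor.maxBytesPerSecond. The value 8 is an assumed example.
dsbulk.engine.maxConcurrentQueries = 8

# Continuous paging must be enabled for the driver-level paging options to apply.
dsbulk.executor.continuousPaging.enabled = true

# Replaces executor.continuousPaging.pageSize and executor.continuousPaging.pageUnit.
datastax-java-driver.advanced.continuous-paging.page-size = 5000
datastax-java-driver.advanced.continuous-paging.page-size-in-bytes = false
```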