Driver options

This topic describes a commonly-used subset of DataStax Java driver options that you can specify with the dsbulk command. Many additional options exist. Be sure to read the DataStax Java driver configuration reference documentation. Also refer to the driver matrix.

The options can be used in short form (-h host_name) or long form (--datastax-java-driver.basic.contact-points host_name).

DataStax Java driver configuration settings start with the prefix datastax-java-driver. On the dsbulk command line, you can abbreviate this prefix to driver, if you prefer.
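
A minimal sketch of the equivalent spellings, assuming a placeholder host node1.example.com and a placeholder table ks1.table1. The count operation and the -k and -t shortcuts (keyspace and table) are regular dsbulk usage but are not described in this topic. Each command is printed rather than executed so that the snippet runs without a cluster; drop the leading echo to run it.

```shell
# Placeholder contact point; replace with a node in your own cluster.
HOSTS='["node1.example.com"]'

# Three spellings of the same option: shortcut, abbreviated prefix, full prefix.
echo dsbulk count -k ks1 -t table1 -h "$HOSTS"
echo dsbulk count -k ks1 -t table1 --driver.basic.contact-points "$HOSTS"
echo dsbulk count -k ks1 -t table1 --datastax-java-driver.basic.contact-points "$HOSTS"
```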

General options

Specify general options for using dsbulk with the DataStax Java driver. Use these options to define the contact points and port number for the initial connection. Additionally, define policy options pertaining to the DataStax Java driver load balancing policy settings, pooling options, query options, and socket connections.

  • -h, --driver.basic.contact-points, --datastax-java-driver.basic.contact-points host_name(s)

    The contact points to use for the initial connection to the cluster.

    These are addresses of Cassandra nodes that the driver uses to discover the cluster topology. Only one contact point is required, because the driver retrieves the addresses of the other nodes automatically. However, it is usually a good idea to provide more than one: if the only contact point you provide is unavailable, the driver cannot initialize itself.

    This must be a list of strings with each contact point specified as host or host:port. If the host is specified without a port, the default port specified in basic.default-port is used. Apache Cassandra 3.0 and earlier and DataStax Enterprise (DSE) 6.7 and earlier require all nodes in a cluster to share the same port.

    Valid examples of contact points are:

    • IPv4 addresses with ports: [ "192.168.0.1:9042", "192.168.0.2:9042" ]

    • IPv4 addresses without ports: [ "192.168.0.1", "192.168.0.2" ]

    • IPv6 addresses with ports: [ "fe80:0:0:0:f861:3eff:fe1d:9d7b:9042", "fe80:0:0:f861:3eff:fe1d:9d7b:9044:9042" ]

    • IPv6 addresses without ports: [ "fe80:0:0:0:f861:3eff:fe1d:9d7b", "fe80:0:0:f861:3eff:fe1d:9d7b:9044" ]

    • Host names with ports: [ "host1.com:9042", "host2.com:9042" ]

    • Host names without ports: [ "host1.com", "host2.com" ]

      If the host is a DNS name that resolves to multiple A-records, all the corresponding addresses are used. Do not use localhost as a host name, because it resolves to both IPv4 and IPv6 addresses on some platforms. When hosts are specified without ports, the port for all of them is taken from basic.default-port.

      Be sure to enclose address strings that contain special characters in quotes, as shown in the following examples:

      dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7a"]' -query "SELECT * from foo.bar;"
      dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7b","fe80::f861:3eff:fe1d:9d7c"]' \
                    -query "SELECT * from foo1.bar1;"

      The heuristic to determine whether a contact point is in the form "host" or "host:port" is not 100% accurate for some IPv6 addresses; avoid ambiguous IPv6 addresses such as fe80::f861:3eff:fe1d:1234 because such a string could be interpreted as a combination of IP fe80::f861:3eff:fe1d with port 1234, or as IP fe80::f861:3eff:fe1d:1234 without port. In such cases, DataStax Bulk Loader for Apache Cassandra does not change the contact point. To avoid this issue, provide IPv6 addresses in full form. For example, instead of fe80::f861:3eff:fe1d:1234, provide fe80:0:0:0:0:f861:3eff:fe1d:1234, so that the string is parsed as IP fe80:0:0:0:0:f861:3eff:fe1d with port 1234.

      On cloud deployments, DataStax Bulk Loader for Apache Cassandra automatically sets this option to an empty list, because contact points are not allowed to be explicitly provided when connecting to DataStax Astra databases.

      Default: 127.0.0.1
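
The advice about full-form IPv6 addresses can be turned into a pre-flight check. The sketch below is a hypothetical helper, not the heuristic that DataStax Bulk Loader itself applies. It conservatively flags every address that uses the compressed "::" notation, even though only some of those (the ones whose last group could be read as a port) are actually ambiguous.

```shell
# Hypothetical helper: succeeds if the address uses the compressed "::" form.
is_compressed_ipv6() {
  case "$1" in
    *::*) return 0 ;;
    *)    return 1 ;;
  esac
}

# The first address is the ambiguous example from the text; the second is
# its full-form replacement with port 1234 appended.
for addr in "fe80::f861:3eff:fe1d:1234" "fe80:0:0:0:0:f861:3eff:fe1d:1234"; do
  if is_compressed_ipv6 "$addr"; then
    echo "check: $addr (expand to full form before passing it to -h)"
  else
    echo "ok: $addr"
  fi
done
```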

  • -port, --driver.basic.default-port, --datastax-java-driver.basic.default-port port_number

    The port to use for basic.contact-points, when a host is specified without a port. All nodes in a cluster must accept connections on the same port number.

    Default: 9042

  • -b, --driver.basic.cloud.secure-connect-bundle, --datastax-java-driver.basic.cloud.secure-connect-bundle string

    The location of the secure bundle used to connect to a cloud-based DataStax Astra database. This setting must be a path on the local filesystem or a valid URL. Examples:

    "/path/to/bundle.zip"        # path on Linux or macOS
    "./path/to/bundle.zip"       # path on Linux or macOS relative to working directory
    "~/path/to/bundle.zip"       # path on Linux or macOS relative to home directory
    "C:\\path\\to\\bundle.zip"   # path on Windows; escape backslashes
    "file:/a/path/to/bundle.zip" # URL with file protocol
    "http://host.com/bundle.zip" # URL with HTTP protocol

    DataStax Astra Open Beta participants can download the secure connect bundle from the DataStax Cloud console after creating an Astra database. The secure-connect-bundle option applies only to Astra databases. Do not use the following options when connecting to cloud-based Astra deployments:

    • datastax-java-driver.basic.contact-points

    • datastax-java-driver.basic.request.consistency

    • datastax-java-driver.advanced.ssl-engine-factory.*

    Default: null
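
As a sketch of a load into an Astra database, assuming a placeholder bundle path, placeholder credentials, and a placeholder table ks1.table1. The -url, -k, and -t shortcuts (CSV source, keyspace, table) are regular dsbulk usage but are not described in this topic. The command is printed rather than executed; drop the leading echo to run it.

```shell
# Placeholder values: substitute your own bundle path and credentials.
BUNDLE="/path/to/bundle.zip"
ASTRA_USER="my_user"
ASTRA_PASSWORD="my_password"

# No contact points, consistency level, or ssl-engine-factory options:
# the text above says not to use them with Astra.
echo dsbulk load -url /path/to/data.csv -k ks1 -t table1 \
  -b "$BUNDLE" -u "$ASTRA_USER" -p "$ASTRA_PASSWORD"
```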

  • -cl, --driver.basic.request.consistency, --datastax-java-driver.basic.request.consistency string

    The consistency level to use for all queries. Stronger consistency levels usually result in reduced throughput. In addition, any level higher than ONE automatically disables continuous paging, which can dramatically reduce read throughput.

    Valid values are: ANY, LOCAL_ONE, ONE, TWO, THREE, LOCAL_QUORUM, QUORUM, EACH_QUORUM, ALL.

    The default value is LOCAL_ONE. On cloud deployments, the only consistency level accepted for writes is LOCAL_QUORUM, so when loading into a cloud deployment the default is automatically changed to LOCAL_QUORUM.

    Default: LOCAL_ONE

  • --driver.basic.request.timeout, --datastax-java-driver.basic.request.timeout "string"

    How long the DataStax Java driver waits for a request to complete. This is a global limit on the duration of a session.execute() call, including any internal retries the driver might do. By default, this value is set very high because DataStax Bulk Loader is optimized for good throughput, rather than good latencies.

    Default: "5 minutes"

  • --driver.basic.request.default-idempotence, --datastax-java-driver.basic.request.default-idempotence {true | false}

    The default idempotence for all queries executed in DataStax Bulk Loader. When this option is set to false, unload failures are not retried.

    Default: true

  • --driver.basic.request.serial-consistency, --datastax-java-driver.basic.request.serial-consistency string

    The serial consistency level to use during unload operations. Possible options are LOCAL_SERIAL or SERIAL.

    Default: LOCAL_SERIAL

  • --driver.basic.request.page-size, --datastax-java-driver.basic.request.page-size number

    The page size. This controls how many rows are retrieved in a single network roundtrip; the goal is to avoid loading too many results in memory at the same time. If there are more results, additional requests are used to retrieve them (either automatically if you iterate with the sync API, or explicitly with the async API’s fetchNextPage method). If the value is 0 or negative, it is ignored and the request is not paged.

    Default: 5000
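
To get a feel for the effect of the page size, the arithmetic below estimates how many roundtrips a single paged query needs. The row count is a hypothetical figure, and the result is only a rough lower bound: it ignores any splitting of the read into several queries and any trailing empty page.

```shell
ROWS=1000000      # hypothetical number of rows returned by one query
PAGE_SIZE=5000    # default basic.request.page-size

# Ceiling division: a final partial page still costs one roundtrip.
ROUND_TRIPS=$(( (ROWS + PAGE_SIZE - 1) / PAGE_SIZE ))
echo "$ROWS rows at page size $PAGE_SIZE need at least $ROUND_TRIPS roundtrips"
```

Doubling the page size halves this estimate, at the cost of holding more rows in memory per page.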

  • --driver.basic.load-balancing-policy.class, --datastax-java-driver.basic.load-balancing-policy.class string

    The load balancing policy class to use. If not qualified, the DataStax Java driver assumes that it resides in the package com.datastax.oss.driver.internal.core.loadbalancing. DataStax Bulk Loader uses a special policy that infers the local datacenter from the contact points. You can also specify a custom class that implements LoadBalancingPolicy and has a public constructor with two arguments: the DriverContext and a String representing the profile name.

    Default: "com.datastax.oss.driver.internal.core.loadbalancing.DcInferringLoadBalancingPolicy"

  • --driver.basic.load-balancing-policy.filter.class, --datastax-java-driver.basic.load-balancing-policy.filter.class string

    An optional custom filter to include or exclude nodes. If present, the option must be the fully-qualified name of a class that implements java.util.function.Predicate<Node>, and has a public constructor taking a single DriverContext argument. The predicate’s test(Node) method is invoked each time the policy processes a topology or state change. If the method returns false, the node is set at distance IGNORED, which means the Java driver does not ever connect to it, and the node is never included in any query plan.

    By default, DataStax Bulk Loader for Apache Cassandra provides a node filter implementation that honors the following settings:

    • datastax-java-driver.basic.load-balancing-policy.filter.allow: a list of host names or host addresses that should be allowed.

    • datastax-java-driver.basic.load-balancing-policy.filter.deny: a list of host names or host addresses that should be denied.

    For details, see load-balancing-policy.filter.allow and load-balancing-policy.filter.deny.

    Default: "com.datastax.oss.dsbulk.workflow.commons.policies.lbp.SimpleNodeFilter"

  • -allow, --driver.basic.load-balancing-policy.filter.allow, --datastax-java-driver.basic.load-balancing-policy.filter.allow <list<string>>

    An optional list of host names or host addresses that should be allowed to connect. See --driver.basic.contact-points for a full description of accepted formats. This option only has effect when the setting datastax-java-driver.basic.load-balancing-policy.filter.class refers to the DataStax Bulk Loader default node filter implementation: com.datastax.oss.dsbulk.workflow.commons.policies.lbp.SimpleNodeFilter.

    Not compatible with DataStax Astra databases.

    Default: []

  • -deny, --driver.basic.load-balancing-policy.filter.deny, --datastax-java-driver.basic.load-balancing-policy.filter.deny <list<string>>

    An optional list of host names or host addresses that should be denied the ability to connect. See --driver.basic.contact-points for a full description of accepted formats. This option only has effect when the setting datastax-java-driver.basic.load-balancing-policy.filter.class refers to the DataStax Bulk Loader default node filter implementation: com.datastax.oss.dsbulk.workflow.commons.policies.lbp.SimpleNodeFilter.

    Not compatible with DataStax Astra databases.

    Default: []
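
A sketch of the deny list in use, assuming the default SimpleNodeFilter, placeholder addresses, and a placeholder table ks1.table1 (the -k and -t shortcuts are not described in this topic). The denied node is deliberately not one of the contact points. The command is printed rather than executed; drop the leading echo to run it.

```shell
# Placeholder addresses, written in the same list format as contact points.
CONTACT_POINTS='["10.0.0.1","10.0.0.2"]'
DENIED='["10.0.0.3"]'

echo dsbulk unload -k ks1 -t table1 -h "$CONTACT_POINTS" -deny "$DENIED"
```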

  • -dc, --driver.basic.load-balancing-policy.local-datacenter, --datastax-java-driver.basic.load-balancing-policy.local-datacenter string

    The datacenter that is considered local. The default load balancing policy only includes nodes from this datacenter in its query plans. Set this to a value if you want to declare the local datacenter; otherwise, the DcInferringLoadBalancingPolicy that DataStax Bulk Loader uses by default infers the local datacenter from the provided contact points.

    Default: unspecified

  • --driver.advanced.retry-policy.max-retries, --datastax-java-driver.advanced.retry-policy.max-retries number

    How many times to retry a failed query. Only valid for use with the DataStax Bulk Loader default retry policy (MultipleRetryPolicy).

    Default: 10

Authorization options

Specify authorization options for using dsbulk with the DataStax Java driver.

  • --driver.advanced.auth-provider.class, --datastax-java-driver.advanced.auth-provider.class arg

    The class of the authentication provider. If it is not qualified, the Java driver assumes that it resides in one of the following packages:

    • com.datastax.oss.driver.internal.core.auth

    • com.datastax.dse.driver.internal.core.auth

    The DSE driver provides implementations out of the box:

    • PlainTextAuthProvider: uses plain-text credentials. It requires the username and password options. Should be used only when authenticating against Apache Cassandra® clusters; not recommended when authenticating against DSE clusters.

    • DsePlainTextAuthProvider: provides SASL authentication using the PLAIN mechanism for DSE clusters secured with DseAuthenticator. It requires the username and password options, and optionally, an authorization-id.

    You can also specify a custom class that implements AuthProvider and has a public constructor with a DriverContext argument; to simplify this step, the Java driver provides two abstract classes that can be extended: DsePlainTextAuthProviderBase and DseGssApiAuthProviderBase.

    Default: null

  • -u, --driver.advanced.auth-provider.username, --datastax-java-driver.advanced.auth-provider.username string

    The username to use. Providers that accept this setting:

    • PlainTextAuthProvider

    • DsePlainTextAuthProvider

    Default: null

  • -p, --driver.advanced.auth-provider.password, --datastax-java-driver.advanced.auth-provider.password string

    The password to use. Providers that accept this setting:

    • PlainTextAuthProvider

    • DsePlainTextAuthProvider

    Default: null

  • --driver.advanced.auth-provider.authorization-id, --datastax-java-driver.advanced.auth-provider.authorization-id string

    An authorization ID allows the currently authenticated user to act as a different user (proxy authentication). Providers that accept this setting:

    • DsePlainTextAuthProvider

    • DseGssApiAuthProvider

    Default: null
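
A sketch of proxy authentication against a DSE cluster, combining the options above. The user names, password, and table ks1.table1 are hypothetical, and the -k and -t shortcuts are not described in this topic. The command is printed rather than executed; drop the leading echo to run it.

```shell
# Hypothetical identities: svc_loader authenticates, report_owner is the
# user it acts as through the authorization ID.
AUTH_USER="svc_loader"
AUTH_PASSWORD="my_password"
PROXY_USER="report_owner"

echo dsbulk unload -k ks1 -t table1 \
  --driver.advanced.auth-provider.class DsePlainTextAuthProvider \
  -u "$AUTH_USER" -p "$AUTH_PASSWORD" \
  --driver.advanced.auth-provider.authorization-id "$PROXY_USER"
```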

SSL options

Specify SSL encryption options for using dsbulk with the DataStax Java driver. For additional information on SSL, see the Oracle Java Guide on SSL.

  • --driver.advanced.ssl-engine-factory.class, --datastax-java-driver.advanced.ssl-engine-factory.class string

    The class of the SSL engine factory. If not qualified, the DataStax Java driver assumes that it resides in the package com.datastax.oss.driver.internal.core.ssl. The DataStax Java driver provides a single implementation DefaultSslEngineFactory, which uses the JDK’s built-in SSL implementation.

    You can also specify a custom class that implements SslEngineFactory and has a public constructor with a DriverContext argument.

    Default: null

  • --driver.advanced.ssl-engine-factory.hostname-validation, --datastax-java-driver.advanced.ssl-engine-factory.hostname-validation boolean

    Whether to require validation that the common name of the server certificate matches the hostname of the server being connected to. This setting is only required when using the default SSL factory.

    Default: true

  • --driver.advanced.ssl-engine-factory.truststore-path, --datastax-java-driver.advanced.ssl-engine-factory.truststore-path string

    The location of the truststore file. If either truststore-path or keystore-path is specified, the DataStax Java driver builds an SSLContext from these files. This setting is only required when using the default SSL factory. If neither option is specified, the default SSLContext is used, which is based on system property configuration.

    Default: null

  • --driver.advanced.ssl-engine-factory.truststore-password, --datastax-java-driver.advanced.ssl-engine-factory.truststore-password string

    The password used to access truststore contents. This setting is only required when using the default SSL factory.

    Default: null

  • --driver.advanced.ssl-engine-factory.keystore-path, --datastax-java-driver.advanced.ssl-engine-factory.keystore-path string

    The location of the keystore file. If either truststore-path or keystore-path is specified, the DataStax Java driver builds an SSLContext from these files. This setting is only required when using the default SSL factory. If neither option is specified, the default SSLContext is used, which is based on system property configuration.

    Default: null

  • --driver.advanced.ssl-engine-factory.keystore-password, --datastax-java-driver.advanced.ssl-engine-factory.keystore-password string

    The password used to access keystore contents. This setting is only required when using the default SSL factory.

    Default: null
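
A sketch of server-certificate verification with the default SSL factory and a truststore. The truststore path, password, and table ks1.table1 are placeholders, and the -k and -t shortcuts are not described in this topic. The command is printed rather than executed; drop the leading echo to run it.

```shell
# Placeholder truststore location and password.
TRUSTSTORE="/path/to/truststore.jks"
TRUSTSTORE_PASSWORD="my_password"

echo dsbulk unload -k ks1 -t table1 \
  --driver.advanced.ssl-engine-factory.class DefaultSslEngineFactory \
  --driver.advanced.ssl-engine-factory.truststore-path "$TRUSTSTORE" \
  --driver.advanced.ssl-engine-factory.truststore-password "$TRUSTSTORE_PASSWORD"
```

Add the keystore-path and keystore-password options in the same way when the cluster also requires client certificates.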

Continuous paging options

Continuous paging options only take effect if continuous paging is globally enabled, which can be done with the executor option dsbulk.executor.continuousPaging.enabled.

  • --driver.advanced.continuous-paging.page-size, --datastax-java-driver.advanced.continuous-paging.page-size number

    Set the page size. The value can be interpreted in number of rows or in number of bytes, depending on the --driver.advanced.continuous-paging.page-size-in-bytes boolean value. This page size option controls how many rows (or how much data) are retrieved in a single network roundtrip. The goal is to avoid loading too many results in memory at the same time. If there are more results, additional requests are used to retrieve them automatically (if you iterate with the sync API), or explicitly with the async API’s fetchNextPage method. The default is the same as the driver’s normal request page size: 5000 (rows).

    Default: 5000

  • --driver.advanced.continuous-paging.page-size-in-bytes, --datastax-java-driver.advanced.continuous-paging.page-size-in-bytes {true | false}

    Whether the page-size option should be interpreted in number of rows or bytes. The default of false means page size is interpreted as the number of rows.

    Default: false

  • --driver.advanced.continuous-paging.max-pages, --datastax-java-driver.advanced.continuous-paging.max-pages number

    The maximum number of pages to return. The default of zero means retrieve all pages.

    Default: 0

  • --driver.advanced.continuous-paging.max-pages-per-second, --datastax-java-driver.advanced.continuous-paging.max-pages-per-second number

    Sets the maximum number of pages per second. The default of zero means no limit.

    Default: 0

  • --driver.advanced.continuous-paging.max-enqueued-pages, --datastax-java-driver.advanced.continuous-paging.max-enqueued-pages number

    The maximum number of pages that can be stored in the local queue. This value must be positive.

    Default: 4

  • --driver.advanced.continuous-paging.timeout.first-page, --datastax-java-driver.advanced.continuous-paging.timeout.first-page "string"

    How long to wait for the DataStax Bulk Loader coordinator to send the first page.

    Default: "5 minutes"

  • --driver.advanced.continuous-paging.timeout.other-pages, --datastax-java-driver.advanced.continuous-paging.timeout.other-pages "string"

    How long to wait for the DataStax Bulk Loader coordinator to send subsequent pages.

    Default: "5 minutes"
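
A sketch of an unload that sizes continuous pages in bytes instead of rows. The 64 KiB figure is purely illustrative, the table ks1.table1 is a placeholder, and the executor option is written as a long option under the name given at the start of this section, which is an assumption about its command-line spelling. The command is printed rather than executed; drop the leading echo to run it.

```shell
# Illustrative page size: 64 KiB, interpreted in bytes because
# page-size-in-bytes is set to true below.
PAGE_BYTES=$((64 * 1024))

echo dsbulk unload -k ks1 -t table1 \
  --dsbulk.executor.continuousPaging.enabled true \
  --driver.advanced.continuous-paging.page-size "$PAGE_BYTES" \
  --driver.advanced.continuous-paging.page-size-in-bytes true
```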

Advanced options

Specify advanced options for using dsbulk with the DataStax Java driver.

  • --driver.advanced.protocol.version, --datastax-java-driver.advanced.protocol.version string

    The native protocol version to use. If not set, the DataStax Java driver looks up the versions of the nodes at startup (by default, system.peers.release_version) and chooses the highest common protocol version.

    For example, if you have a mixed cluster with Apache Cassandra 2.1 nodes (protocol v3) and Apache Cassandra 3.0 nodes (protocol v3 and v4), the driver chooses protocol v3. If the nodes do not have a common protocol version, initialization fails. If this option is set, the given version is used for all connections without any negotiation or downgrading. If any of the contact points do not support the protocol version, that contact point is skipped. Once the protocol version is set, it cannot change for the duration of the driver’s session. If an incompatible node joins the cluster later, the connection fails and the driver does not try to reconnect to the node.

    Default: null
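
The negotiation described above can be illustrated with a small calculation. This is a simplified sketch, not the driver's implementation: it assumes that every node also supports all versions below its own maximum, so the highest common version is simply the smallest of the per-node maxima. The cluster layout is hypothetical.

```shell
# Highest protocol version supported by each node in a hypothetical cluster:
# one Apache Cassandra 2.1 node (v3) and two Apache Cassandra 3.0 nodes (v4).
NODE_MAX_VERSIONS="3 4 4"

# Keep the smallest maximum seen so far.
COMMON=""
for v in $NODE_MAX_VERSIONS; do
  if [ -z "$COMMON" ] || [ "$v" -lt "$COMMON" ]; then
    COMMON=$v
  fi
done
echo "highest common protocol version: v$COMMON"
```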

  • --driver.advanced.protocol.compression, --datastax-java-driver.advanced.protocol.compression string

    The name of the algorithm used to compress protocol frames. Possible values are: lz4, snappy or none.

    Default: none

  • --driver.advanced.connection.pool.local.size, --datastax-java-driver.advanced.connection.pool.local.size number

    The number of connections in the pool for nodes considered as local.

    Default: 8

  • --driver.advanced.connection.pool.remote.size, --datastax-java-driver.advanced.connection.pool.remote.size number

    The number of connections in the pool for nodes considered as remote. The default load balancing policy used by DataStax Bulk Loader does not consider remote nodes. As a result, this setting has no effect when using the default load balancing policy.

    Default: 8

  • --driver.advanced.connection.max-requests-per-connection, --datastax-java-driver.advanced.connection.max-requests-per-connection number

    The maximum number of requests that can be executed concurrently on a connection. Applies to local or remote connections. Must be a number between 1 and 32768.

    Default: 32768

  • --driver.advanced.resolve-contact-points, --datastax-java-driver.advanced.resolve-contact-points {true | false}

    Whether to resolve the addresses passed to basic.contact-points.

    • If true, addresses are created with InetSocketAddress(String, int). The host name is resolved the first time, and the driver uses the resolved IP address for all subsequent connection attempts.

    • If false, addresses are created with InetSocketAddress.createUnresolved(). The host name is resolved again every time the driver opens a new connection. This is useful for containerized environments where DNS records are more likely to change over time.

    JVM and OS have their own DNS caching mechanisms, so you might need additional configuration beyond the driver.

    This option only applies to the contact points specified in the configuration. It has no effect on dynamically discovered peers. The driver relies on Cassandra system tables, which expose raw IP addresses. Use a custom address translator (see --driver.advanced.address-translator.class) to convert them to unresolved addresses; if you’re in a containerized environment, you probably already need address translation.

    Default: true

  • --driver.advanced.address-translator.class, --datastax-java-driver.advanced.address-translator.class "string"

    The class of the translator. If not qualified, the DataStax Java driver assumes that it resides in the package com.datastax.oss.driver.internal.core.addresstranslation. The DataStax Java driver provides the PassThroughAddressTranslator implementation, which returns all addresses unchanged. You can also specify a custom class that implements AddressTranslator and has a public constructor with a DriverContext argument.

    Default: "PassThroughAddressTranslator"

  • --driver.advanced.timestamp-generator.class, --datastax-java-driver.advanced.timestamp-generator.class "string"

    The class of the microsecond timestamp generator. If it is not qualified, the driver assumes that it resides in the package com.datastax.oss.driver.internal.core.time. The driver provides the following implementations out of the box:

    • AtomicTimestampGenerator: timestamps are guaranteed to be unique across all client threads.

    • ThreadLocalTimestampGenerator: timestamps are guaranteed to be unique within each thread only.

    • ServerSideTimestampGenerator: does not generate timestamps; the server assigns them.

    You can also specify a custom class that implements TimestampGenerator and has a public constructor with two arguments: the DriverContext and a String representing the profile name.

    Default: "AtomicTimestampGenerator"

  • --driver.advanced.heartbeat.interval, --datastax-java-driver.advanced.heartbeat.interval "string"

    The heartbeat interval. If a connection stays idle for that duration (there are no reads), the DataStax Java driver sends a dummy message on it to make sure it’s still alive. If not, the connection is closed and replaced.

    Default: "30 seconds"

  • --driver.advanced.heartbeat.timeout, --datastax-java-driver.advanced.heartbeat.timeout "string"

    How long the DataStax Java driver waits for the response to a heartbeat. If this timeout occurs, the heartbeat is considered failed.

    Default: "60 seconds"

Deprecated options