# Driver options

DataStax Java driver options for the dsbulk command

This topic describes a commonly used subset of DataStax Java driver options that you can specify with the dsbulk command. Many additional options exist. Be sure to read the DataStax Java driver configuration reference documentation. Also refer to the driver matrix.

The options can be used in short form (-h host_name) or long form (--driver.basic.contact-points host_name).

Tip: DataStax Java driver configuration settings start with the prefix datastax-java-driver. On the dsbulk command line, you can abbreviate this prefix to driver, as shown in this topic.
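
As a rough illustration of this tip, the sketch below rewrites the abbreviated driver prefix to the full datastax-java-driver prefix. The function name expand_driver_prefix is hypothetical and is not part of dsbulk. It only rewrites names under basic and advanced, on the assumption that those are the sections the abbreviation applies to in this topic.

```shell
# Hypothetical helper: expand the abbreviated prefix used on the dsbulk
# command line to the full DataStax Java driver prefix.
# Assumption: only names under basic.* and advanced.* are rewritten.
expand_driver_prefix() {
  case "$1" in
    --driver.basic.*|--driver.advanced.*)
      printf '%s%s\n' '--datastax-java-driver.' "${1#--driver.}" ;;
    *)
      # Short forms and anything else are passed through unchanged.
      printf '%s\n' "$1" ;;
  esac
}

expand_driver_prefix --driver.basic.contact-points   # prints --datastax-java-driver.basic.contact-points
expand_driver_prefix -h                              # prints -h
```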

## Basic options

Specify basic options for using dsbulk with the DataStax Java driver. Use these options to define the contact points and port number for the initial connection. Additionally, define policy options pertaining to the DataStax Java driver load balancing policy settings, pooling options, query options, and socket connections.

-h, --driver.basic.contact-points, --datastax-java-driver.basic.contact-points host_name(s)

The contact points to use for the initial connection to the cluster. This must be a list of strings with each contact point specified as host or host:port. If the host is specified without a port, the default port specified in basic.default-port will be used. Apache Cassandra 3.0 and earlier and DataStax Enterprise (DSE) 6.7 and earlier require all nodes in a cluster to share the same port.

If the host is a DNS name that resolves to multiple A-records, all the corresponding addresses will be used. Do not use localhost as a host name (because it resolves to both IPv4 and IPv6 addresses on some platforms). For hosts specified without a port, set the port with driver.basic.default-port.
Note: Be sure to enclose address strings that contain special characters in quotes, as shown in these examples:
dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7a"]' -query "SELECT * from foo.bar;"
dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7b","fe80::f861:3eff:fe1d:9d7c"]' -query "SELECT * from foo1.bar1;"

Default: 127.0.0.1

-port, --driver.basic.default-port, --datastax-java-driver.basic.default-port port_number

The port to use for basic.contact-points, when a host is specified without a port. All nodes in a cluster must accept connections on the same port number.

Default: 9042
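
The interaction between contact points and the default port can be sketched as follows. This is only an illustration of the rule described above, not the driver's own parsing code. The function name with_default_port and the IP addresses are hypothetical, and the sketch assumes host names or IPv4 addresses only.

```shell
# Port applied to contact points that do not name one (the documented default).
default_port=9042

# Print a contact point as host:port, adding default_port when no port is given.
# Assumption: host names or IPv4 addresses only; IPv6 literals also contain
# colons and would need separate handling.
with_default_port() {
  case "$1" in
    *:*) printf '%s\n' "$1" ;;
    *)   printf '%s:%s\n' "$1" "$default_port" ;;
  esac
}

with_default_port "10.0.0.1"        # prints 10.0.0.1:9042
with_default_port "10.0.0.2:9043"   # prints 10.0.0.2:9043
```

On the dsbulk command line, the same two hosts would be passed unchanged, for example with -h '["10.0.0.1","10.0.0.2:9043"]' -port 9042.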

-b, --driver.basic.cloud.secure-connect-bundle, --datastax-java-driver.basic.cloud.secure-connect-bundle string
The location of the secure bundle used to connect to a cloud-based DataStax Apollo database. This setting must be a path on the local filesystem or a valid URL. Examples:
"/path/to/bundle.zip"          # path on unix
"./path/to/bundle.zip"         # path on unix relative to working directory
"~/path/to/bundle.zip"         # path on unix relative to home directory
"C:\\path\\to\\bundle.zip"     # path on Windows; escape backslashes
"file:/a/path/to/bundle.zip"   # URL with file protocol
"http://host.com/bundle.zip"   # URL with HTTP protocol
Note: Apollo open beta participants can download the secure connect bundle from the DataStax Constellation console after creating an Apollo database. The secure-connect-bundle option is only for Apollo databases. Do not use the following options when connecting to cloud-based Apollo deployments:
• datastax-java-driver.basic.contact-points
• datastax-java-driver.basic.request.consistency
• datastax-java-driver.advanced.ssl-engine-factory.*

Default: null
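
The restriction in the note above can be checked before dsbulk is started. The following is a minimal sketch of such a pre-flight check. The function name bundle_args_ok is hypothetical and not part of dsbulk, and dsbulk itself may report these conflicts differently.

```shell
# Hypothetical pre-flight check: succeed only if none of the given arguments
# is an option that must not be combined with a secure connect bundle.
bundle_args_ok() {
  for arg in "$@"; do
    case "$arg" in
      -h|*.basic.contact-points|*.basic.request.consistency|*.advanced.ssl-engine-factory.*)
        echo "not allowed with a secure connect bundle: $arg" >&2
        return 1 ;;
    esac
  done
  return 0
}

# Accepted: only the bundle and a query are given.
bundle_args_ok -b "/path/to/bundle.zip" -query "SELECT * FROM foo.bar;" && echo "ok"

# Rejected: contact points are given together with the bundle.
bundle_args_ok -b "/path/to/bundle.zip" -h '["10.0.0.1"]' || echo "conflict"
```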

--driver.basic.request.timeout, --datastax-java-driver.basic.request.timeout "string"

How long the DataStax Java driver waits for a request to complete. This is a global limit on the duration of a session.execute() call, including any internal retries the driver might do. By default, this value is set very high because DataStax Bulk Loader is optimized for good throughput, rather than good latencies.

Default: "60 seconds"

The load balancing policy class to use. If not qualified, the DataStax Java driver assumes that it resides in the package com.datastax.oss.driver.internal.core.loadbalancing. DataStax Bulk Loader uses a special policy that infers the local datacenter from the contact points. You can also specify a custom class that implements LoadBalancingPolicy and has a public constructor with two arguments: the DriverContext and a String representing the profile name.

Default: "com.datastax.dse.driver.internal.core.loadbalancing.DseDcInferringLoadBalancingPolicy"

The datacenter that is considered local. The default load balancing policy only includes nodes from this datacenter in its query plans. Set this option to declare the local datacenter explicitly; otherwise, the DseDcInferringLoadBalancingPolicy that DataStax Bulk Loader uses by default infers the local datacenter from the provided contact points.

Default: unspecified

--driver.basic.request.default-idempotence, --datastax-java-driver.basic.request.default-idempotence {true | false}

The default idempotence for all queries executed in DataStax Bulk Loader. Setting this option to false means that failed unload requests are not retried.

Default: true

--driver.basic.request.serial-consistency, --datastax-java-driver.basic.request.serial-consistency string

The serial consistency level to use during unload operations. Possible options are LOCAL_SERIAL or SERIAL.

Default: LOCAL_SERIAL

## Authorization options

Specify authorization options for using dsbulk with the DataStax Java driver. For additional information on SSL, see the Oracle Java Guide on SSL.

The username to use. Providers that accept this setting:
• PlainTextAuthProvider
• DsePlainTextAuthProvider
Important: DataStax recommends specifying username and password credentials in a configuration file, instead of on the command line. For an example, refer to Creating a configuration file for dsbulk.

Default: null

The password to use. Providers that accept this setting:
• PlainTextAuthProvider
• DsePlainTextAuthProvider
Important: DataStax recommends specifying username and password credentials in a configuration file, instead of on the command line. For an example, refer to Creating a configuration file for dsbulk.

Default: null

## SSL options

Specify SSL encryption options for using dsbulk with the DataStax Java driver. For additional information on SSL, see the Oracle Java Guide on SSL.

The class of the SSL engine factory. If not qualified, the DataStax Java driver assumes that it resides in the package com.datastax.oss.driver.internal.core.ssl. The DataStax Java driver provides a single implementation, DefaultSslEngineFactory, which uses the JDK's built-in SSL implementation.

You can also specify a custom class that implements SslEngineFactory and has a public constructor with a DriverContext argument.

Default: null

Whether to require validation that the hostname of the server certificate's common name matches the hostname of the server being connected to. This setting is only required when using the default SSL factory. If not set, defaults to true.

Default: true

The locations used to access truststore contents. If either truststore-path or keystore-path is specified, the DataStax Java driver builds an SSLContext from these files. This setting is only required when using the default SSL factory. If neither option is specified, the default SSLContext is used, which is based on system property configuration.

Default: null

The password used to access truststore contents. This setting is only required when using the default SSL factory.

Default: null

The locations used to access keystore contents. If either truststore-path or keystore-path is specified, the DataStax Java driver builds an SSLContext from these files. This setting is only required when using the default SSL factory. If neither option is specified, the default SSLContext is used, which is based on system property configuration.

Default: null

The password used to access keystore contents. This setting is only required when using the default SSL factory.

Default: null

How many times to retry a failed query. Only valid for use with the DataStax Bulk Loader default retry policy (MultipleRetryPolicy).

Default: 10

## Continuous paging options

Set the page size. The value can be interpreted as a number of rows or a number of bytes, depending on the page-size-in-bytes boolean value. This page size option controls how many rows (or how much data) are retrieved in a single network roundtrip. The goal is to avoid loading too many results in memory at the same time. If there are more results, additional requests are used to retrieve them, either automatically (if you iterate with the sync API) or explicitly with the async API's fetchNextPage method. The default is the same as the driver's normal request page size: 5000 (rows).

Default: 5000

Whether the page-size option should be interpreted in number of rows or bytes. The default of false means page size is interpreted as the number of rows.

Default: false

The maximum number of pages to return. The default of zero means retrieve all pages.

Default: 0

Sets the maximum number of pages per second. The default of zero means no limit.

Default: 0
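
To get a feel for how the page size and the pages-per-second limit interact, the arithmetic can be sketched as below. The variable names are shell variables chosen for this illustration, not dsbulk option names, and the row count and rate limit are hypothetical. The time estimate is a rough lower bound that ignores network and server latency.

```shell
page_size=5000             # default page size, interpreted as rows
max_pages_per_second=50    # hypothetical limit; 0 would mean no limit
total_rows=1000000         # hypothetical size of the result set

# Round trips needed to fetch every row (ceiling division).
pages=$(( (total_rows + page_size - 1) / page_size ))

# Rough lower bound on the transfer time imposed by the rate limit, in whole seconds.
if [ "$max_pages_per_second" -gt 0 ]; then
  min_seconds=$(( (pages + max_pages_per_second - 1) / max_pages_per_second ))
else
  min_seconds=0
fi

echo "pages=$pages min_seconds=$min_seconds"   # prints pages=200 min_seconds=4
```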

The maximum number of pages that can be stored in the local queue. This value must be positive.

Default: 4

How long to wait for the DataStax Bulk Loader coordinator to send the first page.

Default: "60 seconds"

How long to wait for the DataStax Bulk Loader coordinator to send subsequent pages.

Default: "120 seconds"

Specify advanced options for using dsbulk with the DataStax Java driver. Use these options to define the native protocol version, protocol compression, connection pooling, contact point resolution, timestamp generation, address translation, and heartbeat settings.

The native protocol version to use. If not set, the DataStax Java driver looks up the versions of the nodes at startup (by default, system.peers.release_version) and chooses the highest common protocol version.

For example, if you have a mixed cluster with Apache Cassandra 2.1 nodes (protocol v3) and Apache Cassandra 3.0 nodes (protocol v3 and v4), the driver chooses protocol v3. If the nodes do not have a common protocol version, initialization fails. If this option is set, the given version is used for all connections without any negotiation or downgrading. If any of the contact points do not support the protocol version, that contact point is skipped. Once the protocol version is set, it cannot change for the duration of the driver's session. If an incompatible node joins the cluster later, the connection will fail and the driver will not try to reconnect to the node.

Default: null
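
The selection of the highest common protocol version can be sketched as follows. This is an illustration of the rule described above, not the driver's negotiation code. The function name highest_common_version is hypothetical, and the candidate range of versions 5 down to 1 is an assumption made for the example.

```shell
# Print the highest protocol version supported by every node, or fail.
# Each argument is one node's space-separated list of supported versions.
# Assumption: candidate versions range from 5 down to 1.
highest_common_version() {
  [ "$#" -gt 0 ] || return 1
  for v in 5 4 3 2 1; do
    common=1
    for supported in "$@"; do
      case " $supported " in
        *" $v "*) ;;                 # this node supports version v
        *) common=0; break ;;        # at least one node does not
      esac
    done
    if [ "$common" -eq 1 ]; then
      echo "$v"
      return 0
    fi
  done
  echo "no common protocol version" >&2
  return 1
}

# Mixed cluster from the example above: one node with v3, one with v3 and v4.
highest_common_version "3" "3 4"   # prints 3
```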

The name of the algorithm used to compress protocol frames. Possible values are: lz4, snappy or none.

Default: none

The number of connections in the pool for nodes considered as local.

Default: 8

The number of connections in the pool for nodes considered as remote. The default load balancing policy used by DataStax Bulk Loader does not consider remote nodes. As a result, this setting has no effect when using the default load balancing policy.

Default: 8

The maximum number of requests that can be executed concurrently on a connection. Applies to local or remote connections. Must be a number between 1 and 32768.

Default: 32768

Whether to resolve the addresses passed to basic.contact-points.
• If true, addresses are created with InetSocketAddress(String, int). The host name is resolved the first time, and the driver will use the resolved IP address for all subsequent connection attempts.
• If false, addresses are created with InetSocketAddress.createUnresolved(). The host name is resolved again every time the driver opens a new connection. This is useful for containerized environments where DNS records are more likely to change over time.
Note: JVM and OS have their own DNS caching mechanisms, so you might need additional configuration beyond the driver.
This option only applies to the contact points specified in the configuration. It has no effect on dynamically discovered peers. The driver relies on Cassandra system tables, which expose raw IP addresses. Use a custom address translator (see advanced.address-translator.class) to convert them to unresolved addresses; if you're in a containerized environment, you probably already need address translation.

Default: true

The class of the microsecond timestamp generator. If it is not qualified, the driver assumes that it resides in the package com.datastax.oss.driver.internal.core.time. The driver provides the following implementations out of the box:
• AtomicTimestampGenerator: timestamps are guaranteed to be unique across all client threads.
• ThreadLocalTimestampGenerator: timestamps are guaranteed to be unique within each thread only.
• ServerSideTimestampGenerator: do not generate timestamps, let the server assign them.
You can also specify a custom class that implements TimestampGenerator and has a public constructor with two arguments: the DriverContext and a String representing the profile name.

Default: "AtomicTimestampGenerator"

The class of the translator. If not qualified, the DataStax Java driver assumes that it resides in the package com.datastax.oss.driver.internal.core.addresstranslation. The DataStax Java driver provides the PassThroughAddressTranslator implementation, which returns all addresses unchanged. You can also specify a custom class that implements AddressTranslator and has a public constructor with a DriverContext argument.

Default: "PassThroughAddressTranslator"

The heartbeat interval. If a connection stays idle for that duration (there are no reads), the DataStax Java driver sends a dummy message on it to make sure it's still alive. If not, the connection is closed and replaced.

Default: "30 seconds"

How long the DataStax Java driver waits for the response to a heartbeat. If this timeout occurs, the heartbeat is considered failed.

Default: "60 seconds"

## Deprecated options

--driver.timestampGenerator { AtomicMonotonicTimestampGenerator | ThreadLocalTimestampGenerator | ServerSideTimestampGenerator }

-lbp,--driver.policy.lbp.name { dse | dcAwareRoundRobin | roundRobin | whiteList | tokenAware }

--driver.policy.lbp.dcAwareRoundRobin.allowRemoteDCsForLocalConsistencyLevel {true | false}

--driver.policy.lbp.dcAwareRoundRobin.localDc string

--driver.policy.lbp.dcAwareRoundRobin.usedHostsPerRemoteDc number

--driver.policy.lbp.dse.childPolicy { dse | dcAwareRoundRobin | roundRobin | whiteList | tokenAware }

--driver.policy.lbp.tokenAware.childPolicy { dse | dcAwareRoundRobin | roundRobin | whiteList | tokenAware }

--driver.policy.lbp.tokenAware.shuffleReplicas { true | false }

--driver.policy.lbp.whiteList.childPolicy { dse | dcAwareRoundRobin | roundRobin | whiteList | tokenAware }

--driver.policy.lbp.whiteList.hosts string

--driver.pooling.heartbeat string

--driver.pooling.local.connections number

--driver.pooling.local.requests number

--driver.pooling.remote.requests number

--driver.protocol.compression string

--driver.query.idempotence {true | false}

--driver.query.serialConsistency string

-maxRetries,--driver.policy.maxRetries number

--driver.ssl.cipherSuites list

Deprecated. Instead use --datastax-java-driver.advanced.ssl-engine-factory.class and related datastax-java-driver.advanced.ssl-engine-factory.* options.

--driver.ssl.keystore.algorithm { SunX509 | NewSunX509 }

Deprecated. Instead use --datastax-java-driver.advanced.ssl-engine-factory.class and related datastax-java-driver.advanced.ssl-engine-factory.* options.

--driver.ssl.keystore.path string

--driver.ssl.openssl.keyCertChain string

Deprecated. Instead use --datastax-java-driver.advanced.ssl-engine-factory.class and related datastax-java-driver.advanced.ssl-engine-factory.* options.

--driver.ssl.openssl.privateKey string

Deprecated. Instead use --datastax-java-driver.advanced.ssl-engine-factory.class and related datastax-java-driver.advanced.ssl-engine-factory.* options.

--driver.ssl.provider { None | JDK | OpenSSL }

Deprecated. Instead use --datastax-java-driver.advanced.ssl-engine-factory.class and related datastax-java-driver.advanced.ssl-engine-factory.* options.

--driver.ssl.truststore.algorithm { PKIX | SunX509 }

Deprecated. Instead use --datastax-java-driver.advanced.ssl-engine-factory.class and related datastax-java-driver.advanced.ssl-engine-factory.* options.

--driver.ssl.truststore.path string

--driver.query.fetchSize number

-cl,--driver.query.consistency { ANY | LOCAL_ONE | ONE | TWO | THREE | LOCAL_QUORUM | QUORUM | EACH_QUORUM | ALL }

Deprecated. The consistency level to use for both loading and unloading. Note that stronger consistency levels usually result in reduced throughput. In addition, any level higher than ONE will automatically disable continuous paging, which can dramatically reduce read throughput.

Default: LOCAL_ONE

--driver.auth.provider { None | PlainTextAuthProvider | DsePlainTextAuthProvider | DseGSSAPIAuthProvider }
Deprecated. The name of the AuthProvider to use. Valid choices are:
• None: no authentication.

• PlainTextAuthProvider: Uses com.datastax.driver.core.PlainTextAuthProvider for authentication. Supports SASL authentication using the PLAIN mechanism (plain text authentication).

• DsePlainTextAuthProvider: Uses com.datastax.driver.dse.auth.DsePlainTextAuthProvider for authentication. Supports SASL authentication to DSE clusters using the PLAIN mechanism (plain text authentication), and also supports optional proxy authentication; should be preferred to PlainTextAuthProvider when connecting to secured DSE clusters.

• DseGSSAPIAuthProvider: Uses com.datastax.driver.dse.auth.DseGSSAPIAuthProvider for authentication. Supports SASL authentication to DSE clusters using the GSSAPI mechanism (Kerberos authentication), and also supports optional proxy authentication.
Note: When using this provider you may have to set the java.security.krb5.conf system property to point to your krb5.conf file (e.g. set the DSBULK_JAVA_OPTS environment variable to -Djava.security.krb5.conf=/home/user/krb5.conf). See the Oracle Java Kerberos documentation for more details.

Default: None
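
The Kerberos note above can be applied along these lines. This is a minimal sketch that uses the example path from the note; the variable name krb5_conf is only a convenience for this illustration.

```shell
# Path taken from the note above; replace it with the location of your krb5.conf.
krb5_conf="/home/user/krb5.conf"

# Append the system property to any JVM options that are already set.
DSBULK_JAVA_OPTS="${DSBULK_JAVA_OPTS:+$DSBULK_JAVA_OPTS }-Djava.security.krb5.conf=$krb5_conf"
export DSBULK_JAVA_OPTS

echo "$DSBULK_JAVA_OPTS"
```

Appending rather than overwriting keeps any JVM options that were already exported in the current shell.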

--driver.auth.authorizationId string
Deprecated. An authorization ID allows the currently authenticated user to act as a different user (proxy authentication). Providers that accept this setting:
• DsePlainTextAuthProvider
• DseGSSAPIAuthProvider

Default: unspecified

--driver.auth.keyTab string
Deprecated. The path of the Kerberos keytab file to use for authentication. If left unspecified, authentication uses a ticket cache. Providers that accept this setting:
• DseGSSAPIAuthProvider

Default: unspecified

--driver.auth.principal email
Deprecated. The Kerberos principal to use. For example, user@datastax.com. If left unspecified, the principal is chosen from the first key in the ticket cache or keytab. Providers that accept this setting:
• DseGSSAPIAuthProvider

Default: unspecified

--driver.auth.saslService string
Deprecated. The SASL service name to use. This value should match the username of the Kerberos service principal used by the DSE server. This information is specified in the dse.yaml file by the service_principal option under the kerberos_options section, and may vary from one DSE installation to another – especially if you installed DSE with an automated package installer. Providers that accept this setting:
• DseGSSAPIAuthProvider

Default: dse