Driver options

Driver options for the dsbulk command

Specify driver options (see /en/developer/driver-matrix/doc/common/driverMatrix.html) for the dsbulk command.

Options that have a short form can be given in either short form (-k keyspace_name) or long form (--schema.keyspace keyspace_name).
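The hypothetical Python helper below makes the short/long equivalence concrete. It rewrites the short aliases documented on this page into their long forms. It only illustrates the mapping: dsbulk accepts either form directly, and the table covers only the aliases listed on this page.

```python
# Short aliases documented on this page and the long options they stand for.
# to_long_form() is a hypothetical helper for illustration, not part of dsbulk.
SHORT_TO_LONG = {
    "-k": "--schema.keyspace",
    "-h": "--driver.hosts",
    "-port": "--driver.port",
    "-lbp": "--driver.policy.lbp.name",
    "-maxRetries": "--driver.policy.maxRetries",
    "-cl": "--driver.query.consistency",
}


def to_long_form(argv: list[str]) -> list[str]:
    """Replace each known short option with its long form; keep other tokens as-is."""
    return [SHORT_TO_LONG.get(arg, arg) for arg in argv]


print(" ".join(to_long_form(["unload", "-k", "ks1", "-h", "10.0.0.1", "-cl", "LOCAL_QUORUM"])))
```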

General driver options

-h,--driver.hosts host_name(s)
The contact points to use for the initial connection to the cluster. This must be a comma-separated list of hosts, each specified by a host name or IP address. If a host is a DNS name that resolves to multiple A records, all of the corresponding addresses are used. Do not use localhost as a host name, because it resolves to both IPv4 and IPv6 addresses on some platforms. All hosts must use the same port, which is specified with driver.port.
Note: Be sure to enclose address strings that contain special characters in quotes, as shown in these examples:
dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7a"]' -query "SELECT * from foo.bar;"
dsbulk unload -h '["fe80::f861:3eff:fe1d:9d7b","fe80::f861:3eff:fe1d:9d7c"]' \
              -query "SELECT * from foo1.bar1;"

Default: 127.0.0.1

-port,--driver.port port_number

The port to connect to at the initial contact points. All nodes in a cluster must accept connections on the same port number.

Default: 9042
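The Python sketch below shows how the -h and -port rules work together. contact_points() is a hypothetical helper that applies the documented rules: a comma-separated list, every address of a multi-record DNS name, no localhost, and one shared port. It is not dsbulk's implementation, and it handles only the plain comma-separated form, not the bracketed list syntax used in the IPv6 examples.

```python
import socket


def contact_points(hosts: str, port: int = 9042) -> list[tuple[str, int]]:
    """Expand a comma-separated --driver.hosts value into (address, port) pairs.

    Sketch of the documented rules only, not dsbulk's own code; the bracketed
    list syntax used in the IPv6 examples is not parsed here.
    """
    points: list[tuple[str, int]] = []
    for host in (h.strip() for h in hosts.split(",")):
        if not host:
            continue
        if host.lower() == "localhost":
            # localhost can resolve to both IPv4 and IPv6 on some platforms.
            raise ValueError("use an explicit address such as 127.0.0.1 instead of localhost")
        # A DNS name with several address records yields several contact points.
        for *_, sockaddr in socket.getaddrinfo(host, port, type=socket.SOCK_STREAM):
            point = (sockaddr[0], port)  # every host shares the single --driver.port value
            if point not in points:
                points.append(point)
    return points


print(contact_points("127.0.0.1, 10.0.0.2"))
```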

--driver.addressTranslator string

The simple or fully qualified class name of the address translator to use. An address translator is needed only if the nodes are not directly reachable from the machine on which dsbulk is running: for example, when the dsbulk machine is in a different network region and must use a public IP, or when it connects through a proxy.

Default: IdentityTranslator
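The option itself takes a Java class name. The Python sketch below only illustrates what a translator does. The identity translator keeps each node's advertised address. mapping_translator() is a hypothetical translator that swaps private IPs for public ones, and the addresses used are placeholder documentation values.

```python
from typing import Callable, Dict, Tuple

Address = Tuple[str, int]  # (IP address, port) as advertised by a node


def identity_translator(address: Address) -> Address:
    """Default behavior: connect to the address the node advertises."""
    return address


def mapping_translator(private_to_public: Dict[str, str]) -> Callable[[Address], Address]:
    """Hypothetical translator: swap private IPs for public ones, keep the port."""
    def translate(address: Address) -> Address:
        host, port = address
        # Addresses missing from the mapping are passed through unchanged.
        return (private_to_public.get(host, host), port)
    return translate


to_public = mapping_translator({"10.0.0.5": "203.0.113.5"})
print(to_public(("10.0.0.5", 9042)))
```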

--driver.timestampGenerator ( AtomicMonotonicTimestampGenerator | ThreadLocalTimestampGenerator | ServerSideTimestampGenerator )
The simple or fully qualified class name of the timestamp generator to use. Built-in options are:
  • AtomicMonotonicTimestampGenerator: timestamps are guaranteed to be unique across all client threads.
  • ThreadLocalTimestampGenerator: timestamps are guaranteed to be unique within each thread only.
  • ServerSideTimestampGenerator: no timestamps are generated on the client; the server assigns them.

Default: AtomicMonotonicTimestampGenerator
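The difference between the two client-side generators can be sketched in Python. These classes are illustrative stand-ins, not the driver's Java classes. Both assume microsecond timestamps and step past the last value whenever the clock has not advanced. The atomic variant shares one counter across threads, while the thread-local variant keeps one counter per thread, so two threads can produce the same value.

```python
import threading
import time


class AtomicMonotonicGenerator:
    """Sketch: microsecond timestamps unique across all threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last = 0

    def next(self) -> int:
        with self._lock:  # one shared counter, so every caller sees a distinct value
            now = time.time_ns() // 1_000  # wall clock in microseconds
            # If the clock did not advance (or went backwards), step past the last value.
            self._last = max(now, self._last + 1)
            return self._last


class ThreadLocalGenerator:
    """Sketch: timestamps increase within a thread; threads may collide."""

    def __init__(self) -> None:
        self._state = threading.local()

    def next(self) -> int:
        last = getattr(self._state, "last", 0)  # per-thread state, no lock needed
        self._state.last = max(time.time_ns() // 1_000, last + 1)
        return self._state.last


def generate_concurrently(gen, threads: int = 4, per_thread: int = 500) -> list[int]:
    """Draw timestamps from several threads at once and collect them all."""
    results: list[int] = []
    lock = threading.Lock()

    def worker() -> None:
        values = [gen.next() for _ in range(per_thread)]
        with lock:
            results.extend(values)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return results
```

With the atomic generator, generate_concurrently() never returns duplicates. With the thread-local generator, values are only guaranteed to increase within each thread.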

Driver policies

Driver policy options pertain to DSE Java Driver load balancing policy settings. See Load Balancing for details.
-lbp,--driver.policy.lbp.name ( dse | dcAwareRoundRobin | roundRobin | whiteList | tokenAware )
The name of the load balancing policy. Supported policies are dse, dcAwareRoundRobin, roundRobin, whiteList, and tokenAware. Options specific to each policy are listed below. For more information, refer to the driver documentation for the policy. If not specified, the driver's default load balancing policy is used; currently this is a DseLoadBalancingPolicy wrapping a TokenAwarePolicy, which in turn wraps a DCAwareRoundRobinPolicy.
Note: A token-aware policy must be part of the policy chain in order to benefit from batching by partition key.

Default: unspecified
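A policy chain of a token-aware policy wrapping a DC-aware round-robin child can be pictured with the simplified Python model below. The classes are hypothetical and are not the driver's API. The child keeps only local-datacenter hosts, which mirrors usedHostsPerRemoteDc = 0, and rotates them on each query. The token-aware wrapper moves the replicas for the request's partition key to the front of the child's plan.

```python
from itertools import count
from typing import Dict, List, Sequence


class DcAwareRoundRobin:
    """Child policy sketch: local-datacenter hosts only, rotated on each query plan."""

    def __init__(self, hosts: Dict[str, str], local_dc: str) -> None:
        # hosts maps host name -> datacenter; remote datacenters are ignored here
        self.local = [host for host, dc in hosts.items() if dc == local_dc]
        self._counter = count()

    def query_plan(self) -> List[str]:
        if not self.local:
            return []
        start = next(self._counter) % len(self.local)
        return self.local[start:] + self.local[:start]


class TokenAware:
    """Wrapper sketch: replicas of the partition key first, then the rest of the child plan."""

    def __init__(self, child: DcAwareRoundRobin) -> None:
        self.child = child

    def query_plan(self, replicas: Sequence[str]) -> List[str]:
        child_plan = self.child.query_plan()
        preferred = [host for host in child_plan if host in replicas]
        return preferred + [host for host in child_plan if host not in replicas]


policy = TokenAware(DcAwareRoundRobin({"a": "dc1", "b": "dc1", "c": "dc1", "d": "dc2"}, "dc1"))
print(policy.query_plan(replicas=["c"]))
print(policy.query_plan(replicas=["c"]))
```

Every plan starts with a replica that owns the partition key, so a batch grouped by that key goes straight to an owning node. This is why a token-aware policy matters for batching.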

--driver.policy.lbp.dcAwareRoundRobin.allowRemoteDCsForLocalConsistencyLevel ( true | false )

Whether the round-robin policy may return hosts in remote datacenters for requests that use a local consistency level (LOCAL_ONE or LOCAL_QUORUM). Deprecated as of DataStax Bulk Loader 1.3.0.

Default: false

--driver.policy.lbp.dcAwareRoundRobin.localDc string

The datacenter name (commonly dc1, dc2, etc.) local to the machine on which dsbulk is running, so that requests are sent to nodes in the local datacenter whenever possible.

Default: unspecified

--driver.policy.lbp.dcAwareRoundRobin.usedHostsPerRemoteDc number

The number of hosts per remote datacenter that the round-robin policy should consider. Deprecated as of DataStax Bulk Loader 1.3.0.

Default: 0

--driver.policy.lbp.dse.childPolicy ( dse | dcAwareRoundRobin | roundRobin | whiteList | tokenAware )

The child policy that the specified dse policy wraps.

Default: roundRobin

--driver.policy.lbp.tokenAware.childPolicy ( dse | dcAwareRoundRobin | roundRobin | whiteList | tokenAware )

The child policy that the specified tokenAware policy wraps.

Default: roundRobin

--driver.policy.lbp.tokenAware.shuffleReplicas ( true | false )

Specify whether to shuffle the list of replicas that can process a request. For loading, shuffling can improve performance by distributing writes across nodes.

Default: true
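The Python sketch below shows why shuffling spreads writes. It counts which replica would be tried first over many requests, using a fixed random seed so the run is reproducible. The functions are illustrative and do not come from the driver. Without shuffling, the same replica is always tried first.

```python
import random
from collections import Counter
from typing import List


def replica_order(replicas: List[str], shuffle: bool, rng: random.Random) -> List[str]:
    """Order in which replicas are tried for one request."""
    ordered = list(replicas)
    if shuffle:
        rng.shuffle(ordered)  # spread first attempts across all replicas
    return ordered


def first_choice_counts(replicas: List[str], shuffle: bool, requests: int = 3000) -> Counter:
    """Count how often each replica is tried first over many requests."""
    rng = random.Random(42)  # fixed seed so the sketch is reproducible
    return Counter(replica_order(replicas, shuffle, rng)[0] for _ in range(requests))


print(first_choice_counts(["r1", "r2", "r3"], shuffle=False))
print(first_choice_counts(["r1", "r2", "r3"], shuffle=True))
```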

--driver.policy.lbp.whiteList.childPolicy ( dse | dcAwareRoundRobin | roundRobin | whiteList | tokenAware )

The child policy that the specified whiteList policy wraps.

Default: roundRobin

--driver.policy.lbp.whiteList.hosts string

The hosts to white list. This must be a comma-separated list of hosts, each specified by a host name or IP address. If a host is a DNS name that resolves to multiple A records, all of the corresponding addresses are used. Do not use localhost as a host name, because it resolves to both IPv4 and IPv6 addresses on some platforms.

Default: unspecified
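A white-list policy can be sketched in Python as a filter over its child policy's query plan. The sketch keeps the child's ordering and drops hosts that are not listed. The helper names are hypothetical, and DNS resolution of listed names is not modeled.

```python
from typing import List


def parse_hosts(option: str) -> List[str]:
    """Split a comma-separated host list, ignoring blanks and surrounding spaces."""
    return [host.strip() for host in option.split(",") if host.strip()]


def whitelist_plan(child_plan: List[str], whitelist_option: str) -> List[str]:
    """Keep the child policy's order but drop every host not on the white list."""
    allowed = set(parse_hosts(whitelist_option))
    return [host for host in child_plan if host in allowed]


print(whitelist_plan(["10.0.0.1", "10.0.0.2", "10.0.0.3"], "10.0.0.3, 10.0.0.1"))
```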

-maxRetries,--driver.policy.maxRetries number

Maximum number of retries for a timed-out request.

Default: 10
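The retry budget can be read as "one initial attempt plus up to maxRetries retries after a timeout", as in the minimal Python sketch below. That reading is an assumption. Back-off and errors other than timeouts are not modeled, and flaky() is a hypothetical stand-in for a request.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def execute_with_retries(request: Callable[[], T], max_retries: int = 10) -> T:
    """Run request once, then retry it up to max_retries times after a timeout."""
    retries = 0
    while True:
        try:
            return request()
        except TimeoutError:
            if retries >= max_retries:
                raise  # budget exhausted: surface the last timeout
            retries += 1


def flaky(failures: int) -> Callable[[], str]:
    """Hypothetical request that times out `failures` times, then succeeds."""
    calls = 0

    def request() -> str:
        nonlocal calls
        calls += 1
        if calls <= failures:
            raise TimeoutError("simulated timeout")
        return "ok"

    return request


print(execute_with_retries(flaky(10)))  # 10 timeouts, then success on the 11th attempt
```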

Driver pooling

--driver.pooling.heartbeat string

The heartbeat interval. If a connection stays idle for this long (no reads), the driver sends a dummy message on it to check that it is still alive. If it is not, the connection is closed and replaced.

Default: 30 seconds
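The heartbeat check can be pictured as the small state machine below. IdleConnection is a hypothetical Python class, not the driver's connection type. The dummy message is modeled as a callable that reports whether the node answered, and times are passed in explicitly in seconds so the sketch runs deterministically.

```python
from typing import Callable


class IdleConnection:
    """Hypothetical connection that tracks the time of its last read (seconds)."""

    def __init__(self, heartbeat_interval: float = 30.0, now: float = 0.0) -> None:
        self.heartbeat_interval = heartbeat_interval
        self.last_read = now
        self.open = True

    def on_read(self, now: float) -> None:
        self.last_read = now  # any incoming data shows the connection is alive

    def check(self, now: float, send_heartbeat: Callable[[], bool]) -> str:
        """Called periodically; send_heartbeat returns True if the node answered."""
        if not self.open:
            return "closed"
        if now - self.last_read < self.heartbeat_interval:
            return "active"  # read recently, nothing to do
        if send_heartbeat():
            self.last_read = now
            return "alive"
        self.open = False  # no answer: the pool discards it and opens a replacement
        return "replaced"
```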

--driver.pooling.local.connections number

The number of connections in the pool for nodes at "local" distance.

Default: 4

--driver.pooling.local.requests number

The maximum number of requests (1 to 32768) that can be executed concurrently on a connection to a node at "local" distance. If connecting to legacy clusters using protocol version 1 or 2, any value greater than 128 is capped at 128 and a warning is logged.

Default: 32768

--driver.pooling.remote.connections number

The number of connections in the pool for nodes at "remote" distance.

Default: 1

--driver.pooling.remote.requests number

The maximum number of requests (1 to 32768) that can be executed concurrently on a connection to a node at "remote" distance. If connecting to legacy clusters using protocol version 1 or 2, any value greater than 128 is capped at 128 and a warning is logged.

Default: 1024
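The pooling settings combine multiplicatively. The Python sketch below applies the documented 1 to 32768 range and the protocol v1/v2 cap of 128, then multiplies by the connection count to get a per-node concurrency ceiling. Treating that product as the ceiling is an illustrative simplification, and the function names are hypothetical.

```python
import warnings


def effective_requests(requested: int, protocol_version: int) -> int:
    """Per-connection request limit after the documented range check and v1/v2 cap."""
    if not 1 <= requested <= 32768:
        raise ValueError("requests per connection must be between 1 and 32768")
    if protocol_version <= 2 and requested > 128:
        warnings.warn(f"protocol v{protocol_version}: capping {requested} requests to 128")
        return 128
    return requested


def max_in_flight(connections: int, requests: int, protocol_version: int = 4) -> int:
    """Upper bound on concurrent requests to one node: connections x per-connection limit."""
    return connections * effective_requests(requests, protocol_version)


print(max_in_flight(4, 32768))  # local defaults: 131072
print(max_in_flight(1, 1024))   # remote defaults: 1024
```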

Driver compression protocol

--driver.protocol.compression ( NONE | LZ4 | SNAPPY )

Specify the compression algorithm to use.

Default: NONE

Driver query

-cl,--driver.query.consistency ( ANY | LOCAL_ONE | ONE | TWO | THREE | LOCAL_QUORUM | QUORUM | EACH_QUORUM | ALL )

The consistency level to use for both loading and unloading. Stronger consistency levels usually reduce throughput. In addition, any level higher than ONE automatically disables continuous paging, which can dramatically reduce read throughput.

Default: LOCAL_ONE
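The cost of stronger levels shows up in how many replicas must respond. The Python sketch below computes that number under a single-datacenter assumption, so LOCAL_QUORUM and EACH_QUORUM reduce to QUORUM. ANY and the serial levels are left out. It also encodes the rule above that only ONE and LOCAL_ONE keep continuous paging enabled.

```python
# Single-datacenter sketch: how many replicas must respond at each level.
FIXED_LEVELS = {"ONE": 1, "LOCAL_ONE": 1, "TWO": 2, "THREE": 3}
QUORUM_LEVELS = {"QUORUM", "LOCAL_QUORUM", "EACH_QUORUM"}  # identical with one datacenter


def replicas_required(level: str, replication_factor: int) -> int:
    if level in FIXED_LEVELS:
        needed = FIXED_LEVELS[level]
    elif level in QUORUM_LEVELS:
        needed = replication_factor // 2 + 1  # a strict majority of the replicas
    elif level == "ALL":
        needed = replication_factor
    else:
        raise ValueError(f"{level} is not modeled by this sketch")
    if needed > replication_factor:
        raise ValueError(f"{level} needs {needed} replicas but the replication factor is {replication_factor}")
    return needed


def continuous_paging_possible(level: str) -> bool:
    # Per the description above: any level higher than ONE disables continuous paging.
    return level in ("ONE", "LOCAL_ONE")


for level in ("LOCAL_ONE", "LOCAL_QUORUM", "ALL"):
    print(level, replicas_required(level, 3), continuous_paging_possible(level))
```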

--driver.query.fetchSize number

The page size: the number of rows retrieved in a single network round trip. During unloading, this setting limits the number of results held in memory at the same time. Setting a negative value disables paging, and the entire result set is retrieved in one pass (not recommended). When connecting to legacy clusters with protocol version 1, paging is automatically disabled and a warning is logged, because that protocol version does not support paging. This setting applies to paging for regular queries; for continuous queries, see executor.continuousPaging.pageSize. Not applicable for loading.

Default: 5000
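The effect of the fetch size on round trips and memory can be worked out with the Python sketch below. It models regular (non-continuous) paging only. A fetch size of 0 is not described above, so the sketch rejects it rather than guess. The function names are hypothetical.

```python
import math


def paging_enabled(fetch_size: int, protocol_version: int) -> bool:
    if fetch_size == 0:
        raise ValueError("fetch size 0 is not modeled by this sketch")
    # Negative fetch sizes disable paging; protocol v1 does not support paging at all.
    return fetch_size > 0 and protocol_version > 1


def round_trips(total_rows: int, fetch_size: int, protocol_version: int = 4) -> int:
    """Requests needed to unload total_rows with regular paging."""
    if not paging_enabled(fetch_size, protocol_version):
        return 1  # the whole result set arrives in one response
    return max(1, math.ceil(total_rows / fetch_size))


def rows_per_response(total_rows: int, fetch_size: int, protocol_version: int = 4) -> int:
    """Largest number of rows held in memory from a single response."""
    if not paging_enabled(fetch_size, protocol_version):
        return total_rows
    return min(total_rows, fetch_size)


print(round_trips(12_000, 5_000), rows_per_response(12_000, 5_000))
```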

--driver.query.idempotence ( true | false )

The default idempotence of statements generated by the loader.

Default: true

--driver.query.serialConsistency ( SERIAL | LOCAL_SERIAL )

The serial consistency level to use for writes. Applies only if data is inserted using lightweight transactions; ignored otherwise.

Default: LOCAL_SERIAL

Driver socket

--driver.socket.readTimeout string

The time the driver waits for a request to complete. This is a global limit on the duration of a session.execute() call, including any internal retries the driver performs.

Default: 60 seconds
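A single deadline that covers every retry, as described above, can be sketched in Python. Each attempt receives only the time left before the overall limit. FakeClock and slow_request are hypothetical helpers that let the example run instantly. The sketch mirrors the documented semantics, not the driver's code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def execute_with_deadline(request: Callable[[float], T], read_timeout: float = 60.0,
                          clock: Callable[[], float] = time.monotonic) -> T:
    """Retry on TimeoutError, but stop once read_timeout has elapsed in total."""
    deadline = clock() + read_timeout
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"no result within {read_timeout} s, retries included")
        try:
            return request(remaining)  # each attempt only gets the time that is left
        except TimeoutError:
            continue  # retry, still bounded by the same deadline


class FakeClock:
    """Hypothetical manual clock so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
budgets: list[float] = []


def slow_request(budget: float) -> str:
    """Hypothetical request: every attempt takes 25 s and then times out."""
    budgets.append(budget)
    clock.now += 25.0
    raise TimeoutError("simulated timeout")


try:
    execute_with_deadline(slow_request, read_timeout=60.0, clock=clock)
except TimeoutError as err:
    print(err, budgets)
```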