Logging options

Specify logging and error options for the dsbulk command. Log messages are written only to the main log file (operation.log) and to standard error; nothing is printed to standard output.

The options can be used in short form (-k keyspace_name) or long form (--schema.keyspace keyspace_name).
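
As an illustration of how the short, long, and fully qualified spellings relate, the sketch below maps each alias listed in the option headings of this section to its fully qualified name. The helper name canonical_option is hypothetical and is not part of dsbulk; only the flag spellings come from this page.

```python
# Aliases copied from the option headings in this section.
# The table and helper are illustrative only; dsbulk does its own parsing.
ALIASES = {
    "--dsbulk.log.ansiMode": ["-ansiMode", "--log.ansiMode"],
    "--dsbulk.log.directory": ["-logDir", "--log.directory"],
    "--dsbulk.log.maxErrors": ["-maxErrors", "--log.maxErrors"],
    "--dsbulk.log.verbosity": ["-verbosity", "--log.verbosity"],
}

# Reverse lookup: any spelling -> fully qualified name.
CANONICAL = {alias: full for full, aliases in ALIASES.items() for alias in aliases}
CANONICAL.update({full: full for full in ALIASES})


def canonical_option(flag: str) -> str:
    """Return the fully qualified name for a known flag (hypothetical helper)."""
    return CANONICAL[flag]


print(canonical_option("-logDir"))
print(canonical_option("--log.maxErrors"))
```

Any of the three spellings of an option selects the same setting, so scripts can pick whichever form is most readable.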

-ansiMode, --log.ansiMode, --dsbulk.log.ansiMode { normal | force | disable }

Whether to use ANSI colors and other escape sequences in log messages printed to the console. By default (normal), dsbulk uses colored output when the terminal is either:

  1. compatible with ANSI escape sequences, which includes all common terminals on *nix and BSD systems (including macOS) and some popular terminals for Windows (Mintty, MinGW); or

  2. a standard Windows DOS command prompt (ANSI sequences are translated on the fly).

The force value causes dsbulk to use ANSI colors even when the detected terminal is not ANSI-compatible. There is normally no reason to disable ANSI escape sequences, but if colored messages are not desired or are not printed correctly, the disable value turns off ANSI support altogether.

For Windows: ANSI support works best with the Microsoft Visual C++ 2008 SP1 Redistributable Package installed.

Default: normal

--log.checkpoint.enabled, --dsbulk.log.checkpoint.enabled boolean

Enable the ability to checkpoint the current operation. The default setting of true allows DataStax Bulk Loader to track which records were processed and to produce a checkpoint file at the end of the operation. Use this checkpoint file to resume the same operation if not all records were processed or if the operation was interrupted.

A checkpointed operation consumes more memory and is slightly slower. Disable checkpointing if it is not needed.

Default: true

--log.checkpoint.file, --dsbulk.log.checkpoint.file string

The path to a checkpoint file from which to resume an operation. When this option is set, dsbulk handles only unprocessed data, re-processes failed data, or both, depending on the replay strategy.

When using a checkpoint file to resume an operation, ensure that loading and unloading operations target the same dataset:

  • When loading, make sure that the files to load were not renamed or moved. If they were, then all files are considered new and loaded in their entirety. In addition, if file contents were changed, then new records may go unnoticed or cause other records to be processed twice.

  • When unloading, make sure that the read query, the token distribution across the ring, the number of --schema.splits, and the data to read are identical across operations. Otherwise, the unloaded data could be inconsistent.

Default: null

--log.checkpoint.replayStrategy, --dsbulk.log.checkpoint.replayStrategy string

The replay strategy to use when resuming an operation from a checkpoint file. Valid values are:

  • resume: DSBulk only processes new records from resources that were not consumed entirely. Records that were already processed are ignored, including rejected ones (rejected records are always written to bad files). This is the safest option when loading if the operation is not idempotent.

  • retry: This is the default option. DSBulk processes new and rejected records from resources that were not entirely consumed.

    This strategy may result in some rows being inserted twice and thus should only be used if the operation is idempotent.

  • retryAll: Similar to retry, DSBulk processes new and rejected records, but unlike retry, it processes all resources, including those marked as consumed entirely.

    This strategy may result in some rows being inserted twice and therefore should only be used if the operation is idempotent.

Default: retry
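
As a minimal sketch of how the checkpoint file and replay strategy might be combined when scripting a resumed run, the helper below assembles the two options into an argument list. The function name resume_options and the checkpoint path are hypothetical; only the option names and the three strategy values come from this page, and the sketch assumes the values are passed exactly as spelled above.

```python
from typing import List

# Strategy names as listed above.
REPLAY_STRATEGIES = ("resume", "retry", "retryAll")


def resume_options(checkpoint_file: str, strategy: str = "retry") -> List[str]:
    """Build the command-line options for resuming from a checkpoint file.

    Hypothetical helper: it only assembles arguments and does not run dsbulk.
    """
    if strategy not in REPLAY_STRATEGIES:
        raise ValueError(f"unknown replay strategy: {strategy!r}")
    return [
        "--dsbulk.log.checkpoint.file", checkpoint_file,
        "--dsbulk.log.checkpoint.replayStrategy", strategy,
    ]


# The path below is a placeholder, not a location dsbulk is known to use.
print(resume_options("/path/to/checkpoint-file", "resume"))
```

For a load that is not idempotent, passing "resume" rather than relying on the default "retry" avoids re-processing rejected records.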

-logDir, --log.directory, --dsbulk.log.directory path_to_directory

The writable directory where all log files are stored; if the directory specified does not exist, it is created. URLs are not acceptable (not even file:/ URLs). Log files for a specific run, or execution, are located in a sub-directory under the specified directory. Each execution generates a sub-directory identified by an "execution ID". See engine.executionId for more information about execution IDs. Relative paths are resolved against the current working directory. Also, for convenience, if the path begins with a tilde (~), that symbol is expanded to the current user’s home directory.

Default: ./logs
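
The path rules above (URLs rejected, tilde expansion, relative paths resolved against the working directory) can be sketched as follows. This is an illustrative model, not dsbulk's own code: the function name is hypothetical, POSIX-style paths are assumed, and only a leading "~" or "~/" is expanded because the text does not say how other tilde forms are treated.

```python
import posixpath
import re
from pathlib import PurePosixPath

# A leading "scheme:/" marks a URL such as file:/tmp/logs (assumed check).
_URL_PREFIX = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:/")


def resolve_log_directory(value: str, cwd: str, home: str) -> PurePosixPath:
    """Model how a log directory value is turned into an absolute path."""
    if _URL_PREFIX.match(value):
        raise ValueError(f"URLs are not accepted as log directories: {value}")
    if value == "~" or value.startswith("~/"):
        # Expand the tilde to the current user's home directory.
        path = PurePosixPath(home) / value[2:]
    else:
        path = PurePosixPath(value)
    if not path.is_absolute():
        # Relative paths are resolved against the current working directory.
        path = PurePosixPath(cwd) / path
    return PurePosixPath(posixpath.normpath(str(path)))


print(resolve_log_directory("./logs", cwd="/work", home="/home/user"))
print(resolve_log_directory("~/dsbulk/logs", cwd="/work", home="/home/user"))
```

The working directory and home directory are passed in explicitly so the behavior is easy to check without touching the file system.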

-maxErrors, --log.maxErrors, --dsbulk.log.maxErrors { number | "N%" }

The maximum number of errors to tolerate before aborting the entire operation. Set to either a number or a string of the form N% where N is a decimal number between 0 and 100. Setting this value to -1 disables this feature (not recommended).

Default: 10
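
The two accepted forms of this value can be told apart as in the sketch below. It is a hedged model of the rule described above, not dsbulk's parser: the function name is hypothetical, the percentage bounds are assumed to be exclusive (the text says only "between 0 and 100"), and negative values other than -1 are rejected because the text does not cover them.

```python
from typing import Optional, Tuple


def parse_max_errors(value: str) -> Optional[Tuple[str, float]]:
    """Return None if the limit is disabled, else ("count", n) or ("ratio", r)."""
    text = value.strip()
    if text.endswith("%"):
        percent = float(text[:-1])
        # Assumption: bounds are exclusive.
        if not 0 < percent < 100:
            raise ValueError("percentage must be between 0 and 100")
        return ("ratio", percent / 100)
    count = int(text)
    if count == -1:
        return None  # -1 disables the limit (not recommended)
    if count < 0:
        # Assumption: other negative values are not documented here.
        raise ValueError("use -1 to disable the limit")
    return ("count", count)


print(parse_max_errors("10"))
print(parse_max_errors("20%"))
print(parse_max_errors("-1"))
```

Note that a percentage must be quoted on most shells so that the value reaches dsbulk as a single string.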

--log.sources, --dsbulk.log.sources boolean

Whether to print record sources in debug files. When set to true (the default), debug files contain — for each record that failed to be processed — its original source, such as the text line that the record was parsed from.

When loading, enabling this option also enables the creation of so-called "bad files": files containing the original lines that could not be inserted. These files can then be used as the data source of a subsequent load operation that loads only the failed records.

This feature is useful to locate failed records more easily and diagnose processing failures — especially if the original data source is a remote one, such as an FTP or HTTP URL.

For this feature to work, record sources must be kept in memory until the record is fully processed. For large records (over 1 megabyte per record), retaining record sources in memory can put significant pressure on the JVM heap and expose the operation to out-of-memory errors. This effect is exacerbated when batching is enabled. If you experience such errors, consider disabling this option.

DataStax Bulk Loader always prints the record's resource, which is the file name or the database table from which the record came. It also always prints the record's position, which is the ordinal position of the record inside the resource, when available. For example, the position could be the record's line number in a CSV file.

Default: true

--log.stmt.level, --dsbulk.log.stmt.level { ABRIDGED | NORMAL | EXTENDED }

The desired level of detail when printing statements to log files.

Valid values are:

  • ABRIDGED: Print only basic information in summarized form.

  • NORMAL: Print basic information in summarized form, and the statement’s query string, if available. For batch statements, this verbosity level also prints information about the batch’s inner statements.

  • EXTENDED: Print full information, including the statement’s query string, if available, and the statement’s bound values, if available. For batch statements, this verbosity level also prints all information available about the batch’s inner statements.

Default: EXTENDED

--log.stmt.maxBoundValueLength, --dsbulk.log.stmt.maxBoundValueLength number

The maximum length for a bound value. Bound values longer than this value are truncated.

Setting this value to -1 disables this feature (not recommended).

Default: 50

--log.stmt.maxBoundValues, --dsbulk.log.stmt.maxBoundValues number

The maximum number of bound values to print. If the statement has more bound values than this limit, the exceeding values are not printed.

Setting this value to -1 disables this feature (not recommended).

Default: 50

--log.stmt.maxInnerStatements, --dsbulk.log.stmt.maxInnerStatements number

The maximum number of inner statements to print for a batch statement. Only applicable for batch statements, ignored otherwise. If the batch statement has more children than this value, the exceeding child statements are not printed.

Setting this value to -1 disables this feature (not recommended).

Default: 10

--log.stmt.maxQueryStringLength, --dsbulk.log.stmt.maxQueryStringLength number

The maximum length for a query string. Query strings longer than this value are truncated.

Setting this value to -1 disables this feature (not recommended).

Default: 500
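
The four log.stmt.max* limits above follow the same pattern: a value is cut off at the limit, and -1 switches the limit off. The sketch below models that pattern under stated assumptions. Both helper names are hypothetical, and the real output may mark truncated values differently (for example with a trailing marker), which this sketch does not attempt to reproduce.

```python
from typing import List, Sequence


def truncate(text: str, max_length: int) -> str:
    """Cut a printed string at max_length characters; -1 disables the limit."""
    if max_length == -1 or len(text) <= max_length:
        return text
    return text[:max_length]


def limit_items(items: Sequence[str], max_items: int) -> List[str]:
    """Keep at most max_items entries; -1 disables the limit."""
    if max_items == -1:
        return list(items)
    return list(items)[:max_items]


query = "INSERT INTO ks.t (a, b) VALUES (:a, :b)"
print(truncate(query, 11))
print(limit_items(["a=1", "b=2", "c=3"], 2))
```

The same two helpers cover both the length limits (query strings, bound values) and the count limits (bound values, inner statements).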

-verbosity, --log.verbosity, --dsbulk.log.verbosity { 0 | 1 | 2 }

Desired level of verbosity. Valid values are:

  • 0 (quiet): Only log WARN and ERROR messages.

  • 1 (normal): Log INFO, WARN and ERROR messages.

  • 2 (verbose): Log DEBUG, INFO, WARN and ERROR messages.

Default: 1
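
The mapping from verbosity value to logged levels can be summarized in a small lookup table. This is only a restatement of the list above in code; the names VERBOSITY_LEVELS and is_logged are hypothetical and not part of dsbulk.

```python
# Levels logged at each verbosity value, as listed above.
VERBOSITY_LEVELS = {
    0: {"WARN", "ERROR"},                   # quiet
    1: {"INFO", "WARN", "ERROR"},           # normal (default)
    2: {"DEBUG", "INFO", "WARN", "ERROR"},  # verbose
}


def is_logged(level: str, verbosity: int = 1) -> bool:
    """Tell whether a message of the given level is logged at this verbosity."""
    return level in VERBOSITY_LEVELS[verbosity]


print(is_logged("INFO"))
print(is_logged("INFO", verbosity=0))
print(is_logged("DEBUG", verbosity=2))
```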
