DataStax Bulk Loader release notes

The DSBulk release notes describe enhancements and changes in each release.

Version 1.11 and later

For all release notes, see the DSBulk GitHub repository.

Released July 13, 2023, DSBulk 1.11 adds support for the vector data type.

Earlier releases

2022 releases
1.10 (August 2022)

Added the ability to resume a failed operation by using checkpoint files. Three new options are available to control checkpointing.

1.9.1 (August 2022)

Adjusted the default throughput from DSBulk to Astra DB to a more conservative setting in order to avoid triggering a rate limit exception.

You can adjust the client-side rate limit either through a configuration file or with the --engine.maxConcurrentQueries command-line interface (CLI) option.
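For example, a minimal sketch of capping concurrency on the command line (the file, keyspace, and table names are illustrative placeholders):

    dsbulk load -url export.csv -k ks1 -t table1 --engine.maxConcurrentQueries 8

The same limit can be set in a configuration file passed with -f, for example as dsbulk.engine.maxConcurrentQueries = 8.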

1.9.0 (April 2022)
  • At the conclusion of a dsbulk run, results are printed from the Count workflow even if failures occur. In general, when querying billions of rows, some failures are expected. However, reporting these failures may indicate whether there is need to retry the dsbulk run command. (BULK-18)

  • Upgraded driver to 4.14.0.

  • When issuing a BATCH query to load data, DSBulk unwraps the statements of this query and incorporates them into its own batching function, creating a consolidated BATCH message. This avoids nesting BATCH statements inside protocol-level BATCH messages, which is forbidden. A consolidated BATCH message also greatly improves performance when loading with timestamp and TTL preservation enabled. (BULK-23)

  • Added support for Prometheus. (BULK-26)

  • Previously, when unloading data with the -timestamp or -ttl options (to automatically preserve cell timestamps and time-to-live (TTL)), the operation failed if the table being unloaded contained collections. DSBulk now excludes unsupported types from an automatic timestamp and TTL unload and logs a warning explaining that some timestamps and TTLs may be lost. (BULK-25)
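    For example, a minimal sketch of an unload that preserves timestamps and TTLs with these options (the keyspace, table, and output directory names are illustrative placeholders):

      dsbulk unload -k ks1 -t table1 -timestamp true -ttl true -url /tmp/table1_export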

  • DSBulk distribution archives are now uploaded to Maven Central. (BULK-24)

  • Removed the check for a primary key column in a record that is empty. This allows the server to handle BLOB types with empty buffers in any primary key column, as well as to handle a composite partition key that accepts empty blobs or strings. (BULK-28)

  • Added support for nested functions in DSBULK mappings. (BULK-29)

  • Added support for literal strings in mappings outside of function arguments. (BULK-30)

2021 releases
1.8.0 (March 2021)
  • Support for Well-known Binary (WKB) geometry data:

    • When loading Geo data, in addition to existing support for Well-known Text (WKT) and GeoJson (JSON) data formats, DataStax Bulk Loader now also accepts WKB data.

    • Use the existing --codec.binary setting to encode any WKB data in HEX or BASE64.

    • A new setting, --codec.geo, has been added to declare the strategy used when converting geometry types to strings. This setting is also used when unloading Geo data to set the desired output format.
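      For example, a minimal sketch of an unload that writes geometry columns as WKB encoded in HEX, assuming WKB is an accepted --codec.geo value alongside the default WKT and JSON (the keyspace and table names are illustrative placeholders):

        dsbulk unload -k geo_ks -t geo_table --codec.geo WKB --codec.binary HEX -url /tmp/geo_export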

  • Added options to automatically preserve Time-To-Live (TTL) and timestamps:

    • For query generation, two new settings that allow for the transparent handling of cell timestamps and TTL:

      • schema.preserveTimestamp: when true, timestamps are preserved when loading and unloading. Default is false. See schema.preserveTimestamp.

      • schema.preserveTtl: when true, TTLs are preserved when loading and unloading. Default is false. See schema.preserveTtl.

      • These settings work best when DataStax Bulk Loader is responsible for generating the queries. DataStax Bulk Loader will generate special queries that export and import all the required data. Overall, the new feature allows a table to be exported, then imported, while preserving all timestamps and TTLs; the heavy-lifting of generating the appropriate queries is performed entirely by DataStax Bulk Loader.

    • Some changes were also made to the schema.mapping grammar, to allow individual cell timestamps and TTLs to be easily mapped in user-supplied mappings:

      • When unloading, nothing changes: the usual way to export timestamps and TTLs is still to apply the writetime and ttl functions to a given column. For example, to export one column, its timestamp, and its TTL to three different fields, you could use the following mapping: field1 = col1, field2 = writetime(col1), field3 = ttl(col1).

      • When loading however, it is now also possible to use the same writetime and ttl functions to map a field’s value to the timestamp or TTL of one or more columns in the table. For example, the following mapping would use field3’s value as the writetime of columns col1 and col2, and field4’s value as the TTL of those columns: field1 = col1, field2 = col2, field3 = writetime(col1,col2), field4 = ttl(col1,col2).

      • As a shortcut, when loading, you can also use the special syntax writetime(*) and ttl(*), which refer to the timestamp and TTL of all columns in the row except those already mapped elsewhere. For example, the following mapping would use field4’s value as the timestamp of column col1, and field5’s value as the timestamp of all remaining columns, that is, columns col2 and col3: field1 = col1, field2 = col2, field3 = col3, field4 = writetime(col1), field5 = writetime(*)
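      For example, a minimal sketch of a round trip that uses the preserve settings described above and lets DataStax Bulk Loader generate the queries (the keyspace, table, and directory names are illustrative placeholders):

        dsbulk unload -k ks1 -t table1 --schema.preserveTimestamp true --schema.preserveTtl true -url /tmp/table1_export
        dsbulk load -k ks1 -t table1 --schema.preserveTimestamp true --schema.preserveTtl true -url /tmp/table1_export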

  • Breaking changes:

    • Starting with this release, when unloading data with Geo types to JSON files, all Geo data is now encoded in the WKT data format by default, instead of GeoJson. To restore the pre-1.8.0 behavior, set --codec.geo to JSON, as shown in the example after this list.

    • Starting with this release, the special tokens __timestamp and __ttl are deprecated but still honored. If used, a warning message is logged. When you can, replace any __timestamp and __ttl tokens with writetime(*) and ttl(*), respectively.
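      For example, a minimal sketch that restores the pre-1.8.0 GeoJson output when unloading (the keyspace and table names are illustrative placeholders):

        dsbulk unload -k geo_ks -t geo_table --codec.geo JSON -url /tmp/geo_export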

  • Fixed issues:

    • The introduction of --codec.geo fixed issues identified with Geo data, such as the following error returned while unloading or loading a table with PointType: Could not deserialize column s_geo of type Set(Custom(org.apache.cassandra.db.marshal.PointType), not frozen) as java.lang.String.

    • Corrected several occurrences of a documentation typo that previously showed --stats.mode. The correct option for the dsbulk count command is --stats.modes or --dsbulk.stats.modes; the plural form is correct. See Count options.
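      For example, a minimal sketch of a count command that uses the plural option name (the keyspace and table names are illustrative placeholders, and partitions is one of the supported modes):

        dsbulk count -k ks1 -t table1 --stats.modes partitions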

2020 releases
1.7.0 (September 2020)
  • Introduced a new boolean setting, --log.sources, --dsbulk.log.sources, that determines whether to print record sources in debug files and to enable "bad files" with load operations. See --dsbulk.log.sources.

  • Introduced a new boolean setting, --monitoring.console, --dsbulk.monitoring.console, to enable or disable console reporting. See --dsbulk.monitoring.console.
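    For example, a minimal sketch of a load that omits record sources from debug files and disables console reporting (the file, keyspace, and table names are illustrative placeholders):

      dsbulk load -url data.csv -k ks1 -t table1 --log.sources false --monitoring.console false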

  • A clarification regarding writing empty strings as quoted empty fields: To insert an empty string, with the intention to override a value that existed previously in the given column, you can insert an empty quoted field in the data file, as in this CSV example:

    foo,"",bar

    This example inserts an empty string into the column mapped to the second field, using the default settings. For this scenario, do not set --connector.csv.nullValue '""' because this setting is for empty non-quoted fields.

  • Fixed null pointer exception when reading from Stdin.

1.6.0 (July 2020)
  • Starting with version 1.6.0, DataStax Bulk Loader is available under the Apache-2.0 license as open-source software (OSS). This change and enhancement makes it possible for the open-source community of developers to contribute features that enable loading and unloading CSV/JSON data to and from Cassandra, DataStax Enterprise (DSE), and Astra DB databases.

  • The product’s official name is now DataStax Bulk Loader.

  • The public GitHub repo is https://github.com/datastax/dsbulk.

  • Some features are specific to DSE environments, including parameters associated with DSE Graph. In topics such as Schema options, Graph-only features are highlighted with an icon.

  • Ability to specify a list of allowed or denied hosts in the driver’s load balancing policy with the datastax-java-driver.basic.load-balancing-policy.filter.allow and datastax-java-driver.basic.load-balancing-policy.filter.deny options. Load balancing policies aren’t applicable to Astra DB because the load balancing policy is set in the Secure Connect Bundle (SCB).

  • Introduced a new setting in Engine Options, dsbulk.engine.maxConcurrentQueries, that regulates the number of queries executed in parallel. It applies to all types of dsbulk operations (load, unload, count). The setting requires a valid integer greater than zero. The nC notation is also possible; for example, 2C means twice the number of cores.

  • The setting executor.continuousPaging.maxConcurrentQueries is deprecated. Instead use --dsbulk.engine.maxConcurrentQueries.
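    For example, a minimal sketch that uses the new setting with the nC notation in place of the deprecated one (the keyspace, table, and directory names are illustrative placeholders):

      dsbulk unload -k ks1 -t table1 -url /tmp/table1_export --dsbulk.engine.maxConcurrentQueries 2C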

  • In prior releases, the connector.csv.maxConcurrentFiles and connector.json.maxConcurrentFiles settings were for dsbulk unload only. In 1.6.0, you can also use them with dsbulk load, where they set the maximum number of files that can be read in parallel. For important considerations, see connector.csv|json.maxConcurrentFiles.
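    For example, a minimal sketch of a load that reads up to four CSV files in parallel from a directory (the directory, keyspace, and table names are illustrative placeholders):

      dsbulk load -url /tmp/csv_dir -k ks1 -t table1 --connector.csv.maxConcurrentFiles 4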

  • DataStax Bulk Loader accepts binary input in the following formats: BASE64 or HEX. For dsbulk unload only, you can choose the format when converting binary data to strings. See codec.binary in Codec options.

  • Raised the driver default timeouts to 5 minutes for the following:

    • datastax-java-driver.basic.request.timeout="5 minutes"

    • datastax-java-driver.advanced.continuous-paging.timeout.first-page="5 minutes"

    • datastax-java-driver.advanced.continuous-paging.timeout.other-pages="5 minutes"
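      For example, a minimal sketch that raises the request timeout further on the command line via the datastax-java-driver prefix (the value, keyspace, table, and directory names are illustrative placeholders):

        dsbulk unload -k ks1 -t table1 -url /tmp/table1_export --datastax-java-driver.basic.request.timeout "10 minutes"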

  • Added support for multi-character delimiters. For example, to load a CSV file that uses '||' as the delimiter, add the -delim '\|\|' parameter to the dsbulk load command. The resulting CSV format is as follows:

    Foo||bar||qix
  • When providing a custom query to dsbulk count, only global is accepted for stats.modes. The query is executed "as is." See Count options.

  • The ORDER BY, GROUP BY and LIMIT clauses now cause the query to be executed "as is," without parallelization. See Schema options.

  • Fixed an issue where a zero-length array was exported as "" (empty string). The issue was that an empty string, when reimported to a blob column, was interpreted as null, instead of as a zero-length array.

  • Per-file limits, maxRecords and skipRecords for CSV and JSON data, were not applied when there was more than one file to read.

1.5.0 (March 2020)
  • Previous DataStax Bulk Loader releases included support for loading and unloading graph data using prior settings and workflows. Starting with this DataStax Bulk Loader 1.5.0 release, the product provides an improved user experience to set related DSE Graph (DSG) properties, plus enhanced validation of requested DSG operations. The changes include new DataStax Bulk Loader schema settings that are specific to DSG operations. The new options are:

    • -g, --schema.graph, --dsbulk.schema.graph string

    • -e, --schema.edge, --dsbulk.schema.edge string

    • -v, --schema.vertex, --dsbulk.schema.vertex string

    • -from, --schema.from, --dsbulk.schema.from string

    • -to, --schema.to, --dsbulk.schema.to string

      For details, refer to Schema options.

  • With the enhanced support for DSG features, DataStax Bulk Loader displays metrics for graph data in vertices per second or edges per second. For non-graph data, the metrics continue to be displayed in rows per second. The metrics type is selected automatically based on how the table DDL was created and whether the DataStax Bulk Loader schema options include graph settings; you do not need to configure graph-specific monitoring options for the metrics.

  • DataStax Bulk Loader 1.5.0 adds support for release 4.5.0 of the DataStax Java driver.

  • Addressed an issue with the deserialization of untrusted data by upgrading to the latest 2.9.10 release of the FasterXML jackson-databind library, although DataStax Bulk Loader was not directly affected.

2019 releases
1.4.1 (December 2019)
  • DataStax Bulk Loader 1.4.1 adds support for using the dsbulk load command to write CSV/JSON data to Cassandra 2.1 and later database tables. Previously, you could only use dsbulk unload and dsbulk count commands with Cassandra.

  • When exporting data, the \u0000 null character is now enclosed in quotes, so that the exported data can be loaded subsequently with the same DataStax Bulk Loader settings. By default, the null character is used as the comment character.

1.4.0 (November 2019)
  • DataStax Bulk Loader 1.4.0 has been upgraded to use the latest 2.x version of the DataStax Java driver, and many new driver options are available directly with dsbulk commands via the datastax-java-driver prefix.

    Before upgrading to DataStax Bulk Loader 1.4.0, note that as a result of the driver enhancements, this release supports DSE 4.7 and later and Cassandra 2.1 and later. Prior releases of DSE and Cassandra are not supported. If you are using earlier releases of DSE or Cassandra, you must remain on DataStax Bulk Loader 1.3.4.

    For details about the DataStax Java driver enhancements, see Driver options (which includes security options) and the Executor options. Several of the Driver and Executor options have been deprecated and replaced by settings that use the datastax-java-driver prefix. Those prior options are still supported, but may be removed in a subsequent release. When you can, review and adjust your command scripts and configuration files to take advantage of the new options that use the datastax-java-driver prefix.

  • You can connect DataStax Bulk Loader to an Astra DB database by including the path to the database’s Secure Connect Bundle and providing an application token for the password. For examples, see dsbulk load.
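    For example, a minimal sketch of a load into an Astra DB database, assuming -b points to the Secure Connect Bundle and the application token is passed as the password (the file, keyspace, table, bundle path, and token values are illustrative placeholders):

      dsbulk load -url data.csv -k ks1 -t table1 -b /path/to/secure-connect-mydb.zip -u token -p AstraCS:example_token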

  • You can use DataStax Bulk Loader to load/unload your table data from/to compressed CSV or JSON files. For details, see the --connector.csv|json.compression option.
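    For example, a minimal sketch of an unload to gzip-compressed CSV files, assuming gzip is one of the supported compression values (the keyspace, table, and directory names are illustrative placeholders):

      dsbulk unload -k ks1 -t table1 -url /tmp/table1_export --connector.csv.compression gzip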

1.3.4 (July 2019)
  • The DataStax Bulk Loader Help provides an entry for --version.

  • Improved error message provided when a row fails to decode. In the DataStax Bulk Loader logging options, the format is: -maxErrors,--log.maxErrors ( number | "N%" ). Where number | N% is the maximum number of errors to allow before aborting the entire operation. This setting may be expressed as an absolute number of errors (an integer greater than or equal to zero) or a percentage of the total rows processed so far (a string of the form "N%", where N is a decimal number between 0 and 100 exclusive). For example, -maxErrors "20%". Setting this value to any negative integer disables the feature, which is not recommended.

  • When a table contains static columns, it is possible that some partitions only contain static data. In this case, that data is exported as a pseudo row where all clustering columns and regular columns are null. Example:

    create table t1 (pk int, cs int static, cc int, v int, primary key (pk, cc));
    insert into t1 (pk, cs) values (1,1);
    select * from t1;
     pk | cc   | cs | v
    ----+------+----+------
      1 | null |  1 | null

    In prior DataStax Bulk Loader releases, you could not import this type of static data, even though the query was valid. For example, the following query was rejected:

    INSERT INTO t1 (pk, cs) values (:pk, :cs);
    Operation LOAD_20190412-134352-912437 failed: Missing required primary key column conversation_id
    from schema.mapping or schema.query.

    DataStax Bulk Loader now allows this valid query.

  • You can use the CQL date and time types with UNITS_SINCE_EPOCH, in addition to timestamp. Previously, you could only use the CQL timestamp type. On the dsbulk command, you can use codec.unit and codec.epoch to convert integers to, or from, these types. Refer to --codec.unit, --dsbulk.codec.unit and --codec.epoch, --dsbulk.codec.epoch.
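    For example, a minimal sketch of a load that interprets integer input as days since a custom epoch, assuming codec.epoch accepts an ISO-8601 instant (the epoch value, file, keyspace, and table names are illustrative placeholders):

      dsbulk load -url data.csv -k ks1 -t table1 --codec.unit DAYS --codec.epoch "2000-01-01T00:00:00Z"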

  • You can use a new monitoring setting, monitoring.trackBytes, to enable or disable monitoring of DataStax Bulk Loader operations in bytes per second. Because this type of monitoring can consume excessive allocation resources, and in some cases excessive CPU cycles, the setting is disabled by default. If you want monitoring in bytes per second, you must enable it with monitoring.trackBytes. Test and compare the setting in your development environment; if leaving it disabled improves the allocation rate and throughput, keep it disabled in production and enable bytes-per-second monitoring only on an as-needed basis.

  • The default output file name format, defined by the --connector.csv|json.fileNameFormat string option, no longer includes the thousands separator. The prior default output file name format was:

    • output-%0,6d.csv

    • output-%0,6d.json

    The updated default format is:

    • output-%06d.csv

    • output-%06d.json

  • In load operations, you can pass the URL of a CSV or JSON data file. In cases where you have multiple URLs, DataStax Bulk Loader 1.3.4 makes this task easier with new command-line options that let you point to a single file containing all the URLs for the data files.

  • When using REPLICA_SET batch mode, the server may issue query warnings if the number of statements in a single batch exceeds unlogged_batch_across_partitions_warn_threshold. To avoid reporting excessive warning messages in stdout, DataStax Bulk Loader logs only one warning at the beginning of the operation.

  • DataStax Bulk Loader should reject CSV files containing invalid headers, such as headers that are empty or contain duplicate fields.

  • Logging option -maxErrors 0 does not abort the operation.

  • DataStax Bulk Loader should reject invalid execution IDs. An execution ID is used to create MBean names. DataStax Bulk Loader now validates user-provided IDs to ensure, for example, that an ID does not contain a comma.

1.3.3 (March 2019)
  • Previously, exports of varchar columns containing JSON could truncate data. Columns of type varchar that contain JSON are now exported "as is," meaning DataStax Bulk Loader does not attempt to parse the JSON payload.

    For example, assume you had a column col1 whose value was '{"foo":42}'. This was previously exported as col1 = {"foo":42}, where the contents of the column were parsed into a JSON node. In DataStax Bulk Loader 1.3.3, the JSON {"foo":42} in a varchar column is exported as a string: col1 = "{\"foo\":42}".

1.3.2 (February 2019)
  • Print basic information about the cluster.

  • Unload timestamps as units since an epoch.

    Datasets containing numeric data that are intended to be interpreted as units since a given epoch require the setting codec.timestamp=UNITS_SINCE_EPOCH. Failing to specify this special format will result in all records being rejected due to an invalid timestamp format. Refer to Codec options.
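    For example, a minimal sketch of an unload that exports timestamps as seconds since the Unix epoch (the keyspace, table, and directory names are illustrative placeholders):

      dsbulk unload -k ks1 -t table1 -url /tmp/table1_export --codec.timestamp UNITS_SINCE_EPOCH --codec.unit SECONDS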

  • Provide better documentation on how to choose the best batching strategy.

  • Implement unload and count for materialized views.

  • Calculate batch size dynamically with adaptive batch sizing. The new setting batch.maxSizeInBytes defaults to -1 (unlimited).

    batch.maxBatchSize is deprecated; instead, use batch.maxBatchStatements.

  • batch.bufferSize should be a multiple of batch.maxBatchStatements. By default batch.bufferSize is set to 4 times batch.maxBatchStatements if its value is less than or equal to 0.
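    For example, a minimal sketch of a load that caps batches at 32 statements with a buffer sized at 4 times that value (the file, keyspace, table names, and values are illustrative placeholders):

      dsbulk load -url data.csv -k ks1 -t table1 --batch.maxBatchStatements 32 --batch.bufferSize 128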

  • Improve support for lightweight transactions. DataStax Bulk Loader can detect write failures due to a failed LWT write. Records that could not be inserted will appear in two new files:

    • paxos.bad is a new "bad file" devoted to LWT write failures.

    • paxos-errors.log is a new debug file devoted to LWT write failures.

      DataStax Bulk Loader also writes any records from failed writes to a .bad file in the operation’s directory, depending on when the failure occurred.

  • Extend DataStax Bulk Loader rate limiting capability to reads. Previously, the rate limiter used by DataStax Bulk Loader, adjustable via the --executor.maxPerSecond setting, only applied to writes. This release extends the functionality to reads by counting the number of rows received rather than the number of requests sent.

  • Expose settings to control how to interpret empty fields in CSV files. There are two new settings for the CSV connector: nullValue and emptyValue. Previously, when reading a CSV file, the connector emitted an empty string when a field was empty and non-quoted. Starting with DataStax Bulk Loader 1.3.2, the CSV connector returns a null value in such situations, which in most cases makes no difference. The only noticeable change is for columns of type VARCHAR or ASCII, where the resulting stored value is null instead of an empty string.

  • Allow functions to appear in mapping variables. Previously, a mapping entry could contain a function only for loads, on the field side of the assignment. This functionality has been extended to unloads. For example, loads may continue to use now() = column1, and the result of the now() function is inserted into column1 for every row. For unloads, you can export the result of now() as fieldA for every row read, such as fieldA = now().

  • Detect writetime variable when unloading. You can specify a writetime function in a mapping definition when unloading. For example, with the mapping fieldA = column1, fieldB = writetime(column1), the data type is detected and fieldB is exported as a timestamp, not as an integer.

  • Relax constraints on queries for the Count workflow. The schema.query setting can contain any SELECT clause when counting rows.

  • Automatically add token range restriction to WHERE clauses. When a custom query is provided with --schema.query, to enable read parallelization, it is no longer necessary to provide a WHERE clause of the form WHERE token(pk) > :start AND token(pk) <= :end. If the query does not contain a WHERE clause, DataStax Bulk Loader automatically generates one. However, if the query already contains a WHERE clause, DataStax Bulk Loader is not able to parallelize the read operations.

  • Allow JSON array mapping with UDTs. Previously, when loading User Defined Types (UDTs) it was required that the input be a JSON object to allow for field-by-field mapping. Starting with DataStax Bulk Loader 1.3.2, a JSON array can also be mapped to UDTs, in which case the mapping is based on field order.

  • Improve WHERE clause token range restriction detection. When you provide a custom query for unloading, the token range restriction variables can have any name, not only start and end. For example, the following is valid: SELECT * FROM table1 WHERE token(pk) > :foo AND token(pk) <= :bar

  • Remove record location URI. DataStax Bulk Loader previously provided a record’s URI to uniquely identify the record. However, the URI was very long and difficult to read. You can instead identify a failed record by looking into the record’s source statement or row.

  • Allow columns and fields to be mapped more than once. It is possible to map a field/column more than once. The following rules apply:

    • When loading, a field can be mapped to 2 or more columns, but a column cannot be mapped to 2 or more fields. Thus the following mapping is correct: fieldA = column1, fieldA = column2.

    • When unloading, a column can be mapped to 2 or more fields, but a field cannot be mapped to 2 or more columns. Thus the following mapping is correct: fieldA = column1, fieldB = column1.

  • UDT and tuple codecs should respect allowExtraFields and allowMissingFields.

    The settings schema.allowMissingFields and schema.allowExtraFields apply to UDTs and tuples. For example, if a tuple has three elements, but the JSON input only has two elements, this scenario results in an error if schema.allowMissingFields is false. However, this scenario is accepted if schema.allowMissingFields is true. The missing element in this example is assigned as null.

  • Add support for DSE 4.8 and lower. All protocol versions are supported. Some features might not be available depending on the protocol version and server version. The schema.splits (default: 8C) setting was added to compensate for the absence of paging in C* 1.2. The token ring is split into small chunks and is controlled by this setting. For example:

    bin/dsbulk unload -url myData.csv --driver.pooling.local.connections 8 \
      --driver.pooling.local.requests 128 --driver.pooling.remote.requests 128 \
      --schema.splits 0.5C -k test -t test

    On --schema.splits, you can optionally use special syntax, nC, to specify a number that is a multiple of the available cores, resulting in a calculated number of splits. If the number of cores is 8, --schema.splits 0.5C = 0.5 * 8, which results in 4 splits. Refer to --schema.splits, --dsbulk.schema.splits number.

  • Add support for keyspace-qualified UDFs in mappings. If needed, you can qualify a user-defined function (UDF) with a keyspace name. For example: fieldA = ks1.func1(column1, column2).

  • Allow fields to appear as function parameters on the left side of mapping entries. When loading, a mapping entry can contain a function on the left side that references fields of the dataset. For example, consider the case where:

    • A dataset has two fields, fieldA and fieldB

    • A table with three columns: colA, colB and colSum

    • A user-defined function: sum(int, int)

      The following mapping stores the sum of fieldA and fieldB into colSum: fieldA = colA, fieldB = colB, sum(fieldA,fieldB)=colSum.

  • Improve handling of search queries. You can supply a DSE search predicate using the solr_query mechanism. For example, assume you create a search index on the dsbulkblog.iris_with_id table:

    cqlsh -e "CREATE SEARCH INDEX IF NOT EXISTS ON dsbulkblog.iris_with_id"

    You can issue a query for just the Iris-setosa rows:

    dsbulk unload -query "SELECT id, petal_length, petal_width, \
     sepal_length, sepal_width, species FROM dsbulkblog.iris_with_id \
      WHERE solr_query = '{\\\"q\\\": \\\"species:Iris-setosa\\\"}'"
  • Ability to hard-limit the number of concurrent continuous paging sessions. DataStax Bulk Loader adds a new setting: executor.continuousPaging.maxConcurrentQueries (default: 60). It sets the maximum number of continuous paging queries that can be executed in parallel. Set this number to a value equal to or less than the value configured server-side for continuous_paging.max_concurrent_sessions in the cassandra.yaml configuration file, which is also 60 by default; otherwise, some requests may be rejected. You can disable executor.continuousPaging.maxConcurrentQueries by assigning any negative value or 0.

  • Ability to skip unloading or loading the solr_query column. DataStax Bulk Loader will skip the solr_query column when loading and unloading.

  • Setting executor.maxInFlight to a negative value triggers a fatal error.

  • Murmur3TokenRangeSplitter should allow long overflows when splitting ranges.

  • CSV connector trims trailing white space when reading data.

  • Avoid overflows in CodecUtils.numberToInstant.

  • Call to ArrayBackedRow.toString() causes a fatal NPE.

2018 releases
1.2.0 (August 2018)
  • Improve range split algorithm in multi-DC and vnodes environments.

  • Support simplified notation for JSON arrays and objects in collection fields.

  • CSVWriter trims leading/trailing whitespace in values.

  • CSV connector fails when the number of columns in a record is greater than 512.

  • DSBulk fails when mapping contains a primary key column mapped to a function.

1.1.0 (June 2018)
  • Combine batch.mode and batch.enabled into a single setting: batch.mode. If you are using the batch.enabled setting in scripts, change to batch.mode with value DISABLED.

  • Improve handling of Univocity exceptions.

  • Logging improvements:

    • Log messages are logged only to operation.log. Logging does not print to stdout.

    • Configurable logging levels with the log.verbosity setting.

    • The setting log.ansiEnabled is changed to log.ansiMode.

  • New count workflow:

    • Supports counting rows in a table.

    • Configurable counting mode.

    • When mode = partitions, the number of partitions to count is configurable, supporting counting the rows of the N biggest partitions in a table.

  • Counter tables are supported for load and unload.

  • Improve validation to include user-supplied queries and mappings.

  • The codec.timestamp CQL_DATE_TIME setting is renamed to CQL_TIMESTAMP. Adjust scripts to use the new setting.

  • Generated query does not contain all token ranges when a range wraps around the ring.

  • Empty map values do not work when loading using dsbulk.

  • DSBulk cannot handle columns of type list<timestamp>.

  • Generated queries do not respect indexed mapping order.

  • DSBulk fails to start with Java 10+.

1.0.2 (June 2018)
  • DataStax Bulk Loader 1.0.2 is bundled with DSE 6.0.1.

  • Configure whether to use ANSI colors and other escape sequences in log messages printed to standard output and standard error.

1.0.1 (April 2018)
  • DataStax Bulk Loader version 1.0.1 is automatically installed with DataStax Enterprise (DSE), and can also be installed as a standalone tool. DataStax Bulk Loader 1.0.1 is supported for use with DSE 5.0 and later.

  • Support to manage special characters on the command line and in the configuration file.

  • Improve error messages for incorrect mapping.

  • Improved monitoring options.

  • Detect console width on Windows.

  • Null words are supported by all connectors. The schema.nullStrings setting is changed to codec.nullWords. Renamed the convertTo and convertFrom methods. See Codec options and Schema options.

  • Use Logback to improve filtering to make stack traces more readable and useful. On ANSI-compatible terminals, the date prints in green, the hour in cyan, the level is blue (INFO) or red (WARN), and the message prints in black.

  • Improved messaging for completion with errors.

  • Settings schema.allowExtraFields and schema.allowMissingFields are added to reference.conf.

  • Support is dropped for using :port to specify the port to connect to. Specify the port for all hosts only with driver.port.

  • Numeric overflows should display the original input that caused the overflow.

  • Null words are not supported by all connectors.

  • Addresses might not be properly translated when the cluster has a custom native port.
