Count options

Count options for the dsbulk command.

Specify options for the dsbulk count command. These options determine how DataStax Bulk Loader counts rows.

Databases supported by DataStax Bulk Loader

DataStax Bulk Loader for Apache Cassandra® supports the use of the dsbulk load, dsbulk unload, and dsbulk count commands with:
  • Open source Apache Cassandra® 2.1 and later databases
  • DataStax Astra cloud databases
  • DataStax Enterprise (DSE) 4.7 and later databases

--stats.modes, --dsbulk.stats.modes { global | ranges | hosts | partitions }
The kinds of statistics to compute. Applies only to dsbulk count; other commands ignore it. Valid values are:
  • global: Count the total number of rows in the table.
  • ranges: Count the total number of rows per token range in the table.
  • hosts: Count the total number of rows per host in the table.
  • partitions: Count the total number of rows in the N biggest partitions in the table. Choose how many partitions to track with the stats.numPartitions option. For partitions, the results are organized as follows:
    1. Left column: partition key value
    2. Middle column: number of rows using that partition key value
    3. Right column: the partition's percentage of rows compared to the total number of rows for this query
Note: Starting in DataStax Bulk Loader 1.6.0, when you provide a custom query to dsbulk count, the only accepted stats.modes value is global. The query is executed "as is."

Default: global
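
As a minimal sketch of selecting a statistics mode, the following builds a dsbulk count command that requests per-host counts instead of the default global count. The keyspace and table names (ks1, table1) are hypothetical placeholders, and the sketch assumes the standard -k and -t shortcuts for the keyspace and table. The script only prints the command so that it can be reviewed before it is run.

```shell
# Hypothetical keyspace and table names; replace with your own.
keyspace="ks1"
table="table1"

# Request per-host statistics instead of the default global count.
count_cmd="dsbulk count -k $keyspace -t $table --stats.modes hosts"

# Print the command for review; run it where dsbulk is installed.
echo "$count_cmd"
```

Swapping hosts for ranges or global in the same command selects one of the other modes listed above.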

--stats.numPartitions, --dsbulk.stats.numPartitions number
The number of distinct partitions for which to count rows. Applies only to dsbulk count; other commands ignore it.
Tip: Limiting the number of partitions restricts the count to the N biggest partitions, ranked by number of rows. It does not restrict the count to the first N partitions in sequence.

Default: 10
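
To illustrate how stats.numPartitions combines with the partitions mode, the sketch below builds a command that tracks the 20 biggest partitions rather than the default 10. The keyspace and table names (ks1, table1) and the value 20 are hypothetical, chosen only for illustration. As before, the script prints the command rather than running it.

```shell
# Hypothetical keyspace and table names; replace with your own.
keyspace="ks1"
table="table1"

# Number of biggest partitions to report (illustrative value; the default is 10).
num_partitions=20

# Count rows in the biggest partitions of the table.
count_cmd="dsbulk count -k $keyspace -t $table --stats.modes partitions --stats.numPartitions $num_partitions"

# Print the command for review; run it where dsbulk is installed.
echo "$count_cmd"
```

With these settings, each result row would report a partition key value, its row count, and its percentage of the total rows, as described for the partitions mode.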