Count options

Use these options with the dsbulk count command to control how DataStax Bulk Loader counts rows.

Databases supported by DataStax Bulk Loader

DataStax Bulk Loader® supports the use of the dsbulk load, dsbulk unload, and dsbulk count commands with:

  • DataStax Astra DB

  • Hyper-Converged Database (HCD) 1.0 databases

  • DataStax Enterprise (DSE) 5.1, 6.8, and 6.9 databases

  • Open source Apache Cassandra® 2.1 and later databases

Count options follow:

--stats.modes, --dsbulk.stats.modes { global | ranges | hosts | partitions }

The kind of statistics to compute. This option applies only to the dsbulk count command; other commands ignore it. Valid values are:

  • global: Count the total number of rows in the table.

  • ranges: Count the total number of rows per token range in the table.

  • hosts: Count the total number of rows per host in the table.

  • partitions: Count the total number of rows in the N biggest partitions in the table. Use the stats.numPartitions option to choose how many partitions to track. In partitions mode, the results are organized as follows:

    1. Left column: partition key value

    2. Middle column: number of rows using that partition key value

    3. Right column: the partition’s percentage of rows compared to the total number of rows for this query

Starting with DataStax Bulk Loader 1.6.0, when you provide a custom query to dsbulk count, only global mode (--stats.modes global or --dsbulk.stats.modes global) is accepted. The query is executed as is.

Default: global
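The rules above can be sketched as a small Python wrapper that builds a dsbulk count command line. This wrapper is hypothetical: `build_count_args` and `VALID_MODES` are illustrative names, not part of DataStax Bulk Loader. For custom queries, it assumes the `--schema.query` option from the DSBulk schema options. The -k, -t, --stats.modes, and --stats.numPartitions flags come from this page. The wrapper mirrors two documented constraints: the four valid modes, and the restriction of custom queries to global mode.

```python
# Hypothetical helper: builds an argv list for `dsbulk count`.
# Not part of DataStax Bulk Loader; it only mirrors the documented rules.

VALID_MODES = {"global", "ranges", "hosts", "partitions"}


def build_count_args(keyspace=None, table=None, mode="global",
                     num_partitions=None, query=None):
    """Return the argument list for a dsbulk count invocation."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown stats mode: {mode!r}")

    args = ["dsbulk", "count"]
    if query is not None:
        # With a custom query (DSBulk 1.6.0 and later), only global mode is accepted.
        if mode != "global":
            raise ValueError("a custom query only supports --stats.modes global")
        # Assumption: the custom query is passed with --schema.query.
        args += ["--schema.query", query]
    else:
        if keyspace is None or table is None:
            raise ValueError("keyspace and table are required without a custom query")
        args += ["-k", keyspace, "-t", table]

    args += ["--stats.modes", mode]
    if num_partitions is not None:
        # stats.numPartitions sets how many partitions partitions mode tracks.
        args += ["--stats.numPartitions", str(num_partitions)]
    return args


print(build_count_args("myKeyspace", "myTable", mode="partitions", num_partitions=10))
```

To run the command, pass the returned list to `subprocess.run(args, check=True)`. Passing a list instead of a single shell string avoids quoting problems when a custom query contains spaces or quotes.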

--stats.numPartitions, --dsbulk.stats.numPartitions number

The number of partitions to track, and count rows for, when --stats.modes is partitions. This option applies only to the dsbulk count command; other commands ignore it.

If you set this option to N, dsbulk count reports the N partitions with the most rows, sorted by row count in descending order. It does not report the first N partitions that it reads.

Default: 10
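The difference between "top N by row count" and "first N read" can be illustrated with a short Python sketch. This is a conceptual model only, not how DataStax Bulk Loader tracks partitions internally. The partition keys and row counts are made-up values, and the helper names are hypothetical.

```python
import heapq
from itertools import islice


def top_partitions(row_counts, n=10):
    """Return the n (key, rows) pairs with the highest row counts, largest first."""
    # heapq.nlargest keeps first-seen order among ties; dsbulk's own
    # tie-breaking is not described on this page.
    return heapq.nlargest(n, row_counts.items(), key=lambda kv: kv[1])


def first_partitions(row_counts, n=10):
    """What stats.numPartitions does NOT do: take the first n partitions read."""
    return list(islice(row_counts.items(), n))


# Hypothetical per-partition row counts, in the order a scan might read them.
counts = {"p1": 3, "p2": 38, "p3": 1, "p4": 36, "p5": 33}

print(top_partitions(counts, 3))    # [('p2', 38), ('p4', 36), ('p5', 33)]
print(first_partitions(counts, 3))  # [('p1', 3), ('p2', 38), ('p3', 1)]
```

Only the first function matches the documented behavior of stats.numPartitions. The second function shows the "first N in sequence" interpretation that the option does not follow.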

dsbulk count example

The following dsbulk count command lists the 10 partitions that contain the most rows. In this example, the largest partition has 38 rows, the second has 36, the third has 33, and so on.

dsbulk-1.11/bin# dsbulk count --stats.modes partitions --stats.numPartitions 10 \
  -k myKeyspace -t myTable
Operation directory: /var/tmp/dsbulk-1.11/bin/logs/COUNT_20210311-222959-568583
total | failed | rows/s | p50ms | p99ms | p999ms
7,426,435,929 | 0 | 385,954 | 103.41 | 201.33 | 274.73

Operation COUNT_20201110-222959-568583 completed successfully in 0 hours, 10 minutes and 41 seconds.

48731809-4414-4e24-b5f1-cb6aac43529a 38 0.00
8de14cba-6c78-4b15-9551-f95cd318f03a 36 0.00
65a50cf1-1313-49c3-9c26-92c5db218b3a 33 0.00
9ee7245f-ad31-4e8f-bb37-ea5ec80f1fbd 33 0.00
dcc1f413-83fb-4ff1-bd49-38bf785fecdd 32 0.00
e50fc2f3-73b5-4011-a1e7-c2a19ea3bcdf 32 0.00
fa07e34c-c136-4413-b7cc-c1692c9a7ef3 31 0.00
eb8af55e-fb8d-4da3-8062-cab6a749b68c 31 0.00
7a0e4c8c-36c4-4ae6-99b1-c06d7940493a 31 0.00
9dea54b9-8726-4b62-af27-c5aa61d37bfd 30 0.00
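The right column in the output above reads 0.00 because each partition holds a tiny share of the 7,426,435,929 rows counted. For example, 38 rows is about 5.1e-7 percent of the total, which rounds to 0.00 at two decimal places. The Python sketch below recomputes that column from the first three lines of the example output. It assumes each partition line is whitespace-separated as `key rows percent`, as printed on this page. `parse_partition_lines` is a hypothetical helper, not a dsbulk API.

```python
# First three partition lines and the total row count, copied from the example output.
SAMPLE = """\
48731809-4414-4e24-b5f1-cb6aac43529a 38 0.00
8de14cba-6c78-4b15-9551-f95cd318f03a 36 0.00
65a50cf1-1313-49c3-9c26-92c5db218b3a 33 0.00
"""
TOTAL_ROWS = 7_426_435_929


def parse_partition_lines(text):
    """Split each 'key rows percent' line into a (key, rows, percent) tuple."""
    parsed = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, rows, pct = line.split()
        parsed.append((key, int(rows), float(pct)))
    return parsed


for key, rows, reported in parse_partition_lines(SAMPLE):
    share = 100 * rows / TOTAL_ROWS  # this partition's percentage of all counted rows
    print(f"{key}: {rows} rows = {share:.2e}% (reported as {reported:.2f})")
```

On small tables, the same calculation produces visibly nonzero percentages. For example, a 38-row partition in a 1,000-row table would show 3.80.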

© 2024 DataStax | Privacy policy | Terms of use

Apache, Apache Cassandra, Cassandra, Apache Tomcat, Tomcat, Apache Lucene, Apache Solr, Apache Hadoop, Hadoop, Apache Pulsar, Pulsar, Apache Spark, Spark, Apache TinkerPop, TinkerPop, Apache Kafka and Kafka are either registered trademarks or trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries. Kubernetes is the registered trademark of the Linux Foundation.