Count data

Use the dsbulk count command to return information about the loaded data:

dsbulk count -k KEYSPACE_NAME -t TABLE_NAME OPTIONS

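Replace the following:

  • KEYSPACE_NAME: The name of the keyspace that contains the table.

  • TABLE_NAME: The name of the table to count.

  • OPTIONS: Any additional dsbulk count options, such as authentication, connection, or --stats.* options.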

Count with authentication

Unless you are running the dsbulk count command against a local cluster that doesn't require authentication, you must provide authentication and connection details.

If your cluster requires authentication or uses SSL encryption, pass the relevant options with your count commands. You can pass the values directly on the command line or set them in a configuration file.
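As a sketch of the configuration-file approach for a DSE, HCD, or Cassandra cluster, the following shell function writes a dsbulk configuration file that holds the username and password. It assumes a dsbulk version that reads DataStax Java driver 4.x settings from a datastax-java-driver section (older releases used the driver.auth.* names shown later on this page). The function name write_count_conf is hypothetical, and the password is written as a quoted HOCON string, so it must not contain double quotes.

```shell
# Minimal sketch: write a dsbulk configuration file holding authentication
# settings so they don't have to be typed on every command line.
# write_count_conf is a hypothetical helper; the keys below are assumed to be
# the DataStax Java driver 4.x auth-provider settings that dsbulk reads.
write_count_conf() {
  local file=$1 user=$2 pass=$3
  : > "$file"          # create (or truncate) the file first
  chmod 600 "$file"    # restrict it to the owner before the password is written
  cat > "$file" <<EOF
datastax-java-driver {
  advanced.auth-provider {
    class = PlainTextAuthProvider
    username = "$user"
    password = "$pass"
  }
}
EOF
}
```

For example, run write_count_conf count.conf username "$DB_PASSWORD" (DB_PASSWORD is a hypothetical variable that already holds the password), then load the file with the -f option. Options given on the command line, such as -k and -t, are combined with the settings from the file:

dsbulk count -k ks1 -t table1 -f count.conf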

DSE, HCD, and Cassandra

If your database requires authentication, provide the username and password with the -u and -p options, respectively:

dsbulk count -k ks1 -t table1 \
-u username -p password

If the cluster is remote, also include connection options such as the contact points (-h) and the port (-port):

dsbulk count -k ks1 -t table1 \
-u username -p password -h '10.200.1.3, 10.200.1.4' -port 9876

If your cluster has both authentication and SSL enabled, pass -u, -p, and the SSL options. For example:

dsbulk count -h '["fe80::f861:3eff:fe1d:9d7a"]' -u username -p password \
      --driver.auth.provider DsePlainTextAuthProvider \
      --driver.ssl.provider JDK \
      --driver.ssl.keystore.path /etc/dse/keystores/client.keystore \
      --driver.ssl.keystore.password sslkspassword \
      --driver.ssl.truststore.path /etc/dse/keystores/client.truststore \
      --driver.ssl.truststore.password ssltrustpassword \
      -k ks1 -t table1

Astra DB

For Astra DB, pass the Secure Connect Bundle and an application token:

dsbulk count -k ks1 -t table1 \
-b "path/to/SCB.zip" -u token -p AstraCS:...

The expected connection credentials are:

  • -b: Provide the path to the database’s Secure Connect Bundle (SCB) zip file. The SCB includes certificates and key files for SSL-encrypted connections as well as information about the database’s contact points.

  • -u: Set to the literal string token.

  • -p: Provide an application token. DataStax recommends using secure references to tokens, such as environment variables, rather than specifying them directly on the command line.
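Following that recommendation, here is a minimal sketch that reads the token from an environment variable instead of typing it into the command. The variable name ASTRA_DB_APPLICATION_TOKEN and the function name count_astra_table are hypothetical, and the DSBULK override exists only so the command can be printed as a dry run.

```shell
# Minimal sketch: pass the Astra DB application token from an environment
# variable rather than typing it on the command line.
# ASTRA_DB_APPLICATION_TOKEN and count_astra_table are hypothetical names.
# Set DSBULK=echo to print the command instead of running it (dry run).
count_astra_table() {
  local keyspace=$1 table=$2 scb=$3
  # Stop with a clear message if the token variable is unset or empty.
  : "${ASTRA_DB_APPLICATION_TOKEN:?set ASTRA_DB_APPLICATION_TOKEN to your application token}"
  "${DSBULK:-dsbulk}" count -k "$keyspace" -t "$table" \
    -b "$scb" -u token -p "$ASTRA_DB_APPLICATION_TOKEN"
}
```

For example, count_astra_table ks1 table1 path/to/SCB.zip. This keeps the token out of your shell history and scripts, although dsbulk still receives it as a command-line argument.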

Count partition data

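The following example returns per-partition row counts for the comments table in the cycling keyspace. The --stats.modes partitions option requests per-partition counts, and --stats.numPartitions 50 limits the report to at most 50 partitions: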

dsbulk count -k cycling -t comments --stats.modes partitions --stats.numPartitions 50

The console prints the log directory, performance metrics for the operation, and the data retrieved by the command. The retrieved data is presented in three columns:

  • The first column is the partition key value.

  • The second column is the number of rows using that partition key value.

  • The third column is the percentage of rows in the partition compared to the total number of rows that were scanned for the query.

Operation directory: /home/automaton/cycling/logs/COUNT_20190424-213840-954894
total | failed | rows/s | mb/s | kb/row | p50ms | p99ms | p999ms
   31 |      0 |     74 | 0.00 |   0.02 | 27.59 | 31.33 |  31.33
Operation COUNT_20190424-213840-954894 completed successfully in 2 seconds.
fb372533-eb95-4bb4-8685-6ef61e994caa 5 16.13
8566eb59-07df-43b1-a21b-666a3c08c08a 4 12.90
c7fceba0-c141-4207-9494-a29f9809de6f 4 12.90
e7ae5cf3-d358-4d99-b900-85902fda9bb0 4 12.90
6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 3 9.68
9011d3be-d35c-4a8d-83f7-a3c543789ee7 2 6.45
95addc4c-459e-4ed7-b4b5-472f19a67995 2 6.45
38ab64b6-26cc-4de9-ab28-c257cf011659 2 6.45
5b6962dd-3f90-4c93-8f61-eabfa4a803e2 1 3.23
c4b65263-fe58-4846-83e8-f0e1c13d518f 1 3.23
e7cd5752-bc0d-4157-a80f-7523add8dbcd 1 3.23
6d5f1663-89c0-45fc-8cfd-60a373b01622 1 3.23
220844bf-4860-49d6-9a4b-6b5d3a79cbfb 1 3.23
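The third column is each partition's row count divided by the total rows scanned, times 100. In the sample output, 31 rows were scanned, so the first partition's 5 rows give 5 / 31 * 100 ≈ 16.13. The following sketch recomputes that column from the first two; partition_share is a hypothetical helper, and it assumes the report lists every partition (as here, where 13 partitions are below the limit of 50), because it uses the sum of the listed counts as the total.

```shell
# Minimal sketch: recompute the percentage column of the partition report.
# Each input line is "PARTITION_KEY ROW_COUNT"; the percentage is ROW_COUNT
# divided by the sum of all ROW_COUNT values, times 100.
# partition_share is a hypothetical helper; it reads files or standard input.
partition_share() {
  awk '
    { key[NR] = $1; rows[NR] = $2; total += $2 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s %d %.2f\n", key[i], rows[i], 100 * rows[i] / total
    }
  ' "$@"
}
```

For example, saving the partition lines from the output to a file and running partition_share on it reproduces the 16.13, 12.90, 9.68, 6.45, and 3.23 values shown above.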

Count graph data

You can also use the dsbulk count command to count loaded graph data. The command can count either vertices or edges. To count vertices, specify the graph and the vertex label:

dsbulk count -g GRAPH_NAME -v VERTEX_LABEL

The following example counts the vertices in the person vertex label table of the food graph:

dsbulk count -g food -v person

Similarly, you can count the edges in an edge label table. The following example counts the authored edges from person vertices to book vertices in the food graph:

dsbulk count -g food -e authored -from person -to book

Use dsbulk count as a replacement for SELECT COUNT(*)

The dsbulk count command can be more efficient than a SELECT COUNT(*) CQL query, particularly for large tables. dsbulk count is optimized for counting rows: it splits the table's token range into smaller ranges and scans them in parallel, whereas SELECT COUNT(*) runs through a single coordinator and can time out on large tables. As a result, dsbulk count can provide better performance and lower resource consumption.

A query such as SELECT COUNT(*) FROM KEYSPACE_NAME.TABLE_NAME; can be replaced by the following dsbulk count command:

dsbulk count -k KEYSPACE_NAME -t TABLE_NAME

© Copyright IBM Corporation 2025