Monitoring options

Specify monitoring options for the dsbulk command.

Monitored throughput is often measured in operations per second, where an operation is a single write or a single read. This unit of measurement can vary greatly depending on the size of the rows being written or read. DataStax recommends also monitoring throughput in mb/second to avoid the irregularity of measuring throughput only as operations per second. Byte-based throughput reporting requires the monitoring.trackBytes option, described below.

In a load workflow, a typical report shows:

2020-06-14 13:15:48 INFO Memory usage: used: 507 MB, free: 691 MB, allocated: 1,199 MB, available: 3,641 MB,
    total gc count: 20, total gc time: 346 ms
2020-06-14 13:15:48 INFO Records: total: 210,755, successful: 210,755, failed: 0, mean: 20,893 records/second
2020-06-14 13:15:48 INFO Batches: total: 6,602, size: 31.90 mean, 10 min, 32 max
2020-06-14 13:15:48 INFO Writes: total: 210,669, successful: 210,669, failed: 0, in-flight: 0
2020-06-14 13:15:48 INFO Throughput: 20,877 writes/second, 1.11 mb/second
2020-06-14 13:15:48 INFO Latencies: mean 6.29, 75p 2.87, 99p 89.13, 999p 125.83 milliseconds

The options can be used in short form (-reportRate "10 seconds") or long form (--monitoring.reportRate "10 seconds").
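For example, the following command enables CSV metric reports and raises the report interval to 10 seconds. This is a minimal sketch; the keyspace, table, and input file are placeholders, and the -k, -t, and -url options are schema and connector options, not monitoring options:

dsbulk load -k ks1 -t table1 -url export.csv \
  --monitoring.csv true \
  --monitoring.reportRate "10 seconds"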

--monitoring.console, --dsbulk.monitoring.console { true | false }

Enable or disable console reporting. If enabled, DataStax Bulk Loader prints useful metrics about the ongoing operation to standard error. The metrics are refreshed at reportRate. Displayed information includes: total records, failed records, throughput, latency, and if available, average batch size.

When log.verbosity is set to quiet (0), DataStax Bulk Loader disables the console reporter regardless of the value specified here.

Default: true

--monitoring.csv, --dsbulk.monitoring.csv { true | false }

Enable or disable CSV reporting. If enabled, CSV files containing metrics are generated in the designated log directory.

Default: false

--monitoring.durationUnit, --dsbulk.monitoring.durationUnit string

The time unit used when printing latency durations. Valid values: any TimeUnit enum constant, such as MILLISECONDS, SECONDS, or MINUTES.

Default: MILLISECONDS

--monitoring.expectedReads, --dsbulk.monitoring.expectedReads number

The expected total number of reads. Optional, but if set, the console reporter also prints the overall achievement percentage. Setting this value to -1 disables this feature.

Default: -1

--monitoring.expectedWrites, --dsbulk.monitoring.expectedWrites number

The expected total number of writes. Optional, but if set, the console reporter also prints the overall achievement percentage. Setting this value to -1 disables this feature.

Default: -1
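If you know in advance how many records the input contains, setting expectedWrites lets the console reporter show progress as a percentage. A minimal sketch, assuming a hypothetical load of 1,000,000 records with placeholder keyspace, table, and input file:

dsbulk load -k ks1 -t table1 -url export.csv \
  --monitoring.expectedWrites 1000000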

-jmx, --monitoring.jmx, --dsbulk.monitoring.jmx { true | false }

Enable or disable JMX reporting.

To enable remote JMX reporting, several properties must also be set in the JVM during launch. This is accomplished via the DSBULK_JAVA_OPTS environment variable.

Default: true
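For example, the following is a minimal sketch of enabling remote JMX access through DSBULK_JAVA_OPTS using standard JVM system properties. The port number and the disabled authentication and SSL are assumptions suitable only for a test environment; secure these settings in production:

export DSBULK_JAVA_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9999 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
dsbulk load -k ks1 -t table1 -url export.csv -jmx true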

--monitoring.rateUnit, --dsbulk.monitoring.rateUnit string

The time unit used when printing throughput rates. For example, if this unit is SECONDS, then the throughput will be displayed in rows per second. Valid values: all TimeUnit enum constants.

Default: SECONDS

-reportRate, --monitoring.reportRate, --dsbulk.monitoring.reportRate string

The report interval for the console reporter. The console reporter prints useful metrics about the ongoing operation at this rate. Durations of less than one second are rounded up to one second.

Default: 5 seconds

--monitoring.trackBytes, --dsbulk.monitoring.trackBytes { true | false }

Whether to track the throughput in bytes. When enabled, DSBulk tracks and displays the number of bytes sent or received per second.

While useful to evaluate how much data is actually being transferred, computing such metrics is CPU-intensive and may slow down the operation. This is why it is disabled by default. Also, the heuristic used to compute data sizes is not 100% accurate and sometimes underestimates the actual size.

Default: false
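To obtain the mb/second figure shown in the sample report above, enable byte tracking explicitly. A minimal sketch with placeholder keyspace, table, and input file:

dsbulk load -k ks1 -t table1 -url export.csv --monitoring.trackBytes true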
