Monitoring options
Specify monitoring options for the dsbulk command.
Monitored throughput is often measured in operations per second, where an operation is a single write event or a single read event. However, this unit can vary greatly depending on the size of the rows being written or read. DataStax recommends that you consider mb/sec as an alternative measure of throughput, to avoid the irregularity of measuring throughput in operations/sec.
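To illustrate why operations per second can mislead, the following sketch computes the data volume implied by the same operation rate for two row sizes. The row sizes are hypothetical, and the convention 1 MB = 1,000,000 bytes is an assumption for illustration, not necessarily the one DSBulk uses.

```python
def mb_per_second(ops_per_second: float, avg_row_bytes: float) -> float:
    """Data volume per second implied by an operation rate.

    Assumes 1 MB = 1,000,000 bytes (an illustrative convention).
    """
    return ops_per_second * avg_row_bytes / 1_000_000


# Two hypothetical workloads with the same operation rate but different row sizes.
small_rows = mb_per_second(10_000, avg_row_bytes=100)     # 1.0 MB/s
large_rows = mb_per_second(10_000, avg_row_bytes=10_000)  # 100.0 MB/s
print(f"small rows: {small_rows:.1f} MB/s, large rows: {large_rows:.1f} MB/s")
```

Both workloads report 10,000 operations per second, yet one moves a hundred times more data than the other.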
In a load workflow, a typical report looks like this:
2018-03-14 13:15:48 INFO Memory usage: used: 507 MB, free: 691 MB, allocated: 1,199 MB, available: 3,641 MB, total gc count: 20, total gc time: 346 ms
2020-06-14 13:15:48 INFO Records: total: 210,755, successful: 210,755, failed: 0, mean: 20,893 records/second
2020-06-14 13:15:48 INFO Batches: total: 6,602, size: 31.90 mean, 10 min, 32 max
2020-06-14 13:15:48 INFO Writes: total: 210,669, successful: 210,669, failed: 0, in-flight: 0
2020-06-14 13:15:48 INFO Throughput: 20,877 writes/second, 1.11 mb/second
2020-06-14 13:15:48 INFO Latencies: mean 6.29, 75p 2.87, 99p 89.13, 999p 125.83 milliseconds
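When comparing runs, it can help to pull the throughput figures out of such a report and derive the average size of each operation. The sketch below is a minimal parser for the Throughput line of the sample report; the regular expression is written only against the format shown above, and treating "mb" as 1,000,000 bytes is an assumption (DSBulk may use 2**20).

```python
import re

# Two lines copied from the sample load report.
REPORT = """\
2020-06-14 13:15:48 INFO Records: total: 210,755, successful: 210,755, failed: 0, mean: 20,893 records/second
2020-06-14 13:15:48 INFO Throughput: 20,877 writes/second, 1.11 mb/second
"""

# Matches text such as "Throughput: 20,877 writes/second, 1.11 mb/second".
THROUGHPUT_RE = re.compile(r"Throughput: ([\d,]+) (\w+)/second, ([\d.]+) mb/second")


def parse_throughput(text: str):
    """Return (operations per second, operation name, mb per second), or None."""
    match = THROUGHPUT_RE.search(text)
    if match is None:
        return None
    ops = int(match.group(1).replace(",", ""))
    return ops, match.group(2), float(match.group(3))


def mean_bytes_per_operation(ops_per_second: int, mb_per_second: float) -> float:
    # Assumes "mb" means 1,000,000 bytes; scale by 1.048576 if it means 2**20.
    return mb_per_second * 1_000_000 / ops_per_second


ops, kind, mb = parse_throughput(REPORT)
print(f"{ops} {kind}/s, about {mean_bytes_per_operation(ops, mb):.0f} bytes per operation")
```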
The options can be used in short form (-k keyspace_name) or long form (--schema.keyspace keyspace_name).
--monitoring.console, --dsbulk.monitoring.console { true | false }
Enable or disable console reporting. If enabled, DataStax Bulk Loader prints useful metrics about the ongoing operation to standard error. The metrics are refreshed at the interval set by reportRate. Displayed information includes: total records, failed records, throughput, latency, and, if available, average batch size.
When set to true (the default), ongoing metrics are printed to the console.
Default: true
--monitoring.csv, --dsbulk.monitoring.csv { true | false }
Enable or disable CSV reporting. If enabled, CSV files containing metrics are generated in the designated log directory.
Default: false
--monitoring.durationUnit, --dsbulk.monitoring.durationUnit string
The time unit used when printing latency durations.
Valid values: all TimeUnit enum constants.
Default: MILLISECONDS
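The effect of durationUnit is a change of unit, not of value. The sketch below converts a latency between the constants of Java's java.util.concurrent.TimeUnit; it is an illustration only, and how DSBulk rounds the displayed figure is not specified here.

```python
# Nanoseconds per unit for the java.util.concurrent.TimeUnit constants.
NANOS_PER_UNIT = {
    "NANOSECONDS": 1,
    "MICROSECONDS": 1_000,
    "MILLISECONDS": 1_000_000,
    "SECONDS": 1_000_000_000,
    "MINUTES": 60 * 1_000_000_000,
    "HOURS": 3_600 * 1_000_000_000,
    "DAYS": 86_400 * 1_000_000_000,
}


def convert_duration(value: float, source: str, target: str) -> float:
    """Express a duration given in `source` units in `target` units."""
    return value * NANOS_PER_UNIT[source] / NANOS_PER_UNIT[target]


# The sample report's mean latency (6.29 ms), as it would read with durationUnit=MICROSECONDS.
print(f"{convert_duration(6.29, 'MILLISECONDS', 'MICROSECONDS'):.0f} microseconds")
```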
--monitoring.expectedReads, --dsbulk.monitoring.expectedReads number
The expected total number of reads.
Optional, but if set, the console reporter also prints the overall achievement percentage.
Setting this value to -1 disables this feature.
Default: -1
--monitoring.expectedWrites, --dsbulk.monitoring.expectedWrites number
The expected total number of writes.
Optional, but if set, the console reporter also prints the overall achievement percentage.
Setting this value to -1 disables this feature.
Default: -1
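For expectedReads and expectedWrites, the achievement percentage is presumably the share of the expected total completed so far. The helper below is a hypothetical sketch of that calculation, including the -1 sentinel that disables it; DSBulk's own formatting and rounding may differ, and rejecting other non-positive values is a choice made for this sketch.

```python
def achievement_percentage(completed: int, expected: int):
    """Share of the expected total that is done, or None when tracking is disabled.

    Hypothetical helper: -1 (the default) disables the feature, mirroring the
    option description; DSBulk's actual output format may differ.
    """
    if expected == -1:
        return None
    if expected <= 0:
        raise ValueError("expected must be a positive number, or -1 to disable")
    return 100.0 * completed / expected


# 105,000 of a hypothetical 210,000 expected writes completed.
print(achievement_percentage(105_000, 210_000))
```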
-jmx, --monitoring.jmx, --dsbulk.monitoring.jmx { true | false }
Enable or disable JMX reporting.
To enable remote JMX reporting, several properties must also be set in the JVM during launch.
This is accomplished via the |
Default: true
--monitoring.rateUnit, --dsbulk.monitoring.rateUnit string
The time unit used when printing throughput rates. For example, if this unit is SECONDS, then the throughput will be displayed in rows per second.
Valid values: all TimeUnit enum constants.
Default: SECONDS
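Changing rateUnit rescales a per-second rate to another time unit. The sketch below covers only the coarser TimeUnit constants and is an illustration of the arithmetic, not DSBulk's code.

```python
# Seconds per unit for the coarser java.util.concurrent.TimeUnit constants.
SECONDS_PER_UNIT = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3_600, "DAYS": 86_400}


def rate_in_unit(per_second: float, unit: str) -> float:
    """Convert a per-second rate into a rate per `unit`."""
    return per_second * SECONDS_PER_UNIT[unit]


# The sample report's 20,877 writes/second, as it would read with rateUnit=MINUTES.
print(f"{rate_in_unit(20_877, 'MINUTES'):,.0f} writes/minute")
```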
-reportRate, --monitoring.reportRate, --dsbulk.monitoring.reportRate string
The report interval for the console reporter. The console reporter prints useful metrics about the ongoing operation at this rate. Durations shorter than one second are rounded up to one second.
Default: 5 seconds
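The default value, 5 seconds, is a human-readable duration string. The sketch below parses a small, hypothetical subset of such strings and applies the one-second minimum described for reportRate; DSBulk's real configuration parser accepts more forms than this.

```python
import re

# Hypothetical subset of duration units; DSBulk's real parser accepts more forms.
UNIT_SECONDS = {
    "milliseconds": 0.001, "millisecond": 0.001, "ms": 0.001,
    "seconds": 1.0, "second": 1.0, "s": 1.0,
    "minutes": 60.0, "minute": 60.0, "m": 60.0,
}
DURATION_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z]+)\s*$")


def effective_report_interval(setting: str) -> float:
    """Parse a report interval in seconds and apply the one-second minimum."""
    match = DURATION_RE.match(setting.lower())
    if match is None or match.group(2) not in UNIT_SECONDS:
        raise ValueError(f"unsupported duration: {setting!r}")
    seconds = float(match.group(1)) * UNIT_SECONDS[match.group(2)]
    # Durations shorter than one second are rounded up to one second.
    return max(seconds, 1.0)


print(effective_report_interval("500 milliseconds"))
```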
--monitoring.trackBytes, --dsbulk.monitoring.trackBytes boolean
Whether to track the throughput in bytes. When enabled, DSBulk tracks and displays the number of bytes sent or received per second.
While useful for evaluating how much data is actually being transferred, computing these metrics is CPU-intensive and may slow down the operation, which is why byte tracking is disabled by default. Also, the heuristic used to compute data sizes is not 100% accurate and sometimes underestimates the actual size.
Default: false
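The cost of trackBytes comes from doing extra work for every row. The hypothetical meter below counts operations and, only when byte tracking is on, estimates each row's size. Its size estimate (UTF-8 length of each value's string form) is an illustrative stand-in, not the heuristic DSBulk uses.

```python
import time


class ThroughputMeter:
    """Hypothetical meter that counts operations and, optionally, bytes.

    estimate_size is an illustrative stand-in, NOT DSBulk's heuristic: it sums
    the UTF-8 length of each value's string form, a pass over every field.
    """

    def __init__(self, track_bytes: bool = False):
        self.track_bytes = track_bytes
        self.operations = 0
        self.bytes = 0
        self.start = time.monotonic()

    @staticmethod
    def estimate_size(row) -> int:
        return sum(len(str(value).encode("utf-8")) for value in row)

    def record(self, row) -> None:
        self.operations += 1
        if self.track_bytes:
            # Extra per-row work: this is the cost that track_bytes=False avoids.
            self.bytes += self.estimate_size(row)

    def rates(self):
        """Return (operations/second, bytes/second or None if not tracked)."""
        elapsed = max(time.monotonic() - self.start, 1e-9)
        ops = self.operations / elapsed
        return (ops, self.bytes / elapsed) if self.track_bytes else (ops, None)
```

With byte tracking off, record only increments a counter; with it on, every field of every row is examined.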