Thread pool and read/write latency statistics

Increases in pending tasks in the thread pool statistics can indicate when to add capacity.

The DataStax Enterprise (DSE) database maintains distinct thread pools for different stages of execution. Each thread pool provides statistics on the number of tasks that are active, pending, delayed, completed, and blocked. Increases in the pending tasks column indicate when to add capacity. After establishing a baseline, configure alarms for any increases above normal in pending tasks.
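As an illustration, a minimal JMX sketch along the following lines can poll the pending-task gauge of every thread pool and flag pools whose backlog exceeds a baseline-derived threshold. The host, port, and threshold values are placeholders, and the MBean pattern follows the standard Cassandra metrics naming, which can differ between DSE versions:

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Sketch: flag thread pools whose PendingTasks gauge exceeds a threshold.
    // Host, port, and threshold are assumptions; adjust for your environment.
    public class PendingTasksCheck {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "127.0.0.1"; // assumed node address
            long threshold = 100;                                  // assumed alert threshold

            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi"); // 7199 = default JMX port
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();

                // Match the PendingTasks gauge of every thread pool.
                ObjectName pattern = new ObjectName(
                        "org.apache.cassandra.metrics:type=ThreadPools,path=*,scope=*,name=PendingTasks");
                Set<ObjectName> pools = mbs.queryNames(pattern, null);

                for (ObjectName pool : pools) {
                    long pending = ((Number) mbs.getAttribute(pool, "Value")).longValue();
                    if (pending > threshold) {
                        System.out.printf("ALERT %s pending=%d%n",
                                pool.getKeyProperty("scope"), pending);
                    }
                }
            }
        }
    }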

Several options are available for viewing and configuring thread pool statistics, such as JMX MBeans and nodetool tpstats.

The database tracks latency (averages and totals) of read, write, and slicing operations at the server level through StorageProxyMBean.
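Because the attributes exposed by StorageProxyMBean vary between versions, a sketch like the following enumerates and prints its readable attributes over JMX rather than hard-coding names. The object name shown is the usual Cassandra registration for StorageProxy; the connection details are placeholders:

    import javax.management.MBeanAttributeInfo;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Sketch: discover and print the readable attributes of the StorageProxy
    // MBean (including read/write latency counters) for the running version.
    public class StorageProxyDump {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "127.0.0.1"; // assumed node address
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName proxy = new ObjectName("org.apache.cassandra.db:type=StorageProxy");

                for (MBeanAttributeInfo attr : mbs.getMBeanInfo(proxy).getAttributes()) {
                    if (!attr.isReadable()) {
                        continue;
                    }
                    try {
                        Object value = mbs.getAttribute(proxy, attr.getName());
                        System.out.printf("%-40s %s%n", attr.getName(), value);
                    } catch (Exception e) {
                        // Some attributes cannot be read on every version; skip them.
                    }
                }
            }
        }
    }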

nodetool tpstats provides the following data:
BackgroundIoStage
Completes background tasks like submitting hints and deserializing the row cache.
CompactionExecutor
Running compaction.
GossipStage
Distributing node information via Gossip. Out-of-sync schemas can cause issues; you might need to resynchronize them using nodetool resetlocalschema.
HintsDispatcher
Dispatches a single hints file to a specified node in a batched manner.
InternalResponseStage
Responding to non-client-initiated messages, including bootstrapping and schema checking.
MemtableFlushWriter
Writing memtable contents to disk. May back up if the flush queue overruns the disk I/O capacity, or because of sorting processes.
Warning: nodetool tpstats no longer reports blocked threads in the MemtableFlushWriter pool. Check the Pending Flushes metric reported by nodetool tablestats, or read it over JMX as shown in the sketch at the end of this section.
MemtablePostFlush
Cleaning up after flushing the memtable (discarding commit logs and secondary indexes as needed).
MemtableReclaimMemory
Making unused memory available.
PendingRangeCalculator
Calculating pending ranges resulting from bootstrapping and departed nodes. Reporting by this tool is not useful; see Developer notes.
PerDiskMemtableFlushWriter_N
Activity for the memtable flush writer of each disk.
ReadRepairStage
Performing read repairs. Usually fast if there is good connectivity between replicas.
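As noted in the MemtableFlushWriter warning above, pending flushes are the signal to watch rather than blocked flush-writer threads. The following sketch reads the per-table Pending Flushes metric over JMX. The metric naming follows standard Cassandra conventions (older releases register it under type=ColumnFamily instead of type=Table), and it may be exposed as either a gauge or a counter depending on version, so both attribute names are tried:

    import java.util.Set;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Sketch: report pending memtable flushes per table and in total.
    // Connection details are placeholders; metric names may vary by DSE version.
    public class PendingFlushesCheck {
        public static void main(String[] args) throws Exception {
            String host = args.length > 0 ? args[0] : "127.0.0.1"; // assumed node address
            JMXServiceURL url = new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName pattern = new ObjectName(
                        "org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=PendingFlushes");
                Set<ObjectName> tables = mbs.queryNames(pattern, null);

                long total = 0;
                for (ObjectName table : tables) {
                    long pending = readPending(mbs, table);
                    total += pending;
                    if (pending > 0) {
                        System.out.printf("%s.%s pending flushes: %d%n",
                                table.getKeyProperty("keyspace"),
                                table.getKeyProperty("scope"), pending);
                    }
                }
                System.out.println("Total pending flushes: " + total);
            }
        }

        // The metric is a gauge ("Value") on some versions and a counter ("Count") on others.
        private static long readPending(MBeanServerConnection mbs, ObjectName name) throws Exception {
            try {
                return ((Number) mbs.getAttribute(name, "Value")).longValue();
            } catch (Exception e) {
                return ((Number) mbs.getAttribute(name, "Count")).longValue();
            }
        }
    }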