Reads are getting slower while writes are still fast
Too many SSTables can cause slow reads. Take the following steps to diagnose and correct slow reads (the commands shown assume a Linux host):
- Determine the total number of SSTables for each table.
  - Check this number with nodetool tablestats.
- Get the number of SSTables consulted for each read.
  - Check this number with nodetool tablehistograms. A median value over 2 or 3 is likely causing problems.
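  For example, a quick check might look like the following; ks and tbl are placeholder keyspace and table names:

    # Total SSTables on disk for one table:
    nodetool tablestats ks.tbl | grep 'SSTable count'

    # SSTables consulted per read; the 50% row of the "SSTables" column is
    # the median to compare against the 2-3 threshold above:
    nodetool tablehistograms ks tbl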
- Make sure SSTables are not flushing too frequently.
  - Check debug.log for "Enqueuing flush" messages and note the size and frequency of flushes (see the log-inspection sketch after this step):
    - If the SlabPoolCleaner thread is frequently enqueueing many small flushes, increase memtable_cleanup_threshold. The value of memtable_cleanup_threshold should be inversely proportional to the number of tables that receive heavy writes.
    - If spare memory is available, consider increasing memtable_heap_space_in_mb (deprecated) or memtable_offheap_space_in_mb (deprecated).
    - If the COMMIT-LOG-ALLOCATOR thread is frequently enqueuing flushes, increase commitlog_total_space_in_mb. The number of flushes enqueued by COMMIT-LOG-ALLOCATOR should be a small minority of the total flushes enqueued.
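  A quick way to eyeball flush size, frequency, and the enqueuing thread is to grep debug.log. The log path and the line layout below are assumptions (both vary by install and version); adjust the awk field numbers to match your format:

    # Assumed line format (varies by version):
    #   DEBUG [SlabPoolCleaner] 2023-01-01 12:00:00,123 ... - Enqueuing flush of mytable: ...
    LOG=/var/log/cassandra/debug.log   # path is an assumption; adjust for your install

    # Recent flushes, with the sizes Cassandra reports:
    grep 'Enqueuing flush' "$LOG" | tail -n 20

    # Flushes per minute (fields 3 and 4 are the date and HH:MM:SS,mmm above):
    grep 'Enqueuing flush' "$LOG" | awk '{print $3, substr($4, 1, 5)}' | uniq -c | tail

    # Flushes per enqueuing thread; COMMIT-LOG-ALLOCATOR should be a small minority:
    grep 'Enqueuing flush' "$LOG" | awk '{print $2}' | sort | uniq -c | sort -rn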
- Check the pending compactions using nodetool compactionstats. Even if flushes are not excessively frequent, compactions might not be able to keep up. A persistently high number of pending compactions means compactions are not keeping up. Note that this number is only an estimate for LeveledCompactionStrategy.
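  For instance, poll the estimate every ten seconds and watch whether the backlog shrinks:

    # "pending tasks" is the backlog estimate; a number that stays high or
    # keeps growing means compaction is falling behind:
    nodetool compactionstats

    # Poll every 10 seconds to see the trend:
    watch -n 10 nodetool compactionstats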
- Make sure that compaction_throughput_mb_per_sec is set appropriately for your storage. (A tuning sketch follows this step.)
  - The default value of 16 MB/sec for compaction_throughput_mb_per_sec is chosen for spinning disks; SSDs can use a much higher setting, such as 128 MB/sec or more.
  - Temporarily adjust the value using nodetool setcompactionthroughput.
  - Watch the I/O utilization using iostat -x -t 10, which shows averages over 10-second intervals and prints timestamps. A %iowait over 1 indicates that the node is starting to get I/O bound.
    - The acceptable bounds for await (average wait in milliseconds) are:
      - Most SSDs: below 10 ms.
      - Most 7200 RPM spinning disks: below 200 ms.
  - Once you have found a good throughput for your system, set it permanently in cassandra.yaml.
  - If your I/O is not able to keep up with the necessary compaction throughput, you probably need to get faster disks or add more nodes.
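  A minimal tuning loop might look like the following; the cassandra.yaml path and the 128 MB/sec figure are assumptions to adapt to your hardware:

    # Current throttle, then a temporary raise (MB/sec; resets on restart):
    nodetool getcompactionthroughput
    nodetool setcompactionthroughput 128

    # Watch disk behavior while compactions run. Depending on your sysstat
    # version the latency column is "await" or split into r_await/w_await:
    iostat -x -t 10

    # Once satisfied, persist the value (path varies by install):
    grep -n 'compaction_throughput' /etc/cassandra/cassandra.yaml
    #   compaction_throughput_mb_per_sec: 128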
- If you have set a high compaction throughput but I/O utilization is low and compactions are still not keeping up, the compactions may be CPU-bound.
  - Check the per-core CPU utilization of CompactionExecutor threads (see the sketch after this step). If a thread is utilizing 100% of a single core, the compaction may be CPU-bound. Increasing concurrent_compactors allows multiple concurrent compactions of different sets of SSTables, but compaction of each set of SSTables is inherently single-threaded. If you are using LeveledCompactionStrategy (LCS), you need to either switch to SizeTieredCompactionStrategy (STCS) or add more nodes to spread the compaction load.
    Note: Increasing concurrent compactors beyond the number of physical CPU cores (not Hyper-Threaded cores) can be counterproductive. Using all available CPU for compaction leaves no CPUs to handle reads and writes. If you need to continue to service requests while catching up on compactions, be sure to leave 1 or 2 physical CPUs free for reads and/or writes.
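  One way to check is sketched below. The jstack mapping step is only needed on JVMs that do not expose native thread names to the OS, and 12345 is a placeholder thread ID:

    # Per-thread CPU inside the Cassandra JVM (press "1" in top for a per-core view):
    top -H -p "$(pgrep -f CassandraDaemon)"

    # If top shows only "java" as the thread name, map a hot thread ID (TID)
    # to its Java thread name via jstack; nid is the TID in hexadecimal.
    # 12345 below is a placeholder TID taken from the top output:
    jstack "$(pgrep -f CassandraDaemon)" | grep "nid=0x$(printf '%x' 12345)"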
- Make sure that there is enough free memory for file cache (page cache).
  - The free -m command shows the amount of memory available for caches. You should have enough memory available for the file cache to hold your hot working set in memory.
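  For example (the "available" column assumes a reasonably recent procps; older versions report the cache headroom under "buffers/cache" instead):

    # The "available" column is roughly what the kernel can devote to page
    # cache; compare it against the size of your hot data set:
    free -m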
- Switch to SizeTieredCompactionStrategy.
  - If you are using LeveledCompactionStrategy (LCS) and the above steps haven't worked, consider switching to SizeTieredCompactionStrategy (STCS). LCS uses more resources to compact than STCS, and nodes that fall behind while compacting with LCS can often keep up easily with STCS.
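  The switch itself is a schema change. A sketch using cqlsh, where ks.tbl is a placeholder table:

    # ks.tbl is a placeholder; expect a burst of compaction activity while
    # existing SSTables are reorganized under the new strategy:
    cqlsh -e "ALTER TABLE ks.tbl WITH compaction = {'class': 'SizeTieredCompactionStrategy'};"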