Nodes are dying with OOM errors
Nodes are dying with OutOfMemory exceptions on Linux platforms.
Check for these typical causes:
- Row cache is too large, or is caching large rows
  - The row cache is generally a high-end optimization. Try disabling it and see whether the OOM problems continue.
- A large user query running on the node is taking up all of the heap
  - In production, understand and test all queries up front so that arbitrary query patterns never reach the cluster. Test each query to discover its maximum response size. Paging in CQL can often prevent a query from pulling too much data into memory at once.
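For the row cache cause, a first step is confirming whether the row cache is enabled at all. The sketch below is a hypothetical helper (`row_cache_setting`, `row_cache_enabled`) that scans the text of cassandra.yaml. It assumes the size setting is named `row_cache_size_in_mb` (older releases, plain number of MB) or `row_cache_size` (newer releases, number plus a unit such as `0MiB`), and that a size of 0, which is also the default, disables the cache. Check the key name against the cassandra.yaml shipped with your release; a real tool should use a YAML parser rather than a regex.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper. Assumes the key is `row_cache_size_in_mb` (older releases)
# or `row_cache_size` (newer releases, e.g. "0MiB"). Commented-out lines are
# skipped because the pattern must start with the key name itself.
_ROW_CACHE = re.compile(
    r"^\s*(row_cache_size_in_mb|row_cache_size)\s*:\s*(\d+)\s*([A-Za-z]*)\s*(?:#.*)?$"
)


def row_cache_setting(yaml_text: str) -> Optional[Tuple[str, int, str]]:
    """Return (key, numeric value, unit) for the first row cache size line, if any."""
    for line in yaml_text.splitlines():
        m = _ROW_CACHE.match(line)
        if m:
            return m.group(1), int(m.group(2)), m.group(3)
    return None


def row_cache_enabled(yaml_text: str) -> bool:
    """True if the configured row cache size is non-zero."""
    setting = row_cache_setting(yaml_text)
    # No setting at all means the default size of 0, i.e. row cache disabled.
    return setting is not None and setting[1] > 0
```

If this reports the cache as enabled, setting the size to 0 and restarting the node is the "try disabling it" experiment described above.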
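For the large-query cause, the effect of paging can be seen in a toy model where each result row is represented only by its size in bytes. The helpers `pages` and `peak_buffered_bytes` are hypothetical and simulate buffering; they are not Cassandra or driver code. An unpaged query holds its whole response at once, so its peak equals the maximum response size you would measure in testing, while a paged query holds at most one page. With the DataStax Python driver, the real knob is the `fetch_size` argument of `cassandra.query.SimpleStatement` (or `Session.default_fetch_size`), and iterating the returned `ResultSet` fetches later pages on demand.

```python
from typing import Iterator, Optional, Sequence


def pages(row_sizes: Sequence[int], fetch_size: int) -> Iterator[Sequence[int]]:
    """Yield the result in pages of at most fetch_size rows, like a paged query."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    for start in range(0, len(row_sizes), fetch_size):
        yield row_sizes[start:start + fetch_size]


def peak_buffered_bytes(row_sizes: Sequence[int], fetch_size: Optional[int] = None) -> int:
    """Largest amount of row data held in memory at one time.

    fetch_size=None models an unpaged query that materializes every row,
    so the result is the query's full response size.
    """
    if fetch_size is None:
        return sum(row_sizes)
    return max((sum(page) for page in pages(row_sizes, fetch_size)), default=0)


# Example: 10,000 rows of 1 KiB each.
sizes = [1024] * 10_000
unpaged = peak_buffered_bytes(sizes)          # whole response at once
paged = peak_buffered_bytes(sizes, 100)       # one 100-row page at a time
```

Here the unpaged peak is 10,240,000 bytes and the paged peak is 102,400 bytes; the cost of paging is more round trips, which is why the page size is a tunable rather than always minimal.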
If none of these causes applies to your situation, load the heap dump from the failed node into the Eclipse Memory Analyzer Tool (MAT) and look at which classes consume the bulk of the heap for clues.
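MAT on the heap dump is the approach recommended above. As a rough complement on a node that is still running, the JDK's `jmap -histo <pid>` prints a per-class histogram of instance counts and shallow bytes. The sketch below is a hypothetical helper (`parse_histogram`, `top_classes`) that ranks that output; it assumes the usual `num: #instances #bytes class name` row layout and ignores the module suffix that newer JDKs append. Shallow sizes are not the retained sizes MAT reports, so treat the ranking as a hint, not a diagnosis.

```python
import re
from typing import List, Tuple

# Matches a data row such as "   1:     131428     18283504  [C (java.base@11)".
# Header, separator, and "Total" lines do not match and are skipped.
_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histogram(text: str) -> List[Tuple[str, int, int]]:
    """Return (class_name, instances, shallow_bytes) for each histogram row."""
    rows = []
    for line in text.splitlines():
        m = _ROW.match(line)
        if m:
            rows.append((m.group(3), int(m.group(1)), int(m.group(2))))
    return rows


def top_classes(text: str, n: int = 5) -> List[Tuple[str, int, float]]:
    """Rank classes by shallow bytes and report each one's share of the total."""
    rows = parse_histogram(text)
    total = sum(b for _, _, b in rows) or 1  # avoid dividing by zero on empty input
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    return [(name, b, b / total) for name, _, b in ranked[:n]]
```

A class that dominates the ranking, such as byte or char arrays, points at where to dig in MAT's dominator tree.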