Tuning the Java Virtual Machine
Improve performance or reduce high memory consumption by tuning the Java Virtual Machine (JVM).
- Bloom filters
- Partition summary
- Partition key cache
- Compression offsets
- SSTable index summary
The metadata resides in memory and is proportional to total data. Some of the components grow proportionally to the size of total memory. The database gathers replicas for a read or for anti-entropy repair and compares the replicas in heap memory.
Data written to the database is first stored in memtables in heap memory. Memtables are then flushed to SSTables on disk.
- Page cache. The database uses additional memory as page cache when reading files on disk.
- The Bloom filter and compression offset maps reside off-heap.
- The database can store cached rows in native memory, outside the Java heap. This reduces JVM heap requirements, which helps keep the heap size in the sweet spot for JVM garbage collection performance.
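To see how a running JVM divides memory between the heap and its own non-heap areas, you can query the standard `java.lang.management` API. The following is a minimal sketch, not a DSE tool: the class name `HeapReport` and its helper methods are hypothetical. The non-heap figure reported here covers only JVM-managed pools such as metaspace and the code cache; it does not include the OS page cache or native memory that the database allocates off-heap.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: prints heap and JVM-managed non-heap usage for the current JVM.
final class HeapReport {

    // Convert a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Format one MemoryUsage snapshot; getMax() returns -1 when no maximum is defined.
    static String describe(String label, MemoryUsage usage) {
        String max = usage.getMax() < 0 ? "undefined" : toMiB(usage.getMax()) + " MiB";
        return label + ": used=" + toMiB(usage.getUsed()) + " MiB"
                + ", committed=" + toMiB(usage.getCommitted()) + " MiB"
                + ", max=" + max;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.out.println(describe("heap", memory.getHeapMemoryUsage()));
        System.out.println(describe("non-heap (JVM-managed)", memory.getNonHeapMemoryUsage()));
    }
}
```

Running this inside the process you want to inspect (or reading the same MXBean over JMX) gives a quick check of how close the heap is to its configured maximum.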
DSE advanced features memory use
DataStax Enterprise advanced features use additional memory on nodes where the workload type is enabled.
Search
- Solr stores indexed data in a RAM buffer until it is flushed to index segments on disk. When setting the heap size, determine the amount of memory required for Solr indexes. Allow enough free RAM, calculated as total RAM - DSE heap size - DSE off-heap object size.
- Multiple concurrent indexers can cause GC thrashing, even with a large heap.
- Indexes larger than the page cache can degrade search query performance. For highly performant search queries, ensure that the index size does not exceed the page cache size.
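The free-RAM calculation in the list above can be worked through as simple arithmetic. The sketch below is illustrative only: the class `PageCacheBudget`, its method names, and the sizes in `main` are hypothetical, and it assumes that all free RAM is available to the page cache, which overstates the real figure because the operating system and other processes also need memory.

```java
// Hypothetical helper: estimates free RAM and checks it against a search index size.
final class PageCacheBudget {

    // Free RAM = total RAM - DSE heap size - DSE off-heap object size (all in GB).
    static long freeRamGb(long totalRamGb, long heapGb, long offHeapGb) {
        long free = totalRamGb - heapGb - offHeapGb;
        if (free < 0) {
            throw new IllegalArgumentException("heap and off-heap sizes exceed total RAM");
        }
        return free;
    }

    // The index should not exceed the memory left over for the page cache.
    static boolean indexFits(long indexGb, long freeRamGb) {
        return indexGb <= freeRamGb;
    }

    public static void main(String[] args) {
        // Example sizes only; substitute measurements from your own node.
        long free = freeRamGb(128, 32, 16);
        System.out.println("Free RAM: " + free + " GB");                           // Free RAM: 80 GB
        System.out.println("Index fits in free RAM: " + indexFits(60, free));     // true
    }
}
```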
Analytics
- Spark executors are the most memory-intensive processes in Spark. They are tuned to use G1 GC by default. Tune the size of the executor heap in spark-defaults.conf. Consider leaving room for the OS page cache when tuning the executor heaps.
- Shuffle steps are a common cause of Spark out-of-memory (OOM) errors. Try to avoid shuffles by using repartitionByCassandraReplica or joinWithCassandraTable in your RDD jobs.
Graph
DSE Graph workloads often include Search, Analytics, or both. Tune the GC for the Search and Analytics workloads. In addition to the memory needed by the Search and Analytics workloads, Graph queries use memory during execution. This workload is characterized by short-lived objects. Most DSE Graph deployments with Search enabled run on systems with at least 128 GB of RAM and G1 GC heaps of 32 GB.
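To confirm which collector and maximum heap a JVM is actually running with, you can list its garbage collector MXBeans. This is a minimal sketch using the standard `java.lang.management` API; the class name `GcCheck` and the helper `usesG1` are hypothetical, and the check assumes the HotSpot convention that G1 collector bean names begin with "G1" (for example "G1 Young Generation").

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reports the active collectors and the maximum heap size.
final class GcCheck {

    // Assumption: HotSpot names its G1 collector beans with a "G1" prefix.
    static boolean usesG1(List<String> collectorNames) {
        for (String name : collectorNames) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024L * 1024L);
        System.out.println("Collectors: " + names);
        System.out.println("G1 in use: " + usesG1(names));
        System.out.println("Max heap: " + maxHeapMiB + " MiB");
    }
}
```

The output depends on the JVM version and its startup flags, so treat the collector names as something to verify on your own nodes.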