Tuning Java Virtual Machine

Improve performance or reduce high memory consumption in DataStax Enterprise (DSE) by tuning the Java Virtual Machine (JVM). Operations on the following components occur in the JVM heap:

Bloom filters
Partition summary
Partition key cache
Compression offsets
SSTable index summary

The metadata resides in memory and is proportional to total data. Some of the components grow proportionally to the size of total memory. The database gathers replicas for a read or for anti-entropy repair and compares the replicas in heap memory.

Data written to the database is first stored in memtables in heap memory. Memtables are then flushed to SSTables on disk.

The database uses off-heap memory as follows:

Page cache. The database uses additional memory as page cache when reading files on disk.
The Bloom filter and compression offset maps reside off-heap.
The database can store cached rows in native memory, outside the Java heap. This reduces JVM heap requirements, which helps keep the heap size in the sweet spot for JVM garbage collection performance.
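
The split between heap and off-heap memory can be observed from inside a JVM with the standard `java.lang.management` API. The following is a minimal sketch, not a DSE tool: the class name `MemoryReport` and its method names are assumptions made for illustration. It reports only on the JVM it runs in, so inspecting a DSE node would require a JMX connection to that process. The operating system's page cache is managed outside the JVM and is not visible through these beans.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.List;

// Hypothetical helper for illustration; not part of DSE.
final class MemoryReport {

    // Heap usage of the current JVM, in bytes.
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    // Buffer pools (for example "direct" and "mapped") whose memory
    // lives outside the Java heap.
    static List<BufferPoolMXBean> bufferPools() {
        return ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        // getMax() returns -1 when the maximum is undefined.
        System.out.printf("heap used=%d MB, max=%d MB%n",
                heap.getUsed() >> 20, heap.getMax() >> 20);
        for (BufferPoolMXBean pool : bufferPools()) {
            System.out.printf("%s buffers: %d MB%n",
                    pool.getName(), pool.getMemoryUsed() >> 20);
        }
    }
}
```

The buffer pool figures cover only memory allocated through NIO buffers, so they are a lower bound on the off-heap memory a process actually uses.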

DSE advanced features memory use

DSE advanced features use additional memory on nodes where the workload type is enabled.

Search

DSE Search has larger memory requirements than a database-only node. Most search deployments run with heaps between 24 and 32 GB using G1 GC. Additional memory usage considerations include:

Solr stores indexed data in a RAM buffer until it is flushed to index segments on disk. When setting the heap size, determine the amount of memory required for Solr indexes. Allow enough free RAM, that is, total RAM minus the DSE heap size minus the DSE off-heap object size.
Multiple concurrent indexers can cause GC thrashing, even with a large heap.
Indexes larger than the page cache can degrade search query performance. For highly performant search queries, ensure that the index size does not exceed the page cache size.
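
The free-RAM arithmetic in the list above can be sketched as follows. This is a minimal illustration under stated assumptions: the class name `PageCacheBudget`, its method names, and the sample figures (128 GB of RAM, a 32 GB heap, 16 GB of off-heap objects, a 60 GB index) are all hypothetical and chosen only for the example.

```java
// Hypothetical helper; names and sample numbers are illustrative assumptions.
final class PageCacheBudget {

    // Free RAM = total RAM - DSE heap size - DSE off-heap object size.
    static long freeRamGb(long totalRamGb, long heapGb, long offHeapGb) {
        long free = totalRamGb - heapGb - offHeapGb;
        if (free < 0) {
            throw new IllegalArgumentException("heap and off-heap exceed total RAM");
        }
        return free;
    }

    // True when the whole search index could stay resident in the page cache.
    static boolean indexFitsInPageCache(long indexGb, long freeRamGb) {
        return indexGb <= freeRamGb;
    }

    public static void main(String[] args) {
        long free = freeRamGb(128, 32, 16);
        System.out.println("RAM left for page cache (GB): " + free);
        System.out.println("60 GB index fits: " + indexFitsInPageCache(60, free));
    }
}
```

Treating all remaining RAM as page cache is an upper bound, because the operating system and other processes also need memory. Leave some headroom when applying this check to a real node.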

See DSE Search performance tuning and monitoring.

Analytics

DSE Analytics nodes run Apache Spark™ in a separate JVM. Therefore, adjustments to the Apache Cassandra JVM do not affect Spark operations directly. DSE Analytics nodes typically have read-heavy workloads because they run a significant number of range read queries. Additional memory usage considerations include:

Spark executors are the most memory-intensive processes in Apache Spark. They are tuned to use G1 GC by default. Tune the size of the executor heap in spark-defaults.conf. Consider leaving room for the OS page cache when tuning the executor heaps.
Shuffle steps are a common cause of Spark out-of-memory (OOM) errors. Try to avoid shuffles by using repartitionByCassandraReplica and joinWithCassandraTable in your RDD jobs.

See Apache Spark JVMs and memory management.

Graph

DSE Graph workloads often include Search, Analytics, or both. Tune the GC for the Search and Analytics workloads. In addition to the memory needed by the Search and Analytics workloads, Graph queries use memory during execution. This workload is characterized by short-lived objects. Most DSE Graph deployments with Search enabled run on systems with at least 128 GB of RAM and G1 GC heaps of 32 GB.
