Tuning the Java Virtual Machine

Improve performance or reduce high memory consumption by tuning the Java Virtual Machine (JVM).

Operations on the following components occur in the JVM heap:
  • Bloom filters
  • Partition summary
  • Partition key cache
  • Compression offsets
  • SSTable index summary

The metadata resides in memory and is proportional to the total data size. Some of the components grow proportionally to the size of total memory. During a read or an anti-entropy repair, the database gathers the replicas and compares them in heap memory.

Data written to the database is first stored in memtables in heap memory. Memtables are then flushed to SSTables on disk.

Note: The database uses off-heap memory as follows:
  • Page cache. The database uses additional memory as page cache when reading files on disk.
  • The Bloom filter and compression offset maps reside off-heap.
  • The database can store cached rows in native memory, outside the Java heap. This reduces JVM heap requirements, which helps keep the heap size in the sweet spot for JVM garbage collection performance.
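
To see the heap and off-heap split described above on a running JVM, the standard java.lang.management beans can be queried. The following is a minimal sketch, not part of DSE: the class name HeapReport and its methods are hypothetical. It reports only the Java heap and the JVM's NIO buffer pools (direct and mapped buffers). Memory allocated natively outside those pools and the OS page cache are not visible through these beans.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not a DSE class): inspects the JVM it runs in.
class HeapReport {

    /** Bytes of Java heap currently in use. */
    static long heapUsedBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getUsed();
    }

    /** Bytes held by NIO buffer pools (direct and mapped), which live outside the heap. */
    static long bufferPoolBytes() {
        long total = 0L;
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            // getMemoryUsed() returns -1 when no estimate is available; count that as 0.
            total += Math.max(0L, pool.getMemoryUsed());
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.printf("heap used:        %d bytes%n", heapUsedBytes());
        System.out.printf("NIO buffer pools: %d bytes%n", bufferPoolBytes());
    }
}
```

Running the same calls over JMX against a database node, rather than in a standalone process, would show that node's figures. That setup is outside the scope of this sketch.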

DSE advanced features memory use

DataStax Enterprise advanced features use additional memory on nodes where the workload type is enabled.

Analytics

DSE Analytics nodes run Spark in a separate JVM. Therefore, adjustments to the Cassandra JVM do not affect Spark operations directly. DSE Analytics nodes typically have read-heavy workloads because they run a significant number of range-read queries. Additional memory usage considerations include:
  • Spark executors are the most memory-intensive processes in Spark. They are tuned to use G1 GC by default. Tune the size of the executor heap in spark-defaults.conf. Consider leaving room for OS page cache when tuning the executor heaps.
  • Shuffle steps are a common cause of Spark out-of-memory (OOM) errors. Where possible, avoid shuffles by using repartitionByCassandraReplica and joinWithCassandraTable in your RDD jobs.
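
The advice to leave room for OS page cache when sizing executor heaps comes down to simple arithmetic. The sketch below is illustrative only: the class ExecutorHeapBudget, its method, and the sample figures are assumptions, not DSE defaults or recommendations. It ignores Spark's per-executor memory overhead, so treat the result as an upper bound for a setting such as spark.executor.memory.

```java
// Hypothetical sizing helper (not a DSE or Spark class); all inputs are in megabytes.
class ExecutorHeapBudget {

    /**
     * Splits the RAM that remains after the database heap and a page cache
     * reserve evenly across the executors on one node.
     */
    static long executorHeapMb(long totalRamMb, long databaseHeapMb,
                               long pageCacheReserveMb, int executors) {
        if (executors <= 0) {
            throw new IllegalArgumentException("executors must be positive");
        }
        long remaining = totalRamMb - databaseHeapMb - pageCacheReserveMb;
        if (remaining <= 0) {
            throw new IllegalArgumentException("no memory left for executors");
        }
        return remaining / executors;
    }

    public static void main(String[] args) {
        // Assumed example node: 128 GB RAM, 32 GB database heap,
        // 48 GB kept free for page cache, 2 executors.
        long perExecutor = executorHeapMb(131072, 32768, 49152, 2);
        System.out.println("executor heap budget: " + perExecutor + " MB"); // 24576 MB
    }
}
```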

Graph

DSE Graph workloads often include Search, Analytics, or both. Tune the GC for the Search and Analytics workloads. In addition to the memory needed by the Search and Analytics workloads, Graph queries use memory during execution. This workload is characterized by short-lived objects. Most DSE Graph deployments with Search enabled run on systems with at least 128 GB of RAM and G1 GC heaps of 32 GB.
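
When checking a configuration such as a 32 GB G1 heap, it helps to confirm which collector and maximum heap a JVM is actually running with. The following minimal sketch uses the standard java.lang.management API. The class name GcReport is hypothetical, and the check for a "G1" name prefix assumes the collector naming used by HotSpot-based JVMs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a DSE class): reports the collectors of the JVM it runs in.
class GcReport {

    /** Names of the garbage collectors active in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    /** True if any collector name starts with "G1" (assumed HotSpot naming). */
    static boolean usesG1() {
        for (String name : collectorNames()) {
            if (name.startsWith("G1")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and time are cumulative since JVM start.
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("G1 in use: " + usesG1());
        System.out.println("max heap:  " + Runtime.getRuntime().maxMemory() / (1024 * 1024) + " MB");
    }
}
```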