Tuning the Java Virtual Machine

Improve performance or reduce high memory consumption by tuning the Java Virtual Machine (JVM). Operations on the following components occur in the JVM heap:

  • Bloom filters

  • Partition summary

  • Partition key cache

  • Compression offsets

  • SSTable index summary

The metadata resides in memory and is proportional to total data. Some of the components grow proportionally to the size of total memory. The database gathers replicas for a read or for anti-entropy repair and compares the replicas in heap memory.
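
The heap usage described above can be observed from inside any JVM with the standard `java.lang.management` API. The sketch below is a generic illustration, not a DSE tool: it reports on the JVM it runs in, and checking a live node would instead mean connecting to that node's JMX endpoint. The class name `HeapUsageReport` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class HeapUsageReport {
    // Converts bytes to whole mebibytes (rounding down) for readable output.
    static long toMiB(long bytes) {
        return bytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        // Heap: where on-heap structures such as memtables live.
        MemoryUsage heap = memory.getHeapMemoryUsage();
        // Non-heap: JVM-managed areas such as metaspace and the code cache.
        // This does NOT include direct (native) buffers or the OS page cache.
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        System.out.printf("heap used=%d MiB committed=%d MiB max=%d MiB%n",
                toMiB(heap.getUsed()), toMiB(heap.getCommitted()), toMiB(heap.getMax()));
        System.out.printf("non-heap used=%d MiB committed=%d MiB%n",
                toMiB(nonHeap.getUsed()), toMiB(nonHeap.getCommitted()));
    }
}
```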

Data written to the database is first stored in memtables in heap memory. Memtables are then flushed to SSTables on disk.
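
The write path above (writes land in a sorted in-memory table that is later flushed to immutable sorted files) can be modeled in a few lines. This is a toy sketch under simplifying assumptions, not the database's implementation: `ToyMemtable`, the entry-count flush threshold, and the in-memory lists standing in for SSTables on disk are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Toy model only: a sorted in-memory table that is flushed to immutable sorted runs.
public class ToyMemtable {
    private final int flushThreshold; // hypothetical: flush after this many distinct keys
    private TreeMap<String, String> memtable = new TreeMap<>(); // on-heap, sorted by key
    // Each flushed run stands in for an SSTable on disk; the newest run is last.
    private final List<List<Map.Entry<String, String>>> sstables = new ArrayList<>();

    public ToyMemtable(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Writes go to the in-memory table first.
    public void put(String key, String value) {
        memtable.put(Objects.requireNonNull(key), Objects.requireNonNull(value));
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    // Copies the sorted contents into an immutable run and starts a fresh memtable.
    public void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        List<Map.Entry<String, String>> run = new ArrayList<>();
        for (Map.Entry<String, String> e : memtable.entrySet()) {
            run.add(Map.entry(e.getKey(), e.getValue()));
        }
        sstables.add(Collections.unmodifiableList(run));
        memtable = new TreeMap<>();
    }

    // Reads check the memtable, then flushed runs from newest to oldest (a simplification).
    // A linear scan keeps the sketch short; real SSTable reads use indexes and Bloom filters.
    public String get(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (int i = sstables.size() - 1; i >= 0; i--) {
            for (Map.Entry<String, String> e : sstables.get(i)) {
                if (e.getKey().equals(key)) {
                    return e.getValue();
                }
            }
        }
        return null;
    }

    public int memtableSize() {
        return memtable.size();
    }

    public int sstableCount() {
        return sstables.size();
    }
}
```

For example, with a threshold of 2, writing `a` and `b` triggers a flush, and a later write to `a` is served from the memtable ahead of the flushed copy.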

The database uses off-heap memory as follows:

  • Page cache. The database uses additional memory as page cache when reading files on disk.

  • The Bloom filter and compression offset maps reside off-heap.

  • The database can store cached rows in native memory, outside the Java heap. This reduces JVM heap requirements, which helps keep the heap size in the sweet spot for JVM garbage collection performance.
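
Native memory outside the Java heap can be illustrated with a direct `ByteBuffer`. In this sketch (hypothetical class `OffHeapDemo`), the buffer's storage is allocated outside the heap and is tracked by the JVM's `direct` buffer pool, which `BufferPoolMXBean` exposes. It shows only the general JVM mechanism and does not claim the database uses this exact allocation path for its off-heap structures.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

public class OffHeapDemo {
    // Total capacity, in bytes, of the JVM's "direct" buffer pool, or -1 if the pool is not found.
    static long directPoolCapacity() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getTotalCapacity();
            }
        }
        return -1;
    }

    // Allocates a direct buffer and returns how much the direct pool's capacity grew.
    static long growthAfterDirectAllocation(int bytes) {
        long before = directPoolCapacity();
        ByteBuffer buffer = ByteBuffer.allocateDirect(bytes); // storage lives outside the heap
        long after = directPoolCapacity();
        buffer.put(0, (byte) 1); // keeps the buffer reachable until after the measurement
        return after - before;
    }

    public static void main(String[] args) {
        long growth = growthAfterDirectAllocation(8 * 1024 * 1024);
        System.out.println("direct buffer pool grew by " + growth + " bytes");
    }
}
```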

DSE advanced features memory use

DataStax Enterprise advanced features use additional memory on nodes where the workload type is enabled.

Search

DSE Search has larger memory requirements than a database-only node. Most search deployments run with heaps of 24 to 32 GB using G1 GC. Additional memory usage considerations include:

  • Solr stores indexed data in a RAM buffer until it is flushed to index segments on disk. When setting the heap size, determine the amount of memory required for Solr indexes, and leave enough free RAM, calculated as total RAM minus DSE heap size minus DSE off-heap object size.

  • Multiple concurrent indexers can cause GC thrashing, even with a large heap.

  • Indexes larger than the page cache can degrade search query performance. For the best search query performance, ensure that the index size does not exceed the page cache size.
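
The free-RAM formula and the page cache guideline above reduce to simple arithmetic. A minimal sketch, assuming byte-valued inputs; the class and method names and the node sizes in `main` are hypothetical examples, not sizing recommendations. It treats all free RAM as available to the index, although other file reads also compete for the page cache.

```java
public class SearchMemoryBudget {
    public static final long GiB = 1024L * 1024 * 1024;

    // Free RAM per the formula above: total RAM - DSE heap size - DSE off-heap object size.
    static long freeRamBytes(long totalRam, long dseHeap, long dseOffHeap) {
        long free = totalRam - dseHeap - dseOffHeap;
        if (free < 0) {
            throw new IllegalArgumentException("heap plus off-heap exceeds total RAM");
        }
        return free;
    }

    // Search queries perform best when the whole index fits in the page cache.
    static boolean indexFitsInPageCache(long indexBytes, long freeRam) {
        return indexBytes <= freeRam;
    }

    public static void main(String[] args) {
        // Hypothetical node: 128 GiB RAM, 32 GiB heap, 8 GiB off-heap objects, 70 GiB index.
        long free = freeRamBytes(128 * GiB, 32 * GiB, 8 * GiB);
        System.out.println("free RAM: " + free / GiB + " GiB"); // prints 88 GiB
        System.out.println("index fits: " + indexFitsInPageCache(70 * GiB, free)); // prints true
    }
}
```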

Analytics

DSE Analytics nodes run Spark in a separate JVM, so adjustments to the Cassandra JVM do not directly affect Spark operations. DSE Analytics nodes typically have read-heavy workloads because they run a significant number of range-read queries. Additional memory usage considerations include:

  • Spark executors are the most memory-intensive processes in Spark. They are tuned to use G1 GC by default. Tune the size of the executor heap in spark-defaults.conf, and leave room for the OS page cache when sizing executor heaps.

  • Shuffle steps are a common cause of Spark out-of-memory (OOM) errors. Where possible, avoid shuffles by using repartitionByCassandraReplica and joinWithCassandraTable in your RDD jobs.
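
As a sketch of the executor sizing advice above, the code below reads a spark-defaults.conf-style fragment with `java.util.Properties` (which accepts whitespace-separated `key value` lines) and estimates how much RAM remains for the OS page cache. `spark.executor.memory` and `spark.executor.extraJavaOptions` are standard Spark properties, but the values shown, the executor count, the node sizes, and the helper names are hypothetical, and the estimate ignores executor off-heap overhead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Properties;

public class ExecutorHeapBudget {
    // Hypothetical spark-defaults.conf fragment; real values depend on the cluster.
    static final String SPARK_DEFAULTS =
            "spark.executor.memory            8g\n"
            + "spark.executor.extraJavaOptions  -XX:+UseG1GC\n";

    // Parses whitespace-separated "key value" lines into Properties.
    static Properties loadDefaults(String text) {
        Properties conf = new Properties();
        try {
            conf.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return conf;
    }

    // Parses sizes such as "512m" or "8g" into bytes. Only k, m, and g suffixes are handled here.
    static long parseSize(String size) {
        String v = size.trim().toLowerCase(Locale.ROOT);
        long multiplier;
        switch (v.charAt(v.length() - 1)) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            default: throw new IllegalArgumentException("expected a k, m, or g suffix: " + size);
        }
        return Long.parseLong(v.substring(0, v.length() - 1)) * multiplier;
    }

    // RAM left for the OS page cache after the DSE heap and all executor heaps.
    // Simplification: ignores per-executor off-heap overhead (spark.executor.memoryOverhead).
    static long pageCacheRoom(long nodeRam, long dseHeap, int executors, long executorHeap) {
        return nodeRam - dseHeap - executors * executorHeap;
    }

    public static void main(String[] args) {
        Properties conf = loadDefaults(SPARK_DEFAULTS);
        long executorHeap = parseSize(conf.getProperty("spark.executor.memory"));
        // Hypothetical node: 64 GiB RAM, 16 GiB DSE heap, 4 executors.
        long room = pageCacheRoom(64L << 30, 16L << 30, 4, executorHeap);
        System.out.println("executor heap: " + (executorHeap >> 30) + " GiB"); // prints 8 GiB
        System.out.println("page cache room: " + (room >> 30) + " GiB"); // prints 16 GiB
    }
}
```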

Graph

DSE Graph workloads often include Search, Analytics, or both, so tune the GC for those workloads as well. In addition to the memory needed by the Search and Analytics workloads, Graph queries use memory during execution; this workload is characterized by short-lived objects. Most DSE Graph deployments with Search enabled run on systems with 128 GB of RAM or more, using G1 GC with 32 GB heaps.


Changing heap size parameters

Adjust the minimum, maximum, and new generation heap sizes to tune the JVM.
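
`-Xms`, `-Xmx`, and `-Xmn` are the standard HotSpot options for the minimum (initial), maximum, and new generation heap sizes. The sketch below (hypothetical class `HeapFlags`) shows which of these short-form flags a running JVM was started with; it only reads the flags and does not describe where DSE sets them.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

public class HeapFlags {
    // Keeps only the short-form heap sizing flags: -Xms (minimum), -Xmx (maximum), -Xmn (new generation).
    static List<String> heapFlags(List<String> jvmArgs) {
        return jvmArgs.stream()
                .filter(a -> a.startsWith("-Xms") || a.startsWith("-Xmx") || a.startsWith("-Xmn"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("heap flags: " + heapFlags(jvmArgs));
        // maxMemory() approximately reflects -Xmx, or the JVM's default if -Xmx was not set.
        System.out.println("max heap: " + (Runtime.getRuntime().maxMemory() >> 20) + " MiB");
    }
}
```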

Configuring the garbage collector

Select and configure a garbage collector (GC) to remove data from memory that is no longer in use.
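
To confirm which collector a JVM is actually using, the standard `GarbageCollectorMXBean` API lists the active collectors with their collection counts and accumulated collection times. A minimal sketch, assuming a HotSpot JVM, where G1's beans have names such as `G1 Young Generation` and `G1 Old Generation`; other JVMs may report different names. The class name `ActiveCollector` is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollector {
    // Names of the collectors the running JVM reports, for example "G1 Young Generation".
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    // On HotSpot, G1's collector beans are named with a "G1 " prefix.
    static boolean usesG1(List<String> names) {
        return names.stream().anyMatch(n -> n.startsWith("G1 "));
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated collection time (ms) since JVM start.
            System.out.printf("%s: collections=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("G1 in use: " + usesG1(collectorNames()));
    }
}
```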

© 2024 DataStax
