Configure the chunk cache
The amount of native memory used by the Hyper-Converged Database (HCD) process is significant. The main reason for this is the chunk cache (or file cache), which is like an OS page cache. See Chunk cache differences from OS page cache to understand key differences between the chunk cache and the OS page cache.
Consider the following recommendations depending on workload type for your cluster.
Memory recommendations
Consider the following recommendations when choosing the max direct memory and file cache size:
- Total server memory size
- Adequate memory for the OS and other applications
- Adequate memory for the Java heap size
- Adequate memory for native raw memory, such as Bloom filters and off-heap memtables
For 64 GB servers, the default settings are typically adequate. For larger servers, increase the max direct memory (-XX:MaxDirectMemorySize), but leave approximately 15-20% of memory for the OS and other in-memory structures.
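For example, on a 64 GB server you might raise the max direct memory in the JVM options file. The file name and the 32G value below are illustrative, not recommendations; the options file may be named jvm-server.options or jvm.options depending on your version:

```
# JVM options file (name varies by version, for example jvm-server.options)
# Illustrative value for a 64 GB server: leaves room for the heap,
# the OS, and other in-memory structures.
-XX:MaxDirectMemorySize=32G
```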
The chunk cache (file_cache_size_in_mb) is automatically set to half of the MaxDirectMemorySize value. This calculated file_cache_size_in_mb default might not result in optimal performance. If the cache hit rate is too low and there is still available memory on the server, try increasing the file_cache_size_in_mb value in your development environment by setting it explicitly in cassandra.yaml, up to 90% of the MaxDirectMemorySize value.
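For example, if MaxDirectMemorySize is 32 GB (32768 MB), 90% is roughly 29491 MB. The value below is illustrative, not a recommendation:

```yaml
# cassandra.yaml
# Explicit chunk cache size; here ~90% of a 32 GB MaxDirectMemorySize.
file_cache_size_in_mb: 29491
```

In recent Cassandra-based versions, nodetool info reports chunk cache hits, misses, and hit rate, which can help you judge whether a larger cache is worthwhile.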
Chunk cache differences from OS page cache
There are several differences between the chunk cache and the OS page cache, and a full description is outside the scope of this topic. However, the following differences are relevant to HCD:
- Because the OS page cache is sized dynamically by the operating system, it can grow and shrink depending on the available server memory. The chunk cache must be sized statically. If the chunk cache is too small, available server memory goes unused; on servers with large amounts of memory (50 GB or more), this wastes memory. If the chunk cache is too large, the available memory on the server can drop low enough that the OS kills the HCD process to avoid an out-of-memory condition. The size of the chunk cache cannot be changed dynamically; to change it, you must restart the HCD process.
- Restarting the HCD process destroys the chunk cache, so each time the process restarts, the chunk cache is cold. The OS page cache becomes cold only after a server restart.
- The memory used by the file cache is part of the HCD process memory, so the OS sees it as user memory. In contrast, the OS page cache memory is seen as buffer memory.
- The chunk cache uses mostly NIO direct memory, storing file chunks in NIO byte buffers. However, NIO does have an on-heap footprint, which DataStax is working to reduce.
Chunk cache history
The chunk cache is not new to Apache Cassandra®, and was originally intended to cache small parts (chunks) of SSTable files to make read operations faster. However, the default file access mode was memory mapped until DataStax Enterprise (DSE) 5.1, so the chunk cache had a secondary role and its size was limited to 512 MB.
The default setting of 512 MB was configured by the file_cache_size_in_mb parameter.
The chunk cache is a central component of the asynchronous thread-per-core (TPC) architecture.
By default, the chunk cache is configured to use the following portion of the max direct memory:
- One-half (½) of the max direct memory for the HCD process
- One-fourth (¼) of the max direct memory for tools
The max direct memory is calculated as one-half (½) of the difference between the system memory and the JVM heap size:

Max direct memory = (system memory - JVM heap size) / 2

You can explicitly configure the max direct memory by setting the JVM MaxDirectMemorySize (-XX:MaxDirectMemorySize) parameter.
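The defaults above reduce to simple arithmetic. The following is a minimal sketch of the formulas in this section, not HCD's actual implementation; the function names are invented for illustration:

```python
def max_direct_memory_gb(system_gb: float, heap_gb: float) -> float:
    """Default max direct memory: (system memory - JVM heap size) / 2."""
    return (system_gb - heap_gb) / 2

def chunk_cache_gb(max_direct_gb: float, tools: bool = False) -> float:
    """Chunk cache default: 1/2 of max direct memory (1/4 for tools)."""
    return max_direct_gb / (4 if tools else 2)

# Illustrative 64 GB server with a 24 GB JVM heap:
direct = max_direct_memory_gb(64, 24)  # (64 - 24) / 2 = 20 GB
cache = chunk_cache_gb(direct)         # 20 / 2 = 10 GB
```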
See increasing the max direct memory.
Alternatively, you can override the chunk cache size derived from the max direct memory by explicitly configuring the file_cache_size_in_mb parameter in the cassandra.yaml file.