Configuring data caches

DataStax Enterprise includes integrated caching and distributes cache data around the cluster.

When a node goes down, the client can read from another cached replica of the data. Because there is no separate caching tier, troubleshooting is simpler, and cached data exactly matches what is in the database. The integrated cache also alleviates the cold start problem: the database periodically saves the cache to disk, then reads the contents back into the cache and distributes the data when it restarts, so the cluster does not start with a cold cache.

The saved key cache files include the ID of the table in the file name. A saved key cache filename for the users table in the mykeyspace keyspace looks similar to:

mykeyspace-users.users_name_idx-19bd7f80352c11e4aa6a57448213f97f-KeyCache-b.db2046071785672832311.tmp
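To make the naming pattern concrete, here is a minimal Python sketch that splits such a filename into its parts. The regular expression and the helper name `parse_saved_cache_name` are assumptions inferred from the single example above (keyspace, table name with an optional index suffix, a 32-hex-digit table ID, the cache type, then a version and temporary suffix); they are not a documented format specification.

```python
import re

# Pattern inferred from the example filename above; treat it as an assumption,
# not as a documented format.
SAVED_CACHE_RE = re.compile(
    r"^(?P<keyspace>[^-]+)-"          # keyspace name
    r"(?P<table>[^-]+)-"              # table name, possibly with ".index_name"
    r"(?P<table_id>[0-9a-f]{32})-"    # table ID as 32 hex digits
    r"(?P<cache_type>[A-Za-z]+Cache)-"  # for example KeyCache
    r"(?P<rest>.+)$"                  # version and any temporary suffix
)

def parse_saved_cache_name(filename):
    """Return the filename components as a dict, or None if there is no match."""
    match = SAVED_CACHE_RE.match(filename)
    return match.groupdict() if match else None

parts = parse_saved_cache_name(
    "mykeyspace-users.users_name_idx-19bd7f80352c11e4aa6a57448213f97f"
    "-KeyCache-b.db2046071785672832311.tmp"
)
print(parts["keyspace"], parts["table"], parts["table_id"], parts["cache_type"])
```

A helper like this can be handy when matching saved cache files on disk back to the tables they belong to.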

About the partition key cache

The partition key cache is a cache of the partition index for a table. Using the key cache instead of relying on the OS page cache decreases seek times. With only the key cache enabled, the database still needs disk (or OS page cache) activity to read the requested data rows, but disabling the key cache results in even more reads from disk.

About the row cache

Relying on an appropriately configured OS page cache results in better performance than using row caching. Consult the page caching documentation for the operating system on which DSE is hosted.

Configure the number of rows to cache in a partition by setting rows_per_partition in the caching table option. To cache rows, if the row key is not already in the cache, the database reads the first portion of the partition and puts the data in the cache. If the newly cached data does not include all cells configured by the user, the database performs another read. The actual size of the row cache depends on the workload. Benchmark your application to find the best row cache size to configure.
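As an illustration of how rows_per_partition is set, the following Python sketch builds the CQL statement that changes the caching property of a table. The function name `caching_statement` and the mykeyspace.users table are hypothetical; the value 10 is an arbitrary placeholder, not a recommendation, and the sketch only produces the statement text without executing it.

```python
def caching_statement(keyspace, table, keys="ALL", rows_per_partition="NONE"):
    """Build an ALTER TABLE statement that sets the caching table property.

    keys: 'ALL' or 'NONE' (partition key cache).
    rows_per_partition: 'ALL', 'NONE', or a positive integer (row cache).
    """
    if keys not in ("ALL", "NONE"):
        raise ValueError("keys must be 'ALL' or 'NONE'")
    if rows_per_partition not in ("ALL", "NONE"):
        rows = int(rows_per_partition)
        if rows <= 0:
            raise ValueError("rows_per_partition must be a positive integer")
        rows_per_partition = str(rows)
    return (
        f"ALTER TABLE {keyspace}.{table} "
        f"WITH caching = {{'keys': '{keys}', "
        f"'rows_per_partition': '{rows_per_partition}'}};"
    )

# Hypothetical table: cache partition keys and the first 10 rows per partition.
print(caching_statement("mykeyspace", "users", rows_per_partition=10))
```

The printed statement can be run in cqlsh. Treat the row count as a starting point to benchmark rather than a tuned value.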

There are two row cache options: the old serializing cache provider and a new off-heap cache (OHC) provider. The new OHC provider has been benchmarked as performing about 15% better than the older option.

Using key cache and row cache

Typically, you enable either the partition key or row cache for a table.

Enable a row cache only when reads greatly outnumber writes (a rule of thumb is a workload that is roughly 95% reads). Consider using the operating system page cache instead of the row cache, because a write to a partition invalidates the whole partition in the cache.

Disable caching entirely for archive tables, which are infrequently read.
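The guidance in this section can be sketched as a small decision helper. This is an illustrative heuristic only: the function name `suggest_cache`, its return labels, and the reading of the 95% rule of thumb as "reads are at least 95% of all operations" are assumptions, and any real decision should rest on benchmarks.

```python
def suggest_cache(reads, writes, archive=False, read_ratio_threshold=0.95):
    """Suggest which cache to consider for a table.

    Returns 'none' for archive tables, 'row' when the workload is
    read-dominated, and 'key' otherwise.
    """
    if archive:
        # Archive tables are infrequently read, so caching is disabled.
        return "none"
    total = reads + writes
    read_ratio = reads / total if total else 0.0
    if read_ratio >= read_ratio_threshold:
        # Read-dominated workload: the row cache is a candidate.
        return "row"
    # Writes invalidate cached partitions, so fall back to the key cache.
    return "key"

print(suggest_cache(reads=9800, writes=200))
print(suggest_cache(reads=500, writes=500))
print(suggest_cache(reads=10, writes=0, archive=True))
```

Even when the helper suggests the row cache, the operating system page cache may still be the better choice, as noted above.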

Enabling and configuring caching

Using CQL to enable or disable caching.

Tips for efficient cache use

Various tips for efficient cache use.

© 2024 DataStax