Configuring data caches

Cassandra includes integrated caching and distributes cache data around the cluster. The integrated architecture facilitates troubleshooting and the cold start problem.

Cassandra includes integrated caching and distributes cache data around the cluster. When a node goes down, the client can read from another cached replica of the data. The integrated architecture also facilitates troubleshooting because there is no separate caching tier, and cached data matches what’s in the database exactly. The integrated cache solves the cold start problem by saving the cache to disk periodically. Cassandra reads contents back into the cache and distributes the data when it restarts. The cluster does not start with a cold cache.

In Cassandra 2.1, the saved key cache files include the ID of the table in the file name. A saved key cache file name for the users table in the mykeyspace keyspace in a Cassandra 2.1 looks something like this: mykeyspace-users.users_name_idx-19bd7f80352c11e4aa6a57448213f97f-KeyCache-b.db2046071785672832311.tmp

You can configure partial or full caching of each partition by setting the rows_per_partition table option. Previously, the caching mechanism put the entire partition in memory. If the partition was larger than the cache size, Cassandra never read the data from the cache. Now, you can specify the number of rows to cache per partition to increase cache hits. You configure the cache using the CQL caching property.

About the partition key cache 

The partition key cache is a cache of the partition index for a Cassandra table. Using the key cache instead of relying on the OS page cache decreases seek times. However, enabling just the key cache results in disk (or OS page cache) activity to actually read the requested data rows.

About the row cache 

You can configure the number of rows to cache in a partition. To cache rows, if the row key is not already in the cache, Cassandra reads the first portion of the partition, and puts the data in the cache. If the newly cached data does not include all cells configured by user, Cassandra performs another read.

Typically, you enable either the partition key or row cache for a table, except archive tables, which are infrequently read. Disable caching entirely for archive tables.