Tips for efficient cache use

Various tips for efficient cache use.

"Tuning the row cache in Cassandra 2.1" describes best practices of using the built-in caching mechanisms and designing an effective data model. Some tips for efficient cache use are:

Store lower-demand data or data with extremely long partitions in a table with minimal or no caching.
Deploy a large number of Cassandra nodes under a relatively light load per node.
Logically separate heavily-read data into discrete tables.

When you query a table, turn on tracing to check that the table actually gets data from the cache rather than from disk. The first time you read data from a partition, the trace shows this line below the query becasue the cache has not been populated yet:

Row cache miss [ReadStage:41]

In subsequent queries for the same partition, look for a line in the trace that looks something like this:

Row cache hit [ReadStage:55]

This output means the data was found in the cache and no disk read occurred. Updates invalidate the cache. If you query rows in the cache plus uncached rows, request more rows than the global limit allows, or the query does not grab the beginning of the partition, the trace might include a line that looks something like this:

Ignoring row cache as cached value could not satisfy query [ReadStage:89]

This output indicates that an insufficient cache caused a disk read. Requesting rows not at the beginning of the partition is a likely cause. Try removing constraints that might cause the query to skip the beginning of the partition, or place a limit on the query to prevent results from overflowing the cache. To ensure that the query hits the cache, try increasing the cache size limit, or restructure the table to position frequently accessed rows at the head of the partition.