Column family performance metrics

Column family metrics allow you to drill down and locate specific areas of your application workloads that are the source of performance issues. If you notice a performance trend at the OS or cluster level, viewing column family metrics can provide a more granular level of detail.

The metrics for KeyCache Hits, RowCache Hits and SSTable Size can only be viewed on a single column family at a time. All other column family metrics are available for specific column families as well as for all column families on a node.

In addition to monitoring read latency, write latency and load on a column family, you should also monitor the hit rates on the key and row caches for column families that rely on caching for performance. The more requests that are served from the cache, the better response times will be.

OpsCenter has been optimized to handle thousands of column families efficiently. If a column family experiences a dramatic dip in performance, check the Pending Tasks metric for a backlog of queued operations.

Viewing SSTable Size and SSTable Count for a specific column family (or counts for all families) can help with compaction tuning.

Column family local writes 

The write load on a column family measured in requests per second. This metric includes all writes to a given column family, including write requests forwarded from other nodes. This metric can be useful for tracking usage patterns of your application.

Column family local write latency 

The response time in milliseconds for successful write requests on a column family. The time period starts when a node receives a write request, and ends when the node responds. Optimal or acceptable levels of write latency vary widely according to your hardware, your network, and the nature of your write load. For example, the performance for a write load consisting largely of granular data at low consistency levels would be evaluated differently from a load of large strings written at high consistency levels.

Column family local reads 

The read load on a column family measured in requests per second. This metric includes all reads to a given column family, including read requests forwarded from other nodes. This metric can be useful for tracking usage patterns of your application.

Column family local read latency 

The response time in milliseconds for successful reads on a column family. The time period starts when a node receives a read request, and ends when the node responds. Optimal or acceptable levels of read latency vary widely according to your hardware, your network, and the nature of your application read patterns. For example, the use of secondary indexes, the size of the data being requested, and the consistency level required by the client can all impact read latency. An increase in read latency can signal I/O contention. Reads can slow down when rows are fragmented across many SSTables and compaction cannot keep up with the write load.

Column family key cache requests 

The total number of read requests on the row key cache.

Column family key cache hits 

The number of read requests that resulted in the requested row key being found in the key cache.

Column family key cache hit rate 

The percentage of key cache requests that resulted in a cache hit. This rate indicates the effectiveness of the key cache for a given column family. The key cache is used to find the exact location of a row on disk. If a row is not in the key cache, a read operation will populate the key cache after accessing the row on disk so subsequent reads of the row can benefit. Each hit on a key cache can save one disk seek per SSTable. If the hits line tracks close to the requests line, the column family is benefiting from caching. If the hits fall far below the request rate, this suggests that you could take actions to improve the performance benefit provided by the key cache, such as adjusting the number of keys cached.
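The relationship between the requests, hits, and hit rate metrics can be sketched as follows. This is a minimal illustration, not OpsCenter's own code: the function name and the sample counts are hypothetical, and it assumes hits and requests are counted over the same interval.

```python
def hit_rate_percent(hits: int, requests: int) -> float:
    """Return cache hits as a percentage of cache requests."""
    if requests <= 0:
        # No cache requests in the interval, so there is no meaningful rate.
        return 0.0
    return 100.0 * hits / requests


# Hypothetical sample counts for one interval (not real measurements).
key_cache_requests = 2000
key_cache_hits = 1700

rate = hit_rate_percent(key_cache_hits, key_cache_requests)
print(f"key cache hit rate: {rate:.1f}%")
```

The same calculation applies to the row cache metrics when row caching is configured.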

Column family row cache requests 

The total number of read requests on the row cache. This metric is only meaningful for column families with row caching configured (it is not enabled by default).

Column family row cache hits 

The number of read requests that resulted in the read being satisfied from the row cache. This metric is only meaningful for column families with row caching configured (it is not enabled by default).

Column family row cache hit rate 

The percentage of row cache requests that resulted in a cache hit. This rate indicates the effectiveness of the row cache for a given column family. This metric is only meaningful for column families with row caching configured (it is not enabled by default). The graph tracks the number of read requests in relationship to the number of row cache hits. If the hits line tracks close to the requests line, the column family is benefiting from caching. If the hits fall far below the request rate, this suggests that you could take actions to improve the performance benefit provided by the row cache, such as adjusting the number of rows cached or modifying your data model to isolate high-demand rows.

Column family SSTable size 

The current size of the SSTables for a column family. It is expected that SSTable size will grow over time with your write load, as compaction processes continue doubling the size of SSTables. Using this metric together with SSTable count, you can monitor the current state of compaction for a given column family. Viewing these patterns can be helpful if you are considering reconfiguring compaction settings to mitigate I/O contention.

Column family SSTable count 

The current number of SSTables for a column family. When column family memtables are persisted to disk as SSTables, this metric increases to the configured maximum before the compaction cycle is repeated. Using this metric together with SSTable size, you can monitor the current state of compaction for a given column family. Viewing these patterns can be helpful if you are considering reconfiguring compaction settings to mitigate I/O contention.

Column family pending reads and writes 

The number of pending reads and writes on a column family. Pending operations are an indication that Cassandra is not keeping up with the workload. A value of zero indicates healthy throughput. If out-of-memory events become an issue in your Cassandra cluster, it may help to check cluster-wide pending tasks for operations that may be clogging throughput.

Column family bloom filter space used 

The size of the bloom filter files on disk. This grows based on the number of rows in a column family and is tunable through the per-CF attribute bloom_filter_fp_chance; increasing the value of this attribute shrinks the bloom filters at the expense of a higher number of false positives. Cassandra reads the bloom filter files and stores them on the heap, so large bloom filters can be expensive in terms of memory consumption.
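To get a feel for the trade-off between bloom filter size and false positive chance, the sketch below uses the textbook sizing formula for an ideal bloom filter, bits = -n ln(p) / (ln 2)^2. This is an assumption for illustration only: Cassandra's actual on-disk sizes may differ from this estimate, and the function name and row count are hypothetical.

```python
import math


def bloom_filter_bits(num_rows: int, fp_chance: float) -> int:
    """Textbook bit count for an ideal bloom filter (not Cassandra's exact sizing)."""
    if num_rows < 0 or not 0.0 < fp_chance < 1.0:
        raise ValueError("num_rows must be >= 0 and fp_chance must be in (0, 1)")
    return math.ceil(-num_rows * math.log(fp_chance) / (math.log(2) ** 2))


# Hypothetical column family with one million rows.
rows = 1_000_000
for p in (0.01, 0.1):
    mib = bloom_filter_bits(rows, p) / 8 / 2**20
    print(f"fp_chance={p}: about {mib:.2f} MiB")
```

Under this idealized model, raising the false positive chance from 0.01 to 0.1 roughly halves the estimated filter size.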

Note: Bloom filters are used to avoid going to disk to try to read rows that do not actually exist.

Column family bloom filter false positives 

The absolute number of false positives. A false positive occurs when the bloom filter reports that a row exists, but the row does not actually exist.

Column family bloom filter false positive ratio 

The fraction of all bloom filter checks resulting in a false positive. This should normally be at or below 0.01. A higher reading indicates that the bloom filter is likely too small.
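As a rough sketch of how this ratio relates to the false positives count, the example below divides false positives by the total number of bloom filter checks and compares the result with the 0.01 guideline. The function names and sample counts are hypothetical, and the total number of checks is assumed to be available from your own monitoring data.

```python
FP_RATIO_GUIDELINE = 0.01  # guideline value from the description above


def false_positive_ratio(false_positives: int, total_checks: int) -> float:
    """Fraction of bloom filter checks that were false positives."""
    if total_checks <= 0:
        # No checks recorded, so report a ratio of zero.
        return 0.0
    return false_positives / total_checks


def filter_likely_too_small(ratio: float, guideline: float = FP_RATIO_GUIDELINE) -> bool:
    """True when the ratio exceeds the guideline."""
    return ratio > guideline


# Hypothetical sample counts (not real measurements).
ratio = false_positive_ratio(false_positives=45, total_checks=3000)
if filter_likely_too_small(ratio):
    print(f"ratio {ratio:.3f} is above {FP_RATIO_GUIDELINE}; the bloom filter may be too small")
```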