Column family performance metrics

Column family metrics let you drill down and locate the specific areas of an application workload that are causing performance issues. If you notice a performance trend at the OS or cluster level, viewing column family metrics can provide a more granular level of detail.

The Key Cache Hits, Row Cache Hits, and SSTable Size metrics can only be viewed for one column family at a time. All other column family metrics are available both for specific column families and for all column families on a node. In addition to monitoring read latency, write latency, and load on a column family, you should also monitor the hit rates on the key and row caches for column families that rely on caching for performance. The more requests that are served from the cache, the faster the response times. Viewing SSTable Size and SSTable Count for a specific column family (or SSTable Count for all column families) can help with compaction tuning.

OpsCenter has been optimized to efficiently handle thousands of column families. If a column family experiences a dramatic dip in performance, check the Pending Tasks metrics to see whether queued operations are backing up.

Column family metric names are prefixed with CF.

CF: Local Writes: The write load on a column family measured in requests per second. This metric includes all writes to a given column family, including write requests forwarded from other nodes. This metric can be useful for tracking usage patterns of an application.
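OpsCenter computes this rate for you. As an illustration only, the sketch below shows how a requests-per-second figure can be derived from two samples of a counter, assuming the underlying statistic is a cumulative count of local writes that is sampled periodically. The function name and sample values are hypothetical. The same calculation applies to read counts.

```python
def requests_per_second(count_start: int, count_end: int, elapsed_seconds: float) -> float:
    """Convert two samples of a cumulative request counter into a rate."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    if count_end < count_start:
        # A lower second sample suggests the counter was reset (e.g. a node restart).
        raise ValueError("counter decreased; it may have been reset")
    return (count_end - count_start) / elapsed_seconds


# Two hypothetical samples of a cumulative write count taken 60 seconds apart.
rate = requests_per_second(1_200_000, 1_230_000, 60.0)
print(f"{rate:.1f} writes/s")  # 500.0 writes/s
```

Raising an error on a decreasing counter, rather than returning a negative rate, keeps a node restart from showing up as a spurious dip in the graph.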

CF: Local Write Latency: The response time in milliseconds for successful write requests on a column family. The time period starts when a node receives a write request and ends when the node responds. Optimal or acceptable levels of write latency vary widely according to your hardware, your network, and the nature of your write load. For example, a write load consisting largely of granular data written at low consistency levels would be evaluated differently from a load of large strings written at high consistency levels.

CF: Write Latency (Stacked): The min, median, max, 90th, and 99th percentiles of the response times to write data to a table's memtable and append it to the commit log. Each response time is the elapsed time from when a replica receives the request from a coordinator until it returns a response.
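To make the stacked lines concrete, the sketch below summarizes a list of raw latency samples with the nearest-rank percentile method. This is an assumption for illustration: Cassandra and OpsCenter estimate these values from histograms rather than from every individual sample, so real figures can differ slightly. The sample values are hypothetical.

```python
import math
from statistics import median


def percentile(sorted_values: list[float], pct: float) -> float:
    """Nearest-rank percentile of an already-sorted, non-empty list."""
    rank = math.ceil(pct * len(sorted_values) / 100)
    return sorted_values[max(rank, 1) - 1]


def latency_summary(samples_ms: list[float]) -> dict[str, float]:
    """Summarize latency samples as min, median, 90th/99th percentile, and max."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    values = sorted(samples_ms)
    return {
        "min": values[0],
        "median": median(values),
        "p90": percentile(values, 90),
        "p99": percentile(values, 99),
        "max": values[-1],
    }


# Hypothetical replica write latencies in milliseconds.
samples = [0.4, 0.5, 0.5, 0.6, 0.7, 0.8, 0.9, 1.1, 1.5, 4.2]
print(latency_summary(samples))
```

A single slow write (4.2 ms here) moves the max and 99th percentile lines but barely shifts the median, which is why the stacked view is more informative than an average alone.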

CF: Local Reads: The read load on a column family measured in requests per second. This metric includes all reads to a given column family, including read requests forwarded from other nodes. This metric can be useful for tracking usage patterns of your application.

CF: Local Read Latency: The response time in milliseconds for successful reads on a column family. The time period starts when a node receives a read request, and ends when the node responds. Optimal or acceptable levels of read latency vary widely according to your hardware, your network, and the nature of your application read patterns. For example, the use of secondary indexes, the size of the data being requested, and the consistency level required by the client can all impact read latency. An increase in read latency can signal I/O contention. Reads can slow down when rows are fragmented across many SSTables and compaction cannot keep up with the write load.
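A latency graph is usually most useful when it reflects recent requests rather than a lifetime average. The sketch below computes the average read latency for the interval between two samples, assuming the source exposes a cumulative read count and a cumulative total latency in microseconds. Those inputs, the function name, and the numbers are assumptions for illustration.

```python
from typing import Optional


def mean_latency_ms(count_start: int, count_end: int,
                    total_us_start: int, total_us_end: int) -> Optional[float]:
    """Average latency in milliseconds of the reads completed between two samples.

    Assumes both inputs are cumulative counters and that total latency is
    recorded in microseconds. Returns None if no reads completed in the interval.
    """
    reads = count_end - count_start
    if reads <= 0:
        return None
    return (total_us_end - total_us_start) / reads / 1000.0


# Hypothetical samples: 5,000 reads completed, adding 6,000,000 us of total latency.
print(mean_latency_ms(80_000, 85_000, 90_000_000, 96_000_000))  # 1.2
```

Differencing the counters means a sudden rise in read latency (for example, from I/O contention) shows up in the next interval instead of being diluted by hours of earlier, faster reads.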

CF: Read Latency (Stacked): The min, median, max, 90th, and 99th percentiles of client read response times. The time period starts when a node receives a client read request and ends when the node responds to the client. Depending on the consistency level and replication factor, this may include the network latency of requesting the data from its replicas.

CF: Live Disk Used: Disk space used by live SSTables. Obsolete SSTables that have not yet been removed are not included.

CF: Total Disk Used: Disk space used by a table's SSTables, including obsolete SSTables waiting to be garbage collected.
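Because Total Disk Used includes obsolete SSTables and Live Disk Used does not, the difference between the two approximates the space that obsolete SSTables still occupy. A minimal sketch, with a hypothetical function name and readings:

```python
GIB = 1024 ** 3


def obsolete_sstable_bytes(total_disk_used: int, live_disk_used: int) -> int:
    """Disk space held by obsolete SSTables that have not been removed yet.

    The two metrics may be sampled at slightly different moments, so a small
    negative difference is clamped to zero rather than reported.
    """
    return max(0, total_disk_used - live_disk_used)


# Hypothetical readings for one table.
print(obsolete_sstable_bytes(52 * GIB, 48 * GIB) / GIB)  # 4.0
```

A gap that keeps growing instead of shrinking after compactions finish may be worth investigating.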

CF: SSTable Count: The current number of SSTables for a column family. As column family memtables are flushed to disk as SSTables, this count increases until the configured compaction threshold is reached; compaction then merges SSTables, the count drops, and the cycle repeats. Using this metric together with SSTable Size, you can monitor the current state of compaction for a given column family. Viewing these patterns can be helpful if you are considering reconfiguring compaction settings to mitigate I/O contention.
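The rise-and-drop pattern described above can be reproduced with a deliberately simplified model of tiered compaction. The model assumes that every flush writes one SSTable and that compaction merges `threshold` SSTables of the same tier into one SSTable of the next tier. The default of 4 mirrors Cassandra's default min_threshold for SizeTieredCompactionStrategy. Real compaction also depends on SSTable sizes, timing, and throughput, so treat this as an illustration of the shape of the graph, not a prediction.

```python
def simulate_sstable_counts(flushes: int, threshold: int = 4) -> list[int]:
    """Simplified tiered-compaction model: return the SSTable count after each flush.

    tiers[i] holds the number of SSTables in tier i; merging `threshold`
    SSTables in tier i produces one larger SSTable in tier i + 1.
    """
    tiers: list[int] = []
    counts = []
    for _ in range(flushes):
        if not tiers:
            tiers.append(0)
        tiers[0] += 1  # a memtable flush writes one new SSTable
        level = 0
        while tiers[level] >= threshold:
            tiers[level] -= threshold  # compaction merges these SSTables...
            if level + 1 == len(tiers):
                tiers.append(0)
            tiers[level + 1] += 1  # ...into one larger SSTable
            level += 1
        counts.append(sum(tiers))
    return counts


print(simulate_sstable_counts(8))  # [1, 2, 3, 1, 2, 3, 4, 2]
```

The count never settles at a single value; a healthy table shows this sawtooth, while a count that only climbs suggests compaction is falling behind.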

CF: SSTables per Read (Stacked): The min, median, max, 90th, and 99th percentiles of the number of SSTables accessed during a read.

CF: Pending Reads/Writes: The number of pending reads and writes on a column family. Pending operations are an indication that Cassandra is not keeping up with the workload. A value of zero indicates healthy throughput. If out-of-memory events become an issue in your Cassandra cluster, it might help to check cluster-wide pending tasks for operations that could be clogging throughput.
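Because a short burst can briefly push pending operations above zero, one way to act on this metric is to alert only when the value stays above zero. The sketch below is a hypothetical alert rule, not an OpsCenter feature: it fires once pending operations have been non-zero for `window` consecutive samples. The class name and window size are assumptions.

```python
from collections import deque


class PendingOpsWatch:
    """Hypothetical alert rule: fire when pending operations stay above zero
    for `window` consecutive samples."""

    def __init__(self, window: int = 5) -> None:
        self.recent = deque(maxlen=window)

    def observe(self, pending: int) -> bool:
        """Record one sample; return True if the alert condition is met."""
        self.recent.append(pending)
        return len(self.recent) == self.recent.maxlen and all(p > 0 for p in self.recent)


watch = PendingOpsWatch(window=3)
for sample in [0, 3, 0, 2, 2, 2]:
    print(sample, watch.observe(sample))  # only the last sample triggers the alert
```

The brief spike to 3 does not trigger the rule; the sustained run of non-zero values does.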

CF: Bloom Filter Space Used: The size of the bloom filter files on disk. This size grows with the number of rows in a column family and is tunable through the per-CF attribute bloom_filter_fp_chance; increasing the value of this attribute shrinks the bloom filters at the expense of more false positives. Cassandra loads the bloom filters into memory, so large bloom filters can be expensive in terms of memory consumption.
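The trade-off between bloom_filter_fp_chance and filter size can be estimated with the textbook formula for an optimally sized Bloom filter, m = -n ln(p) / (ln 2)^2 bits for n entries and false positive probability p. Cassandra's own sizing logic differs in detail, so this sketch only shows the general relationship; the function name and row count are hypothetical.

```python
import math


def bloom_filter_bytes(num_rows: int, fp_chance: float) -> float:
    """Textbook size of an optimal Bloom filter: m = -n * ln(p) / (ln 2)^2 bits."""
    if num_rows < 0 or not 0 < fp_chance < 1:
        raise ValueError("need num_rows >= 0 and 0 < fp_chance < 1")
    bits = -num_rows * math.log(fp_chance) / (math.log(2) ** 2)
    return bits / 8


for p in (0.01, 0.1):
    mib = bloom_filter_bytes(100_000_000, p) / 2**20
    print(f"fp_chance={p}: ~{mib:.0f} MiB")
```

Under this formula, raising the false positive chance from 0.01 to 0.1 halves the filter, because ln(0.01) is exactly twice ln(0.1); the cost is roughly ten times as many false positives.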

Note: Bloom filters are used to avoid going to disk to try to read rows that don't actually exist.

CF: Bloom Filter False Positives: The absolute number of false positives. A false positive occurs when the bloom filter reports that a row exists but the row does not actually exist.

CF: Bloom Filter False Positive Ratio: The fraction of bloom filter lookups that resulted in a false positive, expressed as a ratio (for example, 0.01 means 1%).

Note: The Bloom Filter False Positive Ratio should normally be at or below 0.01. A higher value indicates that the bloom filter is likely too small.
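The note above can be expressed as a simple check. This sketch follows the definition given for the ratio (false positives divided by lookups); where the lookup count comes from, and the exact denominator a particular Cassandra version uses, are assumptions here. The function names and counts are hypothetical.

```python
def false_positive_ratio(false_positives: int, lookups: int) -> float:
    """Fraction of bloom filter lookups that were false positives (0.0 if no lookups)."""
    return false_positives / lookups if lookups else 0.0


def bloom_filter_likely_too_small(false_positives: int, lookups: int,
                                  limit: float = 0.01) -> bool:
    """Rule of thumb from the note above: a ratio above `limit` is a warning sign."""
    return false_positive_ratio(false_positives, lookups) > limit


print(bloom_filter_likely_too_small(120, 20_000))  # False (ratio 0.006)
print(bloom_filter_likely_too_small(500, 20_000))  # True (ratio 0.025)
```

If a table's bloom_filter_fp_chance has deliberately been set above 0.01, passing that value as `limit` is probably a fairer comparison.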

CF: Bloom Filter Off Heap: Total off-heap memory used by the bloom filters of all live SSTables in a table.

CF: Index Summary Off Heap: Total off-heap memory used by the index summaries of all live SSTables in a table.

CF: Compression Metadata Off Heap: Total off-heap memory used by the compression metadata of all live SSTables in a table.

CF: Memtable Off Heap: Off-heap memory used by a table's current memtable.
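Viewed together, these four metrics give a rough picture of a table's off-heap footprint. The sketch below assumes they are the main components worth summing and shows which one dominates; the class, its field names, and the byte values are hypothetical.

```python
from dataclasses import asdict, dataclass


@dataclass
class TableOffHeap:
    """Off-heap usage for one table, in bytes (field names are hypothetical)."""
    bloom_filter: int
    index_summary: int
    compression_metadata: int
    memtable: int

    def total(self) -> int:
        return sum(asdict(self).values())

    def largest(self) -> str:
        """Name of the component using the most off-heap memory."""
        parts = asdict(self)
        return max(parts, key=parts.get)


usage = TableOffHeap(bloom_filter=120_000_000, index_summary=8_000_000,
                     compression_metadata=30_000_000, memtable=64_000_000)
print(usage.total(), usage.largest())  # 222000000 bloom_filter
```

When bloom filters dominate, the bloom_filter_fp_chance trade-off described earlier is the natural place to look.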

CF: Total Memtable Size: An estimate of the space used in memory (including JVM overhead) for all memtables. This includes memtables that are currently being flushed and the memtables of related secondary indexes.

CF: Key Cache Requests: The total number of read requests on the row key cache.

CF: Key Cache Hits: The number of read requests that resulted in the requested row key being found in the key cache.

CF: Key Cache Hit Rate: The percentage of key cache requests that resulted in a cache hit, which indicates the effectiveness of the key cache for a given column family. The key cache is used to find the exact location of a row on disk. If a row key is not in the key cache, a read operation populates the key cache after accessing the row on disk so that subsequent reads of the row can benefit. Each key cache hit can save one disk seek per SSTable. If the hits line tracks close to the requests line, the column family is benefiting from caching. If the hits fall far below the request rate, you may be able to improve the benefit provided by the key cache, for example by adjusting the number of keys cached.
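The hit rate is the ratio of the Key Cache Hits and Key Cache Requests metrics. The sketch below computes it over the interval between two samples, assuming both inputs are cumulative counters, so the result reflects recent traffic rather than the whole lifetime of the node. The function name and values are hypothetical. The same calculation applies to the row cache metrics.

```python
from typing import Optional


def cache_hit_rate(hits_start: int, hits_end: int,
                   requests_start: int, requests_end: int) -> Optional[float]:
    """Hit rate (percent) over the interval between two samples of cumulative counters."""
    requests = requests_end - requests_start
    if requests <= 0:
        return None  # no cache requests in this interval
    return 100.0 * (hits_end - hits_start) / requests


# Hypothetical samples: 4,000 new requests, 3,600 of them hits.
print(cache_hit_rate(9_000, 12_600, 10_000, 14_000))  # 90.0
```

Returning None for an idle interval avoids reporting a misleading 0% hit rate when the cache simply was not used.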

CF: Row Cache Requests: The total number of read requests on the row cache. This metric is only meaningful for column families with row caching configured (row caching is not enabled by default).

CF: Row Cache Hits: The number of read requests that resulted in the read being satisfied from the row cache. This metric is only meaningful for column families with row caching configured (row caching is not enabled by default).

CF: Row Cache Hit Rate: The percentage of row cache requests that resulted in a cache hit, which indicates the effectiveness of the row cache for a given column family. This metric is only meaningful for column families with row caching configured (row caching is not enabled by default). The graph tracks the number of read requests in relation to the number of row cache hits. If the hits line tracks close to the requests line, the column family is benefiting from caching. If the hits fall far below the request rate, you may be able to improve the benefit provided by the row cache, for example by adjusting the number of rows cached or modifying your data model to isolate high-demand rows.

CF: SSTable Size: The current size of the SSTables for a column family. SSTable size is expected to grow over time with your write load, as compaction merges smaller SSTables into progressively larger ones. Using this metric together with SSTable Count, you can monitor the current state of compaction for a given column family. Viewing these patterns can be helpful if you are considering reconfiguring compaction settings to mitigate I/O contention.