Retrieving Metric Data
Using the metric retrieval methods you can retrieve performance metrics at the cluster, node, and table levels.
Additionally, you can use existing metric data to forecast future data points for a specific metric. More information on forecasting is available here.
Metric Retrieval Methods | URL |
---|---|
Retrieve cluster-wide metrics. | GET /{cluster_id}/cluster-metrics/{dc}/{metric} |
Retrieve cluster-wide metrics about a device. | GET /{cluster_id}/cluster-metrics/{dc}/{metric}/{device} |
Retrieve cluster-wide metrics about a table. | GET /{cluster_id}/cluster-metrics/{dc}/{ks_name}/{cf_name}/{metric} |
Retrieve metrics about a node. | GET /{cluster_id}/metrics/{node_ip}/{metric} |
Retrieve node-specific metrics about a device. | GET /{cluster_id}/metrics/{node_ip}/{metric}/{device} |
Retrieve node-specific metrics about a table. | GET /{cluster_id}/metrics/{node_ip}/{ks_name}/{cf_name}/{metric} |
Retrieve a forecast for a cluster-wide metric. | GET /{cluster_id}/cluster-metrics/{dc}/{metric} |
New way to retrieve all types of metrics. | GET /{cluster_id}/new-metrics |
You can choose from a large number of metric keys to pass with these methods, making retrieval of a wide spectrum of performance information possible.
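The six classic routes in the table above differ only in their path segments, so the paths can be assembled from one helper. The following is a minimal sketch, not part of the OpsCenter API: the function name `metric_path` and its keyword arguments are hypothetical, and percent-encoding every segment is an assumption that matches the URL-encoding advice given for device names.

```python
from urllib.parse import quote


def metric_path(cluster_id, metric, dc=None, node_ip=None, device=None, table=None):
    """Return the request path for one of the six classic metric routes.

    Hypothetical helper. Exactly one of dc (cluster-wide) or node_ip
    (single node) must be given. table is an optional (ks_name, cf_name)
    pair; device is an optional device name.
    """
    if (dc is None) == (node_ip is None):
        raise ValueError("pass exactly one of dc or node_ip")
    if device is not None and table is not None:
        raise ValueError("a route takes a device or a table, not both")

    # Cluster-wide routes use cluster-metrics/{dc}; node routes use metrics/{node_ip}.
    if dc is not None:
        parts = [cluster_id, "cluster-metrics", dc]
    else:
        parts = [cluster_id, "metrics", node_ip]

    if table is not None:
        parts += [table[0], table[1], metric]  # .../{ks_name}/{cf_name}/{metric}
    elif device is not None:
        parts += [metric, device]              # .../{metric}/{device}
    else:
        parts.append(metric)                   # .../{metric}

    # Percent-encode each segment so a name such as /dev/sda1 stays one segment.
    return "/" + "/".join(quote(part, safe="") for part in parts)
```

For example, `metric_path("Test_Cluster", "write-ops", dc="all")` gives the path used in the first curl example, and the result can be appended to the OpsCenter base URL (`http://127.0.0.1:8888` in the examples).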
Controlling the Metric Data Output
You can also use the following query parameters with these methods to control the output:
Query Parameter | Description |
---|---|
start | (optional) A timestamp in seconds indicating the beginning of the time range to fetch. When omitted, this defaults to one day before the end parameter. |
end | (optional) A timestamp in seconds indicating the end of the time range to fetch. When omitted, this defaults to the current time. |
step | (optional) The resolution of the data points for the metric. Valid input options are: 1, 5, 120, or 1440 minutes; corresponding output intervals are 60, 300, 7200, or 86400 seconds. The default is a 1 minute step. |
step | (optional; new-metrics API) The new-metrics API requires the step argument in seconds rather than minutes, which keeps it consistent with the return format of that API. Valid inputs in this case are: 60, 300, 7200, and 86400. |
function | (optional) The type of aggregation to perform on the metric: min, max, or average. By default, results are returned for all three types of aggregation. |
forecast | (optional) A boolean flag indicating that a forecast should be generated for the specified time range and step. The forecast uses past data to calculate projected data points in that time range. |
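Because start and end are Unix timestamps in seconds, it can be convenient to build the query string from timezone-aware datetimes. The sketch below is illustrative only: `metric_query` and `VALID_STEPS_MINUTES` are hypothetical names, and the step check applies to the classic routes, which take minutes.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Step values accepted by the classic routes, in minutes (from the table above).
VALID_STEPS_MINUTES = (1, 5, 120, 1440)


def metric_query(start=None, end=None, step=None, function=None, forecast=False):
    """Build a query string; omitted arguments are left to the server defaults."""
    params = {}
    if step is not None:
        if step not in VALID_STEPS_MINUTES:
            raise ValueError(f"step must be one of {VALID_STEPS_MINUTES}")
        params["step"] = step
    for name, moment in (("start", start), ("end", end)):
        if moment is not None:
            if moment.tzinfo is None:
                raise ValueError(f"{name} must be timezone-aware")
            params[name] = int(moment.timestamp())  # seconds since the epoch
    if function is not None:
        params["function"] = function
    if forecast:
        params["forecast"] = 1
    return urlencode(params)


# May 1, 2012 from 8 AM to 5 PM GMT at 2-hour intervals, average only.
query = metric_query(
    start=datetime(2012, 5, 1, 8, tzinfo=timezone.utc),
    end=datetime(2012, 5, 1, 17, tzinfo=timezone.utc),
    step=120,
    function="average",
)
```

Requiring timezone-aware datetimes avoids silently interpreting a naive value in the local time zone of the machine that builds the request.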
Results of calls to retrieve metrics are returned in the following format:
{
[<node_ip>: | <device>: | <keyspace.columnfamily>:]
{
<function>:
[
[<timestamp> <value>],
...
]
}
}
By default, the output is metric data points at 60-second intervals over a 24-hour period. Data points are listed in chronological order, starting with the oldest data point first.
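A response in this format can be walked with two nested loops. The following sketch assumes the body has already been decoded from JSON, that each data point is a two-element `[timestamp, value]` array as in the example outputs on this page, and that a null value means no data was recorded for that interval. The name `iter_points` is hypothetical, and the sample is abridged from the cluster-wide write-ops example.

```python
import json
from datetime import datetime, timezone


def iter_points(response):
    """Yield (series, function, time, value) for every non-null data point."""
    for series, by_function in response.items():
        for function, points in by_function.items():
            for timestamp, value in points:
                if value is None:  # JSON null: no data for this interval
                    continue
                when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
                yield series, function, when, value


sample = json.loads("""
{ "Total": { "AVERAGE": [ [ 1335859200, null ],
                          [ 1335866400, 13.376885890960693 ],
                          [ 1335873600, 13.372154712677002 ] ] } }
""")
points = list(iter_points(sample))
```

Here `points` holds two entries, because the first data point in the sample is null and is skipped.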
- GET /{cluster_id}/cluster-metrics/{dc}/{metric}
Aggregate a metric across multiple nodes in the cluster rather than retrieving data about a single node.
Path arguments:
- cluster_id – A Cluster Config ID.
- dc – The name of the data center for the nodes. Use the name all to aggregate a metric across all data centers.
- metric – One of the Cluster Metrics Keys.
Query params: parameters – The parameters listed in Controlling the Metric Data Output.
Returns metric data across multiple nodes in a cluster.
Example
Get the average write requests per second to the cluster over all data centers on May 1, 2012 from 8 AM to 5 PM GMT. Show data points at 2-hour (120-minute) intervals.
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/write-ops -d 'step=120' -d 'start=1335859200' -d 'end=1335891600' -d 'function=average'
Manually building an HTTP GET request:
curl 'http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/write-ops?step=120&start=1335859200&end=1335891600&function=average'
Output:
Data points at 2-hour (7200 seconds) intervals show the number of write requests per second during business hours on May 1.
{ "Total": { "AVERAGE": [ [ 1335859200, null ], [ 1335866400, 13.376885890960693 ], [ 1335873600, 13.372154712677002 ], [ 1335880800, 13.365732669830322 ], [ 1335888000, 13.392115592956543 ] ] } }
- GET /{cluster_id}/cluster-metrics/{dc}/{metric}/{device}
Aggregate a disk or network metric, which pertains to a specific device, across multiple nodes in the cluster rather than retrieving data about a single node.
Path arguments:
- cluster_id – A Cluster Config ID.
- dc – The name of the data center for the nodes. Use the name all to aggregate a metric across all data centers.
- metric – One of the Cluster Metrics Keys or Operating System Metrics Keys.
- device – The device to be measured, which the Node object lists. Use the name all to measure all devices. For example, when requesting a disk metric, all will aggregate metrics from all disk devices.
Query params: parameters – The parameters listed in Controlling the Metric Data Output.
Examples of Device Arguments
To determine the set of network interfaces that metrics are available for, you can run a query similar to the following:
curl http://localhost:8888/Test_Cluster/nodes/192.168.1.1/network_interfaces
["lo0", "eth0", "eth1"]
In this case, lo0, eth0, and eth1 can all be used.
Disk devices can be discovered in a similar way.
curl http://localhost:8888/Test_Cluster/nodes/192.168.1.1/devices
{ "commitlog": "sdb", "data": ["sda"], "saved_caches": "sda", "other": ["sdc"] }
In this case, any of sda, sdb, or sdc may be used.
Finally, metrics are also captured for disk partitions and filesystems:
curl http://localhost:8888/Test_Cluster/nodes/192.168.1.1/partitions
{ "commitlog": "/dev/sdb1", "data": ["/dev/sda1"], "saved_caches": "/dev/sda1", "other": ["/dev/sdc1"] }
Here, the available partitions are /dev/sda1, /dev/sdb1, and /dev/sdc1. Keep in mind that you will need to URL-encode the items, so /dev/sda1 will become %2Fdev%2Fsda1.
Using a partition, network interface, or other device name for the device argument returns disk or network metric data about a specific device across multiple nodes. Using all for the device name returns a dictionary of keys (device names) and the values (results for that device).
Example
Get the average GB of space on all disks in all data centers used each day by the cluster from April 11, 2012 00:00:00 to April 26, 2012 00:00:00 GMT.
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/os-disk-used/all -d 'step=1440' -d 'start=1334102400' -d 'end=1335398400' -d 'function=average'
Manually building an HTTP GET request:
curl 'http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/os-disk-used/all?step=1440&start=1334102400&end=1335398400&function=average'
Output:
{ "Total": { "AVERAGE": [ [ 1334102400, null ], [ 1334188800, 21.000694274902344 ], [ 1334275200, 8.736943244934082 ], [ 1334361600, 9.0 ], [ 1334448000, 19.0 ], [ 1334534400, 19.0 ], [ 1334620800, 19.0 ], [ 1334707200, 19.0 ], [ 1334793600, 18.629029273986816 ], [ 1334880000, 19.923184394836426 ], [ 1334966400, 25.0 ], [ 1335052800, 25.0 ], [ 1335139200, 25.923053741455078 ], [ 1335225600, 26.0 ], [ 1335312000, 26.549484252929688 ] ] } }
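The device listings shown earlier for this route mix single names and lists of names, and partition names contain slashes that must be URL-encoded before they are used as the device argument. A minimal sketch of both steps follows; `device_names` and `encode_device` are hypothetical helper names, and the input is the partitions listing from the example above.

```python
from urllib.parse import quote


def device_names(listing):
    """Flatten a devices or partitions listing into a sorted list of unique names."""
    names = set()
    for value in listing.values():
        # Each value is either a single name or a list of names.
        names.update([value] if isinstance(value, str) else value)
    return sorted(names)


def encode_device(name):
    """Percent-encode a device name so it occupies a single path segment."""
    return quote(name, safe="")


partitions = {"commitlog": "/dev/sdb1", "data": ["/dev/sda1"],
              "saved_caches": "/dev/sda1", "other": ["/dev/sdc1"]}
encoded = [encode_device(name) for name in device_names(partitions)]
```

Passing `safe=""` matters here: by default `quote` leaves `/` unescaped, which would split the device name into several path segments.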
- GET /{cluster_id}/cluster-metrics/{dc}/{ks_name}/{cf_name}/{metric}
Aggregate a table metric across multiple nodes in the cluster rather than retrieving data about a single node.
Path arguments:
- cluster_id – A Cluster Config ID.
- dc – The name of the data center for the nodes. Use the name all to aggregate a metric across all data centers.
- ks_name – The keyspace that contains the table to be measured.
- cf_name – The table to be measured.
- metric – One of the Table Metrics Keys.
Query params: parameters – The parameters listed in Controlling the Metric Data Output.
Returns metric data for multiple nodes.
Example
Get the maximum bytes of disk space used for live data by the Users table in the Keyspace1 keyspace of the cluster over all data centers from May 1, 2012 00:00:00 to May 5, 2012 00:00:00 GMT:
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/Keyspace1/Users/cf-live-disk-used -d 'function=max' -d 'start=1335830400' -d 'end=1336176000' -d 'step=1440'
Manually building an HTTP GET request:
curl 'http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/Keyspace1/Users/cf-live-disk-used?function=max&start=1335830400&end=1336176000&step=1440'
Output:
Data points at 24-hour intervals show the metrics for the period.
{ "Total": { "MAX": [ [ 1335830400, 9740462592.0 ], [ 1335916800, 9932527616.0 ], [ 1336003200, null ], [ 1336089600, 10644448512.0 ] ] } }
- GET /{cluster_id}/metrics/{node_ip}/{metric}
Retrieve metric data for a single node.
Path arguments:
- cluster_id – A Cluster Config ID.
- node_ip – IP address of the target Node.
- metric – One of the Cluster Metrics Keys.
Query params: parameters – The parameters listed in Controlling the Metric Data Output.
Returns metric data for a single node.
Example
Get the daily average data load on cluster node 10.11.12.150 from April 20, 2012 00:00:00 to April 26, 2012 00:00:00 GMT:
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/data-load -d 'step=1440' -d 'start=1334880000' -d 'end=1335398400' -d 'function=average'
Manually building an HTTP GET request:
curl 'http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/data-load?step=1440&start=1334880000&end=1335398400&function=average'
Output:
{ "10.11.12.150": { "AVERAGE": [ [ 1334880000, null ], [ 1334966400, 6353770496.0 ], [ 1335052800, 6560092672.0 ], [ 1335139200, 6019291136.0 ], [ 1335225600, 6149050880.0 ], [ 1335312000, 6271239680.0 ] ] } }
- GET /{cluster_id}/metrics/{node_ip}/{metric}/{device}
Aggregate a disk or network metric for a single node.
Path arguments:
- cluster_id – A Cluster Config ID.
- node_ip – IP address of the target Node.
- metric – One of the Cluster Metrics Keys or Operating System Metrics Keys.
- device – The device to be measured. Use the name all to measure all devices associated with a disk metric. See GET /{cluster_id}/cluster-metrics/{dc}/{metric}/{device} for examples of devices.
Query params: parameters – The parameters listed in Controlling the Metric Data Output.
Returns disk or network metrics data for a single node.
Example
Get the maximum GB of disk space for all disks used by cluster node 10.11.12.150 from April 30, 2012 at 22:05 to May 1, 2012 8:00:00 GMT:
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/os-disk-used/all -d 'start=1335823500' -d 'end=1335859200' -d 'step=120' -d 'function=max'
Manually building an HTTP GET request:
curl 'http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/os-disk-used/all?start=1335823500&end=1335859200&step=120&function=max'
Output:
Data points at 2-hour (7200-second) intervals show the disk space used by device /dev/sda1.
{ "/dev/sda1": { "MAX": [ [ 1335823200, null ], [ 1335830400, 17.0 ], [ 1335837600, 16.0 ], [ 1335844800, 17.0 ], [ 1335852000, 16.0 ] ] } }
- GET /{cluster_id}/metrics/{node_ip}/{ks_name}/{cf_name}/{metric}
Retrieve metric data about a table on a single node.
Path arguments:
- cluster_id – A Cluster Config ID.
- node_ip – IP address of the target Node.
- ks_name – The keyspace that contains the table to be measured.
- cf_name – The table to be measured.
- metric – One of the Table Metrics Keys.
Query params: parameters – The parameters listed in Controlling the Metric Data Output.
Example
Get the daily, maximum response time (in microseconds) to write requests on the Users table in the Keyspace1 keyspace by cluster node 10.11.12.150 from May 1, 2012 at 00:00:00 to May 5, 2012 00:00:00 GMT.
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/Keyspace1/Users/cf-write-latency-op -d 'function=max' -d 'start=1335830400' -d 'end=1336176000' -d 'step=1440'
Manually building an HTTP GET request:
curl 'http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/Keyspace1/Users/cf-write-latency-op?function=max&start=1335830400&end=1336176000&step=1440'
Output:
{ "OpsCenter.rollups60": { "MAX": [ [ 1335830400, 102.28681945800781 ], [ 1335916800, 124.86614227294922 ], [ 1336003200, null ], [ 1336089600, 127.14733123779297 ] ] } }
- GET /{cluster_id}/cluster-metrics/{dc}/{metric}
Generate a forecast for a metric aggregated across the cluster.
Path arguments:
- cluster_id – A Cluster Config ID.
- dc – The name of the data center for the nodes. Use the name all to aggregate a metric across all data centers.
- metric – One of the Cluster Metrics Keys.
Query params: parameters – The parameters listed in Controlling the Metric Data Output.
Example
Forecast the average write requests per second for the cluster over all data centers, starting from the current time and ending 4 weeks in the future. Show data points at 1-day (1440-minute) intervals.
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/write-ops -d "step=1440" -d "start=`date +'%s'`" -d "end=`date -v+4w +'%s'`" -d "forecast=1"
Manually building an HTTP GET request:
curl "http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/write-ops?step=1440&start=`date +'%s'`&end=`date -v+4w +'%s'`&forecast=1"
Output:
Data points at 1-day (86400-second) intervals show the forecasted number of write requests per second for the next 4 weeks. The results include the data used to generate the forecast. In this example the forecast is based on 12 weeks of data, so the results begin 12 weeks in the past.
{ "Total": { "AVERAGE": [ [ 1376006400, 172.18471918718131 ], [ 1376092800, 182.06741811718813 ], [ 1376179200, 159.14967219176917 ], ... [ 1385769600, 202.93040370941162 ], [ 1385856000, 202.78100836277008 ], [ 1385942400, 202.59301888942719 ] ] } }
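The curl commands in this example compute the end timestamp with `date -v+4w`, which is the BSD form of date (as shipped with macOS) and is not accepted by GNU date. Where that matters, the same window can be computed portably. The sketch below is one way to do it; `forecast_query` is a hypothetical name, and the parameters are the ones listed in Controlling the Metric Data Output.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode


def forecast_query(now, weeks=4, step_minutes=1440):
    """Return the query string for a forecast from `now` to `weeks` ahead."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    end = now + timedelta(weeks=weeks)
    return urlencode({
        "step": step_minutes,
        "start": int(now.timestamp()),  # seconds since the epoch
        "end": int(end.timestamp()),
        "forecast": 1,
    })


# Forecast window starting at the current time.
query = forecast_query(datetime.now(timezone.utc))
```

The resulting string can be appended after `?` to the cluster-metrics URL used in the curl examples.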
- GET /{cluster_id}/new-metrics
Retrieve metric data for one or more nodes, optionally aggregated across nodes.
Path arguments: cluster_id – A Cluster Config ID.
Query params:
- nodes – A comma-separated list of nodes to fetch data for. Either this or node_group must be specified.
- node_group – A convenient way of specifying a group of nodes to retrieve data for. Can be ‘*’ for all nodes, or the name of a datacenter for the nodes in that datacenter. Either this or nodes must be specified.
- metrics – A comma-separated list of keys from the Metrics Attribute Key Lists to fetch data for. When fetching multiple metrics, all metrics are fetched using the same nodes, start, end, and other parameters.
- columnfamilies – A comma-separated list of ‘<keyspace>.columnfamily’ strings indicating the tables to fetch the given metrics for. Required when fetching metrics that are specific to a certain table.
- tiers – A comma-separated list of storage tier numbers indicating the tiers to fetch the given metrics for. Required when fetching metrics that are specific to a certain storage tier.
- devices – A comma-separated list of device strings indicating the devices to fetch the given metrics for. Required when fetching metrics that are specific to a certain disk or network device.
- node_aggregation – Indicates whether or not to aggregate the results across nodes. A ‘0’ value indicates false and a ‘1’ value indicates true.
- parameters (additional) – The parameters listed in Controlling the Metric Data Output.
Returns metric data.
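Two details of this route are easy to get wrong: list-valued parameters are comma-separated strings, and step is given in seconds rather than minutes. The sketch below builds a query string that handles both. The names `new_metrics_query` and `STEP_SECONDS` are hypothetical, only a subset of the parameters is covered, and leaving commas and `*` unescaped is a choice made to match the curl examples rather than a documented requirement.

```python
from urllib.parse import urlencode

# Classic step values in minutes mapped to the seconds that new-metrics expects.
STEP_SECONDS = {1: 60, 5: 300, 120: 7200, 1440: 86400}


def new_metrics_query(metrics, nodes=None, node_group=None, step_minutes=None,
                      start=None, end=None, node_aggregation=None):
    """Build a new-metrics query string from Python lists and integers."""
    if nodes is None and node_group is None:
        raise ValueError("either nodes or node_group must be specified")
    params = {"metrics": ",".join(metrics)}
    if nodes is not None:
        params["nodes"] = ",".join(nodes)
    if node_group is not None:
        params["node_group"] = node_group
    if step_minutes is not None:
        params["step"] = STEP_SECONDS[step_minutes]  # KeyError on an invalid step
    if start is not None:
        params["start"] = start
    if end is not None:
        params["end"] = end
    if node_aggregation is not None:
        params["node_aggregation"] = 1 if node_aggregation else 0
    # Keep commas and '*' literal so the result matches the curl examples.
    return urlencode(params, safe=",*")
```

Called with `["data-load"]`, two node addresses, `step_minutes=1440`, and the April 20 to April 26, 2012 timestamps, this yields the same query string as the first manually built request in the examples for this route.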
Examples
Get the daily average data load on cluster nodes 10.11.12.150, 10.11.12.151 from April 20, 2012 00:00:00 to April 26, 2012 00:00:00 GMT:
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/new-metrics -d 'metrics=data-load' -d 'nodes=10.11.12.150,10.11.12.151' -d 'step=86400' -d 'start=1334880000' -d 'end=1335398400'
Manually building an HTTP GET request:
curl 'http://127.0.0.1:8888/Test_Cluster/new-metrics?metrics=data-load&nodes=10.11.12.150,10.11.12.151&step=86400&start=1334880000&end=1335398400'
Output:
{ "metrics": ["data-load"], "bounds": {"start": 1334880000, "end": 1335312000, "step": 86400}, "aggregation_function": null, "nodes": ["10.11.12.150", "10.11.12.151"], "data": { "10.11.12.150": [ {"metric": "data-load", "data-points": [ [4353770496.0, 4353770496.0, 4353770496.0], [6353770496.0, 6353770496.0, 6353770496.0], [6560092672.0, 6560092672.0, 6560092672.0], [6019291136.0, 6019291136.0, 6019291136.0], [6149050880.0, 6149050880.0, 6149050880.0], [6271239680.0, 6271239680.0, 6271239680.0] ] } ], "10.11.12.151": [ {"metric": "data-load", "data-points": [ [4353770496.0, 4353770496.0, 4353770496.0], [6353770496.0, 6353770496.0, 6353770496.0], [6560092672.0, 6560092672.0, 6560092672.0], [6019291136.0, 6019291136.0, 6019291136.0], [6149050880.0, 6149050880.0, 6149050880.0], [6271239680.0, 6271239680.0, 6271239680.0] ] } ] } }
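Unlike the classic routes, the data points in this response carry no timestamps. In the sample output the number of data points equals the number of steps between bounds.start and bounds.end inclusive, which suggests that the i-th point belongs to start + i * step. That reading is an assumption drawn from the sample, not something this page states, and the meaning of the three values inside each data point is likewise not specified here. Under that assumption, a sketch that attaches timestamps (with `with_timestamps` as a hypothetical name) could look like this:

```python
def with_timestamps(response):
    """Map each (node, metric) pair to a list of (timestamp, data_point) tuples.

    Assumes the i-th data point belongs to bounds.start + i * bounds.step.
    """
    start = response["bounds"]["start"]
    step = response["bounds"]["step"]
    series = {}
    for node, entries in response["data"].items():
        for entry in entries:
            stamped = [(start + i * step, point)
                       for i, point in enumerate(entry["data-points"])]
            series[(node, entry["metric"])] = stamped
    return series


# Abridged from the sample output above.
sample = {
    "bounds": {"start": 1334880000, "end": 1335312000, "step": 86400},
    "data": {"10.11.12.150": [{"metric": "data-load", "data-points": [
        [4353770496.0, 4353770496.0, 4353770496.0],
        [6353770496.0, 6353770496.0, 6353770496.0]]}]},
}
stamped = with_timestamps(sample)[("10.11.12.150", "data-load")]
```

For table, tier, or device metrics, several entries per node can share a metric name, so the dictionary key would also need the corresponding entry field (for example columnfamily) to keep the series apart.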
Get the cluster-wide aggregate of data load and write latency from April 20, 2012 00:00:00 to April 26, 2012 00:00:00 GMT:
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/new-metrics -d 'metrics=data-load,write-latency-op' -d 'node_group=*' -d 'step=86400' -d 'start=1334880000' -d 'end=1335398400' -d 'node_aggregation=1'
Manually building an HTTP GET request:
curl 'http://127.0.0.1:8888/Test_Cluster/new-metrics?metrics=data-load,write-latency-op&node_group=*&step=86400&start=1334880000&end=1335398400&node_aggregation=1'
Output:
{ "metrics": ["data-load", "write-latency-op"], "bounds": {"start": 1334880000, "end": 1335312000, "step": 86400}, "aggregation_function": { "data-load": "sum", "write-latency-op": "average" }, "nodes": ["10.11.12.150", "10.11.12.151"], "data": { "aggregate": [ {"metric": "data-load", "data-points": [ [4353770496.0, 4353770496.0, 4353770496.0], [6353770496.0, 6353770496.0, 6353770496.0], [6560092672.0, 6560092672.0, 6560092672.0], [6019291136.0, 6019291136.0, 6019291136.0], [6149050880.0, 6149050880.0, 6149050880.0], [6271239680.0, 6271239680.0, 6271239680.0] ] }, {"metric": "write-latency-op", "data-points": [ [4353770496.0, 4353770496.0, 4353770496.0], [6353770496.0, 6353770496.0, 6353770496.0], [6560092672.0, 6560092672.0, 6560092672.0], [6019291136.0, 6019291136.0, 6019291136.0], [6149050880.0, 6149050880.0, 6149050880.0], [6271239680.0, 6271239680.0, 6271239680.0] ] } ] } }
Get the write ops for multiple tables on all nodes from April 20, 2012 00:00:00 to April 26, 2012 00:00:00 GMT:
Using curl’s -G flag to build an HTTP GET request:
curl -G http://127.0.0.1:8888/Test_Cluster/new-metrics -d 'metrics=cf-write-ops' -d 'node_group=*' -d 'columnfamilies=OpsCenter.events,OpsCenter.settings' -d 'step=86400' -d 'start=1334880000' -d 'end=1335398400'
Manually building an HTTP GET request:
curl 'http://127.0.0.1:8888/Test_Cluster/new-metrics?metrics=cf-write-ops&node_group=*&columnfamilies=OpsCenter.events,OpsCenter.settings&step=86400&start=1334880000&end=1335398400'
Output:
{ "metrics": ["cf-write-ops"], "bounds": {"start": 1334880000, "end": 1335312000, "step": 86400}, "aggregation_function": null, "nodes": ["10.11.12.150", "10.11.12.151"], "columnfamilies": ["OpsCenter.events", "OpsCenter.settings"], "data": { "10.11.12.150": [ {"metric": "cf-write-ops", "columnfamily": "OpsCenter.events", "data-points": [ [4353770496.0, 4353770496.0, 4353770496.0], [6353770496.0, 6353770496.0, 6353770496.0], [6560092672.0, 6560092672.0, 6560092672.0], [6019291136.0, 6019291136.0, 6019291136.0], [6149050880.0, 6149050880.0, 6149050880.0], [6271239680.0, 6271239680.0, 6271239680.0] ] } ], "10.11.12.151": [ {"metric": "cf-write-ops", "columnfamily": "OpsCenter.settings", "data-points": [ [4353770496.0, 4353770496.0, 4353770496.0], [6353770496.0, 6353770496.0, 6353770496.0], [6560092672.0, 6560092672.0, 6560092672.0], [6019291136.0, 6019291136.0, 6019291136.0], [6149050880.0, 6149050880.0, 6149050880.0], [6271239680.0, 6271239680.0, 6271239680.0] ] } ] } }
Metrics Attribute Key Lists
This section contains these tables of metric keys to use with resources that retrieve OpsCenter performance data:
Cluster Metrics Keys
This list of keys corresponds to database metrics collected by OpsCenter:
Key | Units | Description |
---|---|---|
write-ops | /sec | The number of write requests per second on the coordinator nodes, analogous to client writes. Monitoring the number of requests over a given time period reveals system write workload and usage patterns. |
write-histogram | ms/op | The min, median, max, 90th, and 99th percentiles of client write latency. The time period starts when a node receives a client write request, and ends when the node responds back to the client. Depending on consistency level and replication factor, this may include the network latency from writing to the replicas. |
write-failures | /sec | The number of write requests on the coordinator nodes that fail due to errors returned from replicas. |
write-timeouts | /sec | The number of server write timeouts per second on the coordinator nodes. |
write-unavailables | /sec | The number of write requests per second on the coordinator nodes that fail because not enough replicas are available. |
read-ops | /sec | The number of read requests per second on the coordinator nodes, analogous to client reads. Monitoring the number of requests over a given time period reveals system read workload and usage patterns. |
read-histogram | ms/op | The min, median, max, 90th, and 99th percentiles of client read latency. The time period starts when a node receives a client read request, and ends when the node responds back to the client. Depending on consistency level and replication factor, this may include the network latency from requesting the data’s replicas. |
read-failures | /sec | The number of read requests on the coordinator nodes that fail due to errors returned from replicas. |
read-timeouts | /sec | The number of server read timeouts per second on the coordinator nodes. |
read-unavailables | /sec | The number of read requests per second on the coordinator nodes that fail because not enough replicas are available. |
nonheap-committed | – | Allocated memory, guaranteed for Java nonheap. |
nonheap-max | – | Maximum amount that the Java nonheap can grow. |
nonheap-used | – | Average amount of Java nonheap memory used. |
heap-committed | – | Allocated memory guaranteed for the Java heap. |
heap-max | – | Maximum amount that the Java heap can grow. |
heap-used | – | Average amount of Java heap memory used. |
cms-collection-count | /sec | Number of concurrent mark sweep garbage collections performed per second. |
par-new-collection-count | /sec | Number of ParNew garbage collections performed per second. ParNew collections pause all work in the JVM but should finish quickly. |
cms-collection-time | ms/sec | Average number of milliseconds spent performing CMS garbage collections per second. |
par-new-collection-time | ms/sec | Average number of milliseconds spent performing ParNew garbage collections per second. ParNew collections pause all work in the JVM but should finish quickly. |
g1-old-collection-count | /sec | Number of G1 old generation garbage collections performed per second. |
g1-old-collection-time | ms/sec | Average number of milliseconds spent performing G1 old generation garbage collections per second. |
g1-young-collection-count | /sec | Number of G1 young generation garbage collections performed per second. |
g1-young-collection-time | ms/sec | Average number of milliseconds spent performing G1 young generation garbage collections per second. |
data-load | – | The live disk space used by all tables on a node. |
total-bytes-compacted | /sec | Number of bytes compacted per second. |
actual-total-compactions-completed | /sec | Number of compaction tasks completed per second. |
total-compactions-completed | /sec | Number of sstable scans per second that could result in a compaction. |
pending-compaction-tasks | – | Estimated number of compactions required to achieve the desired state. This includes the pending queue to the compaction executor and additional tasks that may be created from their completion. |
dropped-counter-mutations | drops/sec | Mutation was seen after the timeout (write_request_timeout_in_ms) so was thrown away. This client might have timed out before it met the required consistency level, but might have succeeded as well. Hinted handoffs and read repairs should resolve inconsistencies but a repair can ensure it. |
dropped-mutations | drops/sec | Mutation was seen after the timeout (write_request_timeout_in_ms) so was thrown away. This client might have timed out before it met the required consistency level, but might have succeeded as well. Hinted handoffs and read repairs should resolve inconsistencies but a repair can ensure it. |
dropped-reads | drops/sec | A local read request was received after the timeout (read_request_timeout_in_ms) so it was thrown away because it would have already either been completed and sent to client or sent back as a timeout error. |
dropped-ranged-slice-reads | drops/sec | A local ranged read request was received after the timeout (range_request_timeout_in_ms) so it was thrown away because it would have already either been completed and sent to client or sent back as a timeout error. |
dropped-read-repairs | drops/sec | The Mutation was seen after the timeout (write_request_timeout_in_ms) so was thrown away. With the read repair timeout, the node still exists in an inconsistent state. |
key-cache-hits | /sec | The number of key cache hits per second. This will avoid possible disk seeks when finding a partition in an SSTable. This metric only applies to SSTables created by DSE versions earlier than 6.0. |
key-cache-requests | /sec | The number of key cache requests per second. This metric only applies to SSTables created by DSE versions earlier than 6.0. |
key-cache-hit-rate | – | The percentage of key cache lookups that resulted in a hit. This metric only applies to SSTables created by DSE versions earlier than 6.0. |
row-cache-hits | /sec | The number of row cache hits per second. |
row-cache-requests | /sec | The number of row cache requests per second. |
row-cache-hit-rate | – | The percentage of row cache lookups that resulted in a hit. |
native-connections | – | The number of clients connected using the native protocol. |
read-repair-attempted | /sec | Number of read requests where the number of nodes queried possibly exceeds the consistency level requested in order to check for a possible digest mismatch. |
read-repaired-background | /sec | Corresponds to a digest mismatch that occurred after a completed read, outside of the client read loop. |
read-repaired-blocking | /sec | Corresponds to the number of times there was a digest mismatch within the requested consistency level and a full data read was started. |
speculative-retries | retries | Number of speculative retries for all column families. |
stream-out-total | /sec | Data streamed out from this node to all other nodes, for all tables. |
stream-in-total | /sec | Data streamed in to this node from all other nodes, for all tables. |
hint-creation-rate | /sec | Rate at which new individual hints are stored on this node, to be replayed to peers. |
in-memory-percent-used | – | The percentage of memory allocated for in-memory tables currently in use. |
view-write-histogram | ms/op | The min, median, max, 90th, and 99th percentiles of the time from when a base mutation is applied to the memtable until CL.ONE is achieved on the async write to the table’s materialized views. This provides an estimate of the lag between base table mutations and the views’ consistency. |
view-replicas-success | mutations | Number of view mutations sent to replicas that have been acknowledged. |
view-replicas-pending | mutations | Number of view mutations sent to replicas where the replicas’ acknowledgement hasn’t been received. |
cells-scanned-during-read | cells | The min, median, max, 90th, and 99th percentile of how many cells were scanned during a read. |
pending-graph-query-threads | – | Number of pending tasks in the GraphQueryThreads thread pool. |
active-graph-query-threads | – | Number of active tasks in the GraphQueryThreads thread pool. |
completed-graph-query-threads | – | Number of tasks completed by the GraphQueryThreads thread pool. |
pending-graph-scheduled-threads | – | Number of pending tasks in the GraphScheduledThreads thread pool. |
active-graph-scheduled-threads | – | Number of active tasks in the GraphScheduledThreads thread pool. |
completed-graph-scheduled-threads | – | Number of tasks completed by the GraphScheduledThreads thread pool. |
pending-graph-system-threads | – | Number of pending tasks in the GraphSystemThreads thread pool. |
active-graph-system-threads | – | Number of active tasks in the GraphSystemThreads thread pool. |
completed-graph-system-threads | – | Number of tasks completed by the GraphSystemThreads thread pool. |
pending-gremlin-worker-threads | – | Number of pending tasks in the GremlinWorkerThreads thread pool. |
active-gremlin-worker-threads | – | Number of active tasks in the GremlinWorkerThreads thread pool. |
completed-gremlin-worker-threads | – | Number of tasks completed by the GremlinWorkerThreads thread pool. |
percentage-repaired | % | Percentage of data (uncompressed) marked as repaired across all non-system tables on a node. Tables with a replication factor of 1 are excluded. |
read-coordinator-nonreplica | /sec | Rate of coordinated reads to a node where that node is not a replica for that partition. |
read-coordinator-preferother | /sec | Rate of coordinated reads to a node where that node did not choose itself as a replica for the read request. |
hints-on-disk | – | The number of hints currently stored on disk, to be replayed to peers. |
hint-replay-success-rate | /sec | Rate of successful individual hint replays to peers. If one or more individual hints fail to replay in a batch, the successful hints in that batch will be replayed again and double counted in this metric. |
hint-replay-error-rate | /sec | Rate of failed individual hint replays. Replay of a single hint can fail more than once if retried. |
hint-replay-timeout-rate | /sec | Rate of timed out individual hint replays. Replay of a single hint can timeout more than once if retried. |
hint-replay-received-rate | /sec | Rate of successful individual hints replayed to this node, from other peers. |
cross-node-latency | ms/op | The min, median, max, 90th, and 99th percentiles of the latency of messages between nodes. The time period starts when a node sends a message and ends when the current node receives it. |
nodesync-data-repaired | bytes | Bytes of data that were inconsistent and needed synchronization. |
nodesync-data-validated | bytes | Bytes of data checked for consistency. |
nodesync-repair-data-sent | bytes | Total bytes of data transferred between all nodes during synchronization. |
nodesync-objects-repaired | objects | Number of rows and range tombstones that were inconsistent and needed synchronization. |
nodesync-objects-validated | objects | Number of rows and range tombstones checked for consistency. |
nodesync-repair-objects-sent | objects | Total number of rows and range tombstones transferred between all nodes during synchronization. |
nodesync-processed-pages | pages | Number of pages (internal groupings of data) processed. |
nodesync-full-in-sync-pages | pages | Number of processed pages that were not in need of synchronization. |
nodesync-full-repaired-pages | pages | Number of processed pages that were in need of synchronization. |
nodesync-partial-in-sync-pages | pages | Number of in-sync pages for which a response was received from only a partial number of replicas. |
nodesync-partial-repaired-pages | pages | Number of repaired pages for which a response was received from only a partial number of replicas. |
nodesync-uncompleted-pages | pages | Number of processed pages not having enough responses to perform synchronization. |
nodesync-failed-pages | pages | Number of processed pages for which an unknown error prevented proper synchronization completion. |
dropped-view-mutations | drops/sec | Mutation of Materialized View was seen after the timeout (write_request_timeout_in_ms) so was thrown away. This client might have timed out before it met the required consistency level, but might have succeeded as well. Hinted handoffs and read repairs should resolve inconsistencies but a repair can ensure it. |
dropped-lwt | drops/sec | Lightweight Transaction was seen after the timeout (write_request_timeout_in_ms) so was thrown away. This client might have timed out before it met the required consistency level, but might have succeeded as well. Hinted handoffs and read repairs should resolve inconsistencies but a repair can ensure it. |
dropped-hints | drops/sec | Hinted Handoff was seen after the timeout (write_request_timeout_in_ms) so was thrown away. Repairing the data or using NodeSync should resolve data inconsistencies. |
dropped-truncates | drops/sec | Truncate operation was seen after the timeout (truncate_request_timeout_in_ms) so was thrown away. |
dropped-snapshots | drops/sec | Snapshot Request was seen after the timeout (request_timeout_in_ms) so was thrown away. Snapshot should be retried. |
dropped-schemas | drops/sec | Schema change was seen after the timeout (request_timeout_in_ms) so was thrown away. Schema agreement may not have been reached immediately, but this will eventually resolve itself. |
dropped-repairs | drops/sec | Repair message was seen after the timeout so was thrown away. |
dropped-other | drops/sec | Miscellaneous message was seen after the timeout so was thrown away. |
dropped-node-sync | drops/sec | Node-sync message was seen after the timeout so was thrown away. |
dropped-batch-store | drops/sec | Batch store message was seen after the timeout so was thrown away. |
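As a rough illustration of how the keys above plug into the node-level retrieval method (`GET /{cluster_id}/metrics/{node_ip}/{metric}`), the following Python sketch builds request URLs for a few dropped-message keys. The base URL, cluster ID, and node IP are placeholders, and the helper name `node_metric_url` is hypothetical; only the path layout and the `start`, `end`, and `step` query parameters come from this document.

```python
from urllib.parse import quote, urlencode


def node_metric_url(base_url, cluster_id, node_ip, metric, **params):
    """Build a URL for GET /{cluster_id}/metrics/{node_ip}/{metric}.

    Query parameters whose value is None are left out.
    """
    parts = (cluster_id, "metrics", node_ip, metric)
    path = "/".join(quote(str(part), safe="") for part in parts)
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base_url.rstrip('/')}/{path}"
    return f"{url}?{query}" if query else url


# Placeholder values (assumptions, not taken from a real deployment).
BASE_URL = "http://localhost:8888"
urls = [
    node_metric_url(BASE_URL, "Test_Cluster", "10.0.0.1", key, step=5)
    for key in ("dropped-hints", "dropped-lwt", "dropped-view-mutations")
]
```

Each resulting URL can be passed to any HTTP client. Here `step=5` requests 5-minute resolution, one of the valid step values for these methods.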
Thread Pool Metrics Keys¶
This list of keys corresponds to thread pool metrics collected by OpsCenter:
Key | Description |
---|---|
pending-flushes | Number of memtables queued for the flush process. A flush sorts and writes the memtables to disk. |
pending-gossip-stage | Number of gossip messages and acknowledgments queued and waiting to be sent or received. |
pending-internal-response-stage | Number of pending tasks from internal tasks, such as nodes joining and leaving the cluster. |
pending-anti-entropy-stage | Repair tasks pending, such as handling the merkle tree transfer after the validation compaction. |
pending-cache-cleanup-stage | Tasks pending to clean row caches during a cleanup compaction. |
pending-memtable-post-flush | Tasks related to the last step in flushing memtables to disk as SSTables. Includes removing unnecessary commitlog files and committing Solr-based secondary indexes. |
pending-migration-stage | Number of pending tasks from system methods that modified the schema. |
pending-misc-stage | Number of pending tasks from infrequently run operations, such as taking a snapshot or processing the notification of a completed replication. |
pending-read-stage | Number of pending read requests. Read requests read data off of disk and deserialize cached data. |
pending-read-repair-stage | Number of read repair operations in the queue waiting to run. |
pending-request-response-stage | Number of pending callbacks to execute after a task on a remote node completes. |
pending-mutation-stage | Number of write requests received by the cluster and waiting to be handled. |
pending-validation-executor | Pending task to read data from sstables and generate a merkle tree for a repair. |
pending-compaction-executor | Pending compactions that are known. This may deviate from “pending compactions” which includes an estimate of tasks that these pending tasks may create after completion. |
pending-pending-range-calculator | Pending tasks to calculate the ranges according to bootstrapping and leaving nodes. |
active-flushes | Up to memtable_flush_writers concurrent tasks to flush and write the memtables to disk. |
active-gossip-stage | Number of gossip messages and acknowledgments actively being sent or received. |
active-internal-response-stage | Number of active tasks from internal tasks, such as nodes joining and leaving the cluster. |
active-anti-entropy-stage | Repair tasks active, such as handling the merkle tree transfer after the validation compaction. |
active-cache-cleanup-stage | Tasks to clean row caches during a cleanup compaction. |
active-memtable-post-flush | Tasks related to the last step in flushing memtables to disk as SSTables. Includes removing unnecessary commitlog files and committing Solr-based secondary indexes. |
active-migration-stage | Number of active tasks from system methods that modified the schema. |
active-misc-stage | Number of active tasks from infrequently run operations, such as taking a snapshot or processing the notification of a completed replication. |
active-read-stage | Number of active read requests. Read requests read data off of disk and deserialize cached data. |
active-read-repair-stage | Number of read repair operations actively being run. |
active-request-response-stage | Number of callbacks being executed after a task on a remote node is completed. |
active-mutation-stage | Number of write requests being handled. |
active-validation-executor | Active task to read data from sstables and generate a merkle tree for a repair. |
active-compaction-executor | Active compactions that are known. |
active-pending-range-calculator | Active tasks to calculate the ranges according to bootstrapping and leaving nodes. |
completed-flushes | Number of memtables flushed to disk since the node started. |
completed-gossip-stage | Number of gossip messages and acknowledgments recently sent or received. |
completed-internal-response-stage | Number of recently completed tasks from internal tasks, such as nodes joining and leaving the cluster. |
completed-anti-entropy-stage | Repair tasks recently completed, such as handling the merkle tree transfer after the validation compaction. |
completed-cache-cleanup-stage | Tasks to clean row caches during a cleanup compaction. |
completed-memtable-post-flush | Tasks related to the last step in flushing memtables to disk as SSTables. Includes removing unnecessary commitlog files and committing Solr-based secondary indexes. |
completed-migration-stage | Number of completed tasks from system methods that modified the schema. |
completed-misc-stage | Number of completed tasks from infrequently run operations, such as taking a snapshot or processing the notification of a completed replication. |
completed-read-stage | Number of completed read requests. Read requests read data off of disk and deserialize cached data. |
completed-read-repair-stage | Number of read repair operations recently completed. |
completed-request-response-stage | Number of completed callbacks executed after a task on a remote node is completed. |
completed-mutation-stage | Number of write requests received by the cluster that have been handled. |
completed-validation-executor | Completed tasks to read data from sstables and generate a merkle tree for a repair. |
completed-compaction-executor | Completed compactions. |
completed-pending-range-calculator | Completed tasks to calculate the ranges according to bootstrapping and leaving nodes. |
pending-counter-mutations | Pending tasks to execute local counter mutations. |
active-counter-mutations | Up to concurrent_counter_writes running tasks that execute local counter mutations. |
completed-counter-mutations | Number of local counter mutations that have been executed. |
memtable-reclaim-pending | Waits for current reads to complete and then frees the memory formerly used by the obsoleted memtables. |
memtable-reclaim-active | Waits for current reads to complete and then frees the memory formerly used by the obsoleted memtables. |
completed-memtable-reclaim | Waits for current reads to complete and then frees the memory formerly used by the obsoleted memtables. |
pending-view-mutation-stage | Number of mutations to apply locally after modifications to a base table. |
active-view-mutation-stage | Number of mutations being applied locally after modifications to a base table. |
completed-view-mutation-stage | Number of mutations applied locally after modifications to a base table. |
pending-hint-dispatcher | Pending tasks to send the stored hinted handoffs to a host. |
active-hint-dispatcher | Up to max_hints_delivery_threads tasks, each dispatching all hinted handoffs to a host. |
completed-hint-dispatcher | Number of tasks to transfer hints to a host that have completed. |
pending-secondary-index-management | Any initialization work when a new index instance is created. This may involve costly operations such as (re)building the index. |
active-secondary-index-management | Any initialization work when a new index instance is created. This may involve costly operations such as (re)building the index. |
completed-secondary-index-management | Any initialization work when a new index instance is created. This may involve costly operations such as (re)building the index. |
active-authentication | Authentication Active |
completed-authentication | Authentication Completed |
active-read-range | Read Range Active |
completed-read-range | Read Range Completed |
active-execute-statement | Execute Statement Active |
completed-execute-statement | Execute Statement Completed |
active-timed-speculate | Timed Speculate Active |
completed-timed-speculate | Timed Speculate Completed |
active-unknown | Unknown Active |
completed-unknown | Unknown Completed |
active-truncate | Truncate Active |
completed-truncate | Truncate Completed |
active-timed-histogram-aggregate | Timed Histogram Aggregate Active |
completed-timed-histogram-aggregate | Timed Histogram Aggregate Completed |
active-counter-acquire-lock | Counter Acquire Lock Active |
completed-counter-acquire-lock | Counter Acquire Lock Completed |
active-read | Read Active |
completed-read | Read Completed |
active-cas | CAS Active |
completed-cas | CAS Completed |
active-write-switch-for-memtable | Write Switch For Memtable Active |
completed-write-switch-for-memtable | Write Switch For Memtable Completed |
active-read-disk-async | Read Disk Async Active |
completed-read-disk-async | Read Disk Async Completed |
active-timed-unknown | Timed Unknown Active |
completed-timed-unknown | Timed Unknown Completed |
active-timed-meter-tick | Timed Meter Tick Active |
completed-timed-meter-tick | Timed Meter Tick Completed |
active-timed-timeout | Timed Timeout Active |
completed-timed-timeout | Timed Timeout Completed |
active-write | Write Active |
completed-write | Write Completed |
active-write-defragment | Write Defragment Active |
completed-write-defragment | Write Defragment Completed |
active-read-secondary-index | Read Secondary Index Active |
completed-read-secondary-index | Read Secondary Index Completed |
pending-read-range | Read Range Pending |
total-blocked-read | Total Read Blocked |
total-blocked-read-range | Total Read Range Blocked |
total-blocked-write-defragment | Total Write Defragment Blocked |
total-blocked-write | Total Write Blocked |
pending-write-defragment | Write Defragment Pending |
pending-write | Write Pending |
pending-read | Read Pending |
active-eventloop-spin | Eventloop Spin Active |
completed-read-deferred | Read Deferred Completed |
completed-authorization | Authorization Completed |
completed-batch-replay | Batch Replay Completed |
active-write-await-commitlog-segment | Write Await Commitlog Segment Active |
active-eventloop-park | Eventloop Park Active |
active-read-switch-for-response | Read Switch For Response Active |
active-nodesync-validation | Nodesync Validation Active |
active-read-switch-for-iterator | Read Switch For Iterator Active |
active-batch-remove | Batch Remove Active |
active-batch-replay | Batch Replay Active |
active-read-range-switch-for-response | Read Range Switch For Response Active |
active-write-switch-for-response | Write Switch For Response Active |
completed-batch-remove | Batch Remove Completed |
completed-batch-store-response | Batch Store Response Completed |
active-write-memtable-full | Write Memtable Full Active |
pending-lwt-propose | Lwt Propose Pending |
active-write-await-commitlog-sync | Write Await Commitlog Sync Active |
completed-nodesync-validation | Nodesync Validation Completed |
completed-lwt-commit | Lwt Commit Completed |
completed-read-switch-for-response | Read Switch For Response Completed |
active-eventloop-yield | Eventloop Yield Active |
active-lwt-prepare | Lwt Prepare Active |
completed-lwt-propose | Lwt Propose Completed |
pending-batch-store | Batch Store Pending |
completed-read-switch-for-iterator | Read Switch For Iterator Completed |
pending-lwt-prepare | Lwt Prepare Pending |
completed-write-memtable-full | Write Memtable Full Completed |
pending-truncate | Truncate Pending |
pending-read-deferred | Read Deferred Pending |
completed-eventloop-spin | Eventloop Spin Completed |
completed-write-switch-for-response | Write Switch For Response Completed |
completed-eventloop-park | Eventloop Park Completed |
active-lwt-propose | Lwt Propose Active |
completed-lwt-prepare | Lwt Prepare Completed |
active-authorization | Authorization Active |
completed-eventloop-yield | Eventloop Yield Completed |
completed-batch-store | Batch Store Completed |
active-batch-store | Batch Store Active |
pending-batch-remove | Batch Remove Pending |
active-lwt-commit | Lwt Commit Active |
pending-lwt-commit | Lwt Commit Pending |
completed-write-await-commitlog-segment | Write Await Commitlog Segment Completed |
completed-read-range-switch-for-response | Read Range Switch For Response Completed |
active-batch-store-response | Batch Store Response Active |
completed-write-await-commitlog-sync | Write Await Commitlog Sync Completed |
active-read-deferred | Read Deferred Active |
total-blocked-batch-remove | Total Batch Remove Blocked |
total-blocked-read-deferred | Total Read Deferred Blocked |
total-blocked-lwt-commit | Total Lwt Commit Blocked |
total-blocked-lwt-propose | Total Lwt Propose Blocked |
total-blocked-truncate | Total Truncate Blocked |
total-blocked-lwt-prepare | Total Lwt Prepare Blocked |
total-blocked-batch-store | Total Batch Store Blocked |
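Most keys in this table combine a task state (`pending`, `active`, `completed`, or `total-blocked`) with a thread pool name. The Python sketch below splits a key into those two parts, which can be handy when grouping results by pool. The naming pattern is inferred from the key list above rather than from a documented contract, and the function name `split_thread_pool_key` is hypothetical.

```python
# Task states that appear in the key list; "total-blocked" is checked first
# because it is the longest prefix.
STATES = ("total-blocked", "pending", "active", "completed")


def split_thread_pool_key(key):
    """Return (state, pool) for a thread pool metric key.

    Handles prefix-style keys such as "pending-read-stage" as well as the
    suffix-style keys "memtable-reclaim-pending" and "memtable-reclaim-active".
    """
    for state in STATES:
        if key.startswith(state + "-"):
            return state, key[len(state) + 1:]
    for state in STATES:
        if key.endswith("-" + state):
            return state, key[: -(len(state) + 1)]
    raise ValueError(f"unrecognized thread pool key: {key}")


examples = [
    split_thread_pool_key(k)
    for k in ("pending-read-stage", "total-blocked-lwt-commit", "memtable-reclaim-active")
]
```

Keys that do not follow either pattern raise `ValueError`, so unexpected additions to the list surface early instead of being silently misgrouped.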
Table Metrics Keys¶
This list of keys corresponds to table-specific metrics collected by OpsCenter:
Key | Units | Description |
---|---|---|
cf-write-ops | /sec | Local write requests per second. Local writes update the table’s memtable and append to a commitlog. |
cf-local-write-latency | ms/op | The min, median, max, 90th, and 99th percentile of the response times to write data to a table’s memtable. The elapsed time from when the replica receives the request from a coordinator to when it returns a response. |
cf-read-ops | /sec | Local read requests per second. Local reads retrieve data from a table’s memtable and any necessary SSTables on disk. |
cf-local-read-latency | ms/op | The min, median, max, 90th, and 99th percentile of the response time to read data from the memtable and SSTables for a specific table. The elapsed time from when the replica receives the request from a coordinator to when it returns a response. |
cf-live-disk-used | – | Disk space used by live SSTables. There might be obsolete SSTables not included. |
cf-total-disk-used | – | Disk space used by a table by SSTables, including obsolete ones waiting to be garbage collected. |
cf-live-sstables | – | Total number of SSTables for a table. |
cf-sstables-per-read | sstables | The min, median, max, 90th, and 99th percentile of how many SSTables are accessed during a read. Includes sstables that undergo bloom-filter checks, even if no data is read from the sstable. |
cf-partition-size | bytes | The min, median, max, 90th, and 99th percentile of the size (in bytes) of partitions of this table. |
cf-column-count | cells | The min, median, max, 90th, and 99th percentile of how many cells exist in partitions for this table. |
cf-bf-space-used | – | The total size of all the SSTables’ bloom filters for this table. |
cf-bf-false-positives | /sec | Number of bloom filter false positives per second. |
cf-bf-false-ratio | – | Percentage of bloom filter lookups that resulted in a false positive. |
solr-requests | /sec | Requests per second made to a specific Solr core/index. |
solr-avg-time-per-req | ms/request | Average time a search query takes in a DSE cluster using DSE Search. |
solr-errors | /sec | Errors per second that occur for a specific Solr core/index. |
solr-timeouts | /sec | Timeouts per second on a specific Solr core/index. |
solr-index-size | KB | Size of the Solr core on disk. |
cf-sstable-size | – | – |
cf-speculative-retries | retries | Number of speculative retries for this table. |
cf-bf-offheap | – | Total off heap memory used by bloom filters from all live SSTables in a table. |
cf-index-summary-offheap | – | Total off heap memory used by the index summary of all live SSTables in a table. |
cf-compression-data-offheap | – | Total off heap memory used by the compression metadata of all live SSTables in a table. |
cf-memtable-offheap | – | Off heap memory used by a table’s current memtable. |
cf-all-memtables-heapsize | – | An estimate of the space used in JVM heap memory for all memtables. This includes ones that are currently being flushed and related secondary indexes. |
cf-all-memtables-livedatasize | – | An estimate of the space used for ‘live data’ (off-heap, excluding overhead) for all memtables. This includes ones that are currently being flushed and related secondary indexes. |
cf-all-memtables-offheapsize | – | An estimate of the space used in off-heap memory for all memtables. This includes ones that are currently being flushed and related secondary indexes. |
cf-row-size | – | Approximate number of partitions. This may be off given duplicates in memtables and sstables are both counted and there is a very small error percentage inherited from the HyperLogLog data structure. |
cf-tombstones-per-read | tombstones | The min, median, max, 90th, and 99th percentile of how many tombstones are read during a read. |
cf-write-latency-legacy | ms/op | Deprecated. Median response time to write data to a table’s memtable. The elapsed time from when the replica receives the request from a coordinator to when it returns a response. |
cf-read-latency-legacy | ms/op | Deprecated. Median response time to read data from the memtable and SSTables for a specific table. The elapsed time from when the replica receives the request from a coordinator to when it returns a response. |
cf-coordinator-read-latency | ms/op | The min, median, max, 90th, and 99th percentiles of client reads on this table. The time period starts when a node receives a client read request, and ends when the node responds back to the client. Depending on consistency level and replication factor, this may include the network latency from requesting the data’s replicas. |
cf-coordinator-read-ops | /sec | The number of read requests per second for a particular table on the coordinator nodes. Monitoring the number of requests over a given time period reveals table read workload and usage patterns. |
cf-cells-scanned-during-read | cells | The min, median, max, 90th, and 99th percentile of how many cells were scanned during a read. |
cf-tier-size | – | Disk space used by a table by SSTables for the tier. |
cf-tier-sstables | sstables | Number of SSTables in a tier for a table. |
cf-tier-max-data-age | – | Timestamp in local server time that represents an upper bound to the newest piece of data stored in the SSTable. When a new SSTable is flushed, it is set to the time of creation. When an SSTable is created from compaction, it is set to the max of all merged SSTables. |
cf-percentage-repaired | % | Percentage of data (uncompressed) marked as repaired for a given table on a node. This metric is only meaningful for replication factor > 1. |
nodesync-tbl-data-repaired | bytes | Bytes of data that were inconsistent and needed synchronization. |
nodesync-tbl-data-validated | bytes | Bytes of data checked for consistency. |
nodesync-tbl-repair-data-sent | bytes | Total bytes of data transferred between all nodes during synchronization. |
nodesync-tbl-objects-repaired | objects | Number of rows and range tombstones that were inconsistent and needed synchronization. |
nodesync-tbl-objects-validated | objects | Number of rows and range tombstones checked for consistency. |
nodesync-tbl-repair-objects-sent | objects | Total number of rows and range tombstones transferred between all nodes during synchronization. |
nodesync-tbl-processed-pages | pages | Number of pages (internal groupings of data) processed. |
nodesync-tbl-full-in-sync-pages | pages | Number of processed pages that were not in need of synchronization. |
nodesync-tbl-full-repaired-pages | pages | Number of processed pages that were in need of synchronization. |
nodesync-tbl-partial-in-sync-pages | pages | Number of in-sync pages for which only some of the replicas responded. |
nodesync-tbl-partial-repaired-pages | pages | Number of repaired pages for which only some of the replicas responded. |
nodesync-tbl-uncompleted-pages | pages | Number of processed pages not having enough responses to perform synchronization. |
nodesync-tbl-failed-pages | pages | Number of processed pages for which an unknown error prevented proper synchronization completion. |
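Table keys are passed to the table-level methods, which add the keyspace and table names to the path. The following Python sketch assembles the request path for either the node-specific or the cluster-wide form, plus a query string covering the last few hours. All identifiers used here (`table_metric_path`, `last_hours_query`, the cluster, keyspace, and table names) are illustrative assumptions; the path templates and the valid `step` values in minutes are the ones listed in this document.

```python
from urllib.parse import quote, urlencode

# Valid step values in minutes for these methods.
VALID_STEPS_MINUTES = (1, 5, 120, 1440)


def table_metric_path(cluster_id, ks_name, cf_name, metric, node_ip=None, dc=None):
    """Build the path for a table metric, scoped to one node or one datacenter."""
    if (node_ip is None) == (dc is None):
        raise ValueError("pass exactly one of node_ip or dc")
    scope = ("metrics", node_ip) if node_ip is not None else ("cluster-metrics", dc)
    parts = (cluster_id, *scope, ks_name, cf_name, metric)
    return "/" + "/".join(quote(str(p), safe="") for p in parts)


def last_hours_query(end_seconds, hours, step_minutes=5):
    """Build start/end/step parameters for the window ending at end_seconds."""
    if step_minutes not in VALID_STEPS_MINUTES:
        raise ValueError(f"step must be one of {VALID_STEPS_MINUTES}")
    start_seconds = end_seconds - hours * 3600
    return urlencode({"start": start_seconds, "end": end_seconds, "step": step_minutes})


node_path = table_metric_path("Test_Cluster", "ks1", "users", "cf-read-ops", node_ip="10.0.0.1")
dc_path = table_metric_path("Test_Cluster", "ks1", "users", "cf-read-ops", dc="DC1")
query = last_hours_query(end_seconds=1700000000, hours=6)
```

Passing `end_seconds` explicitly keeps the helper deterministic; in practice it would typically be the current Unix time in seconds.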
Storage Tier Metrics Keys¶
This list of keys corresponds to storage tier-specific metrics collected by OpsCenter:
Key | Units | Description |
---|---|---|
cf-tier-size | – | Disk space used by a table by SSTables for the tier. |
cf-tier-sstables | sstables | Number of SSTables in a tier for a table. |
cf-tier-max-data-age | – | Timestamp in local server time that represents an upper bound to the newest piece of data stored in the SSTable. When a new SSTable is flushed, it is set to the time of creation. When an SSTable is created from compaction, it is set to the max of all merged SSTables. |
Operating System Metrics Keys¶
This list of keys corresponds to operating system (OS) metrics collected by OpsCenter:
Key | OS | Units | Description |
---|---|---|---|
os-memory | linux | MB | Stacked graph of used, cached, and free memory. |
os-memory-osx | osx | MB | Stacked graph of used and free memory. |
os-memory-free | linux, osx | MB | Total system memory currently free. |
os-memory-used | linux, osx | MB | Total system memory currently used. |
os-memory-shared | linux | MB | Total amount of memory in shared memory space. |
os-memory-buffers | linux | MB | Total system memory currently buffered. |
os-memory-cached | linux | MB | Total system memory currently cached. |
os-memory-win | windows | MB | Stacked graph of committed, cached, paged, non-paged, and free memory. |
os-memory-avail | windows | MB | Available physical memory. |
os-memory-committed | windows | MB | Memory in use by the operating system. |
os-memory-pool-paged | windows | MB | Allocated pool-paged-resident memory. |
os-memory-pool-nonpaged | windows | MB | Allocated pool-nonpaged memory. |
os-memory-sys-cache-resident | windows | MB | Memory used by the file cache. |
cpu | linux | – | Stacked graph of iowait, steal, nice, system, user, and idle CPU usage. |
cpu-osx | osx | – | Stacked graph of idle, user, and system CPU usage. |
cpu-win | windows | – | Stacked graph of user, privileged, and idle CPU usage. |
os-cpu-user | – | – | Time the CPU devotes to user processes. |
os-cpu-system | linux, osx | – | Time the CPU devotes to system processes. |
os-cpu-idle | – | – | Time the CPU is idle. |
os-cpu-iowait | linux | – | Time the CPU devotes to waiting for I/O to complete. |
os-cpu-steal | linux | – | Time the CPU devotes to tasks stolen by virtual operating systems. |
os-cpu-nice | linux | – | Time the CPU devotes to processing nice tasks. |
os-cpu-privileged | windows | – | Time the CPU devotes to processing privileged instructions. |
os-load | – | – | Operating system load average. One minute value parsed from /proc/loadavg on Linux systems. |
os-disk-usage | – | – | Disk space used by Cassandra at a given time. |
os-disk-free | – | GB | Free space on a specific disk partition. |
os-disk-used | – | GB | Disk space used by Cassandra at a given time. |
os-disk-read-throughput | linux, windows | MB/sec | Average disk throughput for read operations. |
os-disk-write-throughput | linux, windows | MB/sec | Average disk throughput for write operations. |
os-disk-throughput | osx | MB/sec | Average disk throughput for read and write operations. |
os-disk-read-rate | linux, windows | /sec | Rate of reads per second to the disk. |
os-disk-write-rate | linux, windows | /sec | Rate of writes per second to the disk. |
os-disk-await | linux, windows | ms | Average completion time of each request to the disk. |
os-disk-request-size | linux, osx | sectors | Average size of read requests issued to the disk. |
os-disk-request-size-kb | windows | KB | Average size of read requests issued to the disk. |
os-disk-queue-size | linux, windows | requests | Average number of requests queued due to disk latency issues. |
os-disk-utilization | linux, windows | – | CPU time consumed by disk I/O. |
os-net-received | – | KB/sec | Speed of data received from the network. |
os-net-sent | – | KB/sec | Speed of data sent across the network. |
os-disk-space | – | GB | – |
os-disk-throughput-grouped | linux, windows | MB/sec | – |
os-disk-rate | linux, windows | /sec | – |
os-net-traffic | – | KB/sec | – |
os-net-sent-win | – | – | Speed of data sent across the network. |
os-net-received-win | – | – | Speed of data received from the network. |
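Because many OS keys apply only to certain platforms, a client may want to filter the key list before issuing requests. The Python sketch below encodes a small subset of the table above and selects the keys for one platform. The mapping name and function name are hypothetical, and treating a row with no OS listed as applying to every platform is an assumption of this sketch, not something the table states.

```python
# Subset of the OS column from the table above.
# None means the table lists no OS for that key (assumed here to mean "any").
OS_METRIC_PLATFORMS = {
    "os-memory": {"linux"},
    "os-memory-osx": {"osx"},
    "os-memory-win": {"windows"},
    "os-memory-free": {"linux", "osx"},
    "cpu": {"linux"},
    "cpu-osx": {"osx"},
    "cpu-win": {"windows"},
    "os-disk-throughput": {"osx"},
    "os-disk-read-throughput": {"linux", "windows"},
    "os-load": None,
}


def keys_for_platform(platform):
    """Return the sorted metric keys from the subset that apply to a platform."""
    return sorted(
        key
        for key, platforms in OS_METRIC_PLATFORMS.items()
        if platforms is None or platform in platforms
    )


osx_keys = keys_for_platform("osx")
```

Extending the mapping to the full table is mechanical; the filtering logic stays the same.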