Retrieving Metric Data¶

Using the metric retrieval methods you can retrieve performance metrics at the cluster, node, and column family levels.

You can also use existing metric data to forecast future data points for a specific metric. More information on forecasting is available here.

Metric Retrieval Methods	URL
Retrieve cluster-wide metrics.	`GET /{cluster_id}/cluster-metrics/{dc}/{metric}`
Retrieve cluster-wide metrics about a device.	`GET /{cluster_id}/cluster-metrics/{dc}/{metric}/{device}`
Retrieve cluster-wide metrics about a column family.	`GET /{cluster_id}/cluster-metrics/{dc}/{ks_name}/{cf_name}/{metric}`
Retrieve metrics about a node.	`GET /{cluster_id}/metrics/{node_ip}/{metric}`
Retrieve node-specific metrics about a device.	`GET /{cluster_id}/metrics/{node_ip}/{metric}/{device}`
Retrieve node-specific metrics about a column family.	`GET /{cluster_id}/metrics/{node_ip}/{ks_name}/{cf_name}/{metric}`
Retrieve a forecast for a cluster-wide metric.	`GET /{cluster_id}/cluster-metrics/{dc}/{metric}` with the `forecast` query parameter
Retrieve any type of metric (new-metrics API).	`GET /{cluster_id}/new-metrics`

You can choose from a large number of metric keys to pass with these methods, making retrieval of a wide spectrum of performance information possible.

Controlling the Metric Data Output¶

You can also use the following query parameters with these methods to control the output:

Query Parameter	Description
start	(optional) A timestamp in seconds indicating the beginning of the time range to fetch. When omitted, this defaults to one day before the `end` parameter.
end	(optional) A timestamp in seconds indicating the end of the time range to fetch. When omitted, this defaults to the current time.
step	(optional) The resolution of the data points for the metric. Valid input options are: 1, 5, 120, or 1440 minutes; corresponding output intervals are 60, 300, 7200, or 86400 seconds. The default is a 1 minute step.
step	(optional; new-metrics API) The new-metrics API requires the step argument in seconds rather than minutes, consistent with the return format of the new API. Valid inputs in this case are: 60, 300, 7200, and 86400.
function	(optional) The type of aggregation to perform on the metric: min, max, or average. By default, results are returned for all three types of aggregation.
forecast	(optional) A boolean flag that requests a forecast for the specified time range and step. Past data is used to calculate projected data points in that range.
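
The parameters above can be assembled programmatically. The following is a minimal sketch, not part of the API itself: the helper names `to_epoch_seconds` and `build_metric_query` and the `new_api` flag are hypothetical, and the only rules encoded are the ones stated in the table (timestamps in seconds, steps of 1, 5, 120, or 1440 minutes, or the equivalent seconds for the new-metrics API).

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Step values in minutes mapped to the equivalent seconds expected by the
# new-metrics API, per the parameter table.
STEP_MINUTES_TO_SECONDS = {1: 60, 5: 300, 120: 7200, 1440: 86400}
VALID_FUNCTIONS = {"min", "max", "average"}


def to_epoch_seconds(moment):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    return int(moment.timestamp())


def build_metric_query(start=None, end=None, step_minutes=None,
                       function=None, forecast=False, new_api=False):
    """Return a URL query string (hypothetical helper).

    Parameters that are left out are simply omitted, so the documented
    defaults apply.
    """
    params = {}
    if start is not None:
        params["start"] = to_epoch_seconds(start)
    if end is not None:
        params["end"] = to_epoch_seconds(end)
    if step_minutes is not None:
        if step_minutes not in STEP_MINUTES_TO_SECONDS:
            raise ValueError("step must be 1, 5, 120, or 1440 minutes")
        # The new-metrics API takes seconds; the other methods take minutes.
        params["step"] = (STEP_MINUTES_TO_SECONDS[step_minutes]
                          if new_api else step_minutes)
    if function is not None:
        if function not in VALID_FUNCTIONS:
            raise ValueError("function must be min, max, or average")
        params["function"] = function
    if forecast:
        params["forecast"] = 1
    return urlencode(params)


query = build_metric_query(
    start=datetime(2012, 5, 1, 8, 0, tzinfo=timezone.utc),
    end=datetime(2012, 5, 1, 17, 0, tzinfo=timezone.utc),
    step_minutes=120,
    function="average",
)
print(query)  # start=1335859200&end=1335891600&step=120&function=average
```

Passing timezone-aware datetimes avoids accidentally interpreting the range in local time.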

Results of calls to retrieve metrics are returned in the following format:

{
  [<node_ip>: | <device>: | <keyspace.columnfamily>:]
    {
      <function>:
        [
          [<timestamp>, <value>],
          ...
        ]
    }
}

By default, the output is metric data points at 60-second intervals over a 24-hour period. Data points are listed in chronological order, oldest first.
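
A response in this format can be walked with a few lines of code. The sketch below is illustrative only: `iter_points` is a hypothetical helper, the sample payload is a shortened copy of the cluster-wide example in this section, and treating `null` as "no data for that interval" is an assumption.

```python
import json
from datetime import datetime, timezone


def iter_points(response, function):
    """Yield (key, datetime, value) for each non-null data point.

    `response` is the decoded JSON body; `function` is the aggregation name
    as it appears in the output, for example "AVERAGE".
    """
    for key, series_by_function in response.items():
        for timestamp, value in series_by_function.get(function, []):
            if value is None:  # JSON null; assumed to mean no data
                continue
            when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
            yield key, when, value


sample = json.loads("""
{"Total": {"AVERAGE": [[1335859200, null],
                       [1335866400, 13.376885890960693],
                       [1335873600, 13.372154712677002]]}}
""")

points = list(iter_points(sample, "AVERAGE"))
for key, when, value in points:
    print(key, when.isoformat(), value)
```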

GET /{cluster_id}/cluster-metrics/{dc}/{metric}¶

Aggregate a metric across multiple nodes in the cluster rather than retrieving data about a single node.

Path arguments:

cluster_id – A Cluster Config ID.
dc – The name of the data center for the nodes. Use the name `all` to aggregate a metric across all data centers.
metric – One of the Cluster Metrics Keys.

Query params:

parameters – The parameters listed in Controlling the Metric Data Output.

Returns metric data across multiple nodes in a cluster.

Example

Get the average write requests per second to the cluster over all data centers on May 1, 2012 from 8 AM to 5 PM GMT. Show data points at 2-hour (120-minute) intervals.

curl -G \
  http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/write-ops \
    -d 'step=120' \
    -d 'start=1335859200' \
    -d 'end=1335891600' \
    -d 'function=average'

Output:

Data points at 2-hour (7200 seconds) intervals show the number of write requests per second during business hours on May 1.

{
  "Total": {
    "AVERAGE": [
      [
        1335859200,
        null
      ],
      [
        1335866400,
        13.376885890960693
      ],
      [
        1335873600,
        13.372154712677002
      ],
      [
        1335880800,
        13.365732669830322
      ],
      [
        1335888000,
        13.392115592956543
      ]
    ]
  }
}

GET /{cluster_id}/cluster-metrics/{dc}/{metric}/{device}¶

Aggregate a disk or network metric, which pertains to a specific device, across multiple nodes in the cluster rather than retrieving data about a single node.

Path arguments:

cluster_id – A Cluster Config ID.
dc – The name of the data center for the nodes. Use the name `all` to aggregate a metric across all data centers.
metric – One of the Cluster Metrics Keys or Operating System Metrics Keys.
device – The device to be measured, which the Node object lists. Use the name `all` to measure all devices. For example, when requesting a disk metric, `all` aggregates metrics from all disk devices.

Query params:

parameters – The parameters listed in Controlling the Metric Data Output.

Examples of Device Arguments

To determine the set of network interfaces that metrics are available for, you can run a query similar to the following:

curl http://localhost:8888/Test_Cluster/nodes/192.168.1.1/network_interfaces

["lo0", "eth0", "eth1"]

In this case, lo0, eth0, and eth1 can all be used.

Disk devices can be discovered in a similar way.

curl http://localhost:8888/Test_Cluster/nodes/192.168.1.1/devices

{
  "commitlog": "sdb",
  "data": ["sda"],
  "saved_caches": "sda",
  "other": ["sdc"]
}

In this case, any of sda, sdb, or sdc may be used.

Finally, metrics are also captured for disk partitions and filesystems:

curl http://localhost:8888/Test_Cluster/nodes/192.168.1.1/partitions

{
  "commitlog": "/dev/sdb1",
  "data": ["/dev/sda1"],
  "saved_caches": "/dev/sda1",
  "other": ["/dev/sdc1"]
}

Here, the available partitions are /dev/sda1, /dev/sdb1, and /dev/sdc1. Keep in mind that you will need to URL-encode the items, so /dev/sda1 will become %2Fdev%2Fsda1.
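
The discovery and encoding steps above can be sketched as follows. This is an illustration, not part of the API: `device_names` and `device_path_segment` are hypothetical helpers, and the input is the sample partitions response shown above, where each value is either a single name or a list of names.

```python
from urllib.parse import quote


def device_names(mapping):
    """Flatten a devices or partitions response into a sorted list of unique names."""
    names = set()
    for value in mapping.values():
        # In the sample responses a value is either one name or a list of names.
        if isinstance(value, str):
            names.add(value)
        else:
            names.update(value)
    return sorted(names)


def device_path_segment(name):
    """Percent-encode a device name so it fits in a single URL path segment."""
    # safe="" makes quote() encode "/" as well, which it leaves alone by default.
    return quote(name, safe="")


partitions = {
    "commitlog": "/dev/sdb1",
    "data": ["/dev/sda1"],
    "saved_caches": "/dev/sda1",
    "other": ["/dev/sdc1"],
}

names = device_names(partitions)
encoded = [device_path_segment(name) for name in names]
print(names)    # ['/dev/sda1', '/dev/sdb1', '/dev/sdc1']
print(encoded)  # ['%2Fdev%2Fsda1', '%2Fdev%2Fsdb1', '%2Fdev%2Fsdc1']
```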

Using a partition, network interface, or other device name for the device argument returns disk or network metric data about that device across multiple nodes. Using all for the device name returns a dictionary that maps each device name to the results for that device.

Example

Get the average disk space used, in GB, on all disks in all data centers for each day from April 11, 2012 00:00:00 to April 26, 2012 00:00:00 GMT.

curl -G \
  http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/os-disk-used/all \
    -d 'step=1440' \
    -d 'start=1334102400' \
    -d 'end=1335398400' \
    -d 'function=average'

Output:

{
  "Total": {
    "AVERAGE": [
      [
        1334102400,
        null
      ],
      [
        1334188800,
        21.000694274902344
      ],
      [
        1334275200,
        8.736943244934082
      ],
      [
        1334361600,
        9.0
      ],
      [
        1334448000,
        19.0
      ],
      [
        1334534400,
        19.0
      ],
      [
        1334620800,
        19.0
      ],
      [
        1334707200,
        19.0
      ],
      [
        1334793600,
        18.629029273986816
      ],
      [
        1334880000,
        19.923184394836426
      ],
      [
        1334966400,
        25.0
      ],
      [
        1335052800,
        25.0
      ],
      [
        1335139200,
        25.923053741455078
      ],
      [
        1335225600,
        26.0
      ],
      [
        1335312000,
        26.549484252929688
      ]
    ]
  }
}

GET /{cluster_id}/cluster-metrics/{dc}/{ks_name}/{cf_name}/{metric}¶

Aggregate a column family metric across multiple nodes in the cluster rather than retrieving data about a single node.

Path arguments:

cluster_id – A Cluster Config ID.
dc – The name of the data center for the nodes. Use the name `all` to aggregate a metric across all data centers.
ks_name – The keyspace that contains the column family to be measured.
cf_name – The column family to be measured.
metric – One of the Column Family Metrics Keys.

Query params:

parameters – The parameters listed in Controlling the Metric Data Output.

Returns metric data for multiple nodes.

Example

Get the maximum bytes of disk space used for live data by the Users column family in the Keyspace1 keyspace of the cluster over all data centers from May 1, 2012 00:00:00 to May 5, 2012 00:00:00 GMT:

curl -G \
  http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/Keyspace1/Users/cf-live-disk-used \
    -d 'function=max' \
    -d 'start=1335830400' \
    -d 'end=1336176000' \
    -d 'step=1440'

Output:

Data points at 24-hour intervals show the metrics for the period.

{
  "Total": {
    "MAX": [
      [
        1335830400,
        9740462592.0
      ],
      [
        1335916800,
        9932527616.0
      ],
      [
        1336003200,
        null
      ],
      [
        1336089600,
        10644448512.0
      ]
    ]
  }
}

GET /{cluster_id}/metrics/{node_ip}/{metric}¶

Retrieve metric data for a single node.

Path arguments:

cluster_id – A Cluster Config ID.
node_ip – IP address of the target Node.
metric – One of the Cluster Metrics Keys.

Query params:

parameters – The parameters listed in Controlling the Metric Data Output.

Returns metric data for a single node.

Example

Get the daily average data load on cluster node 10.11.12.150 from April 20, 2012 00:00:00 to April 26, 2012 00:00:00 GMT:

curl -G \
  http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/data-load \
    -d 'step=1440' \
    -d 'start=1334880000' \
    -d 'end=1335398400' \
    -d 'function=average'

Output:

{
  "10.11.12.150": {
    "AVERAGE": [
      [
        1334880000,
        null
      ],
      [
        1334966400,
        6353770496.0
      ],
      [
        1335052800,
        6560092672.0
      ],
      [
        1335139200,
        6019291136.0
      ],
      [
        1335225600,
        6149050880.0
      ],
      [
        1335312000,
        6271239680.0
      ]
    ]
  }
}

GET /{cluster_id}/metrics/{node_ip}/{metric}/{device}¶

Aggregate a disk or network metric for a single node.

Path arguments:

cluster_id – A Cluster Config ID.
node_ip – IP address of the target Node.
metric – One of the Cluster Metrics Keys or Operating System Metrics Keys.
device – The device to be measured. Use the name `all` to measure all devices associated with a disk metric. See `GET /{cluster_id}/cluster-metrics/{dc}/{metric}/{device}` for examples of devices.

Query params:

parameters – The parameters listed in Controlling the Metric Data Output.

Returns disk or network metrics data for a single node.

Example

Get the maximum disk space used, in GB, on all disks of cluster node 10.11.12.150 from April 30, 2012 22:05:00 to May 1, 2012 08:00:00 GMT:

curl -G \
  http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/os-disk-used/all \
    -d 'start=1335823500' \
    -d 'end=1335859200' \
    -d 'step=120' \
    -d 'function=max'

Output:

Data points at 2-hour (7200-second) intervals show the disk space used by device /dev/sda1.

{
  "/dev/sda1": {
    "MAX": [
      [
        1335823200,
        null
      ],
      [
        1335830400,
        17.0
      ],
      [
        1335837600,
        16.0
      ],
      [
        1335844800,
        17.0
      ],
      [
        1335852000,
        16.0
      ]
    ]
  }
}

GET /{cluster_id}/metrics/{node_ip}/{ks_name}/{cf_name}/{metric}¶

Retrieve metric data about a column family on a single node.

Path arguments:

cluster_id – A Cluster Config ID.
node_ip – IP address of the target Node.
ks_name – The keyspace that contains the column family to be measured.
cf_name – The column family to be measured.
metric – One of the Column Family Metrics Keys.

Query params:

parameters – The parameters listed in Controlling the Metric Data Output.

Example

Get the daily maximum response time (in microseconds) to write requests on the Users column family in the Keyspace1 keyspace by cluster node 10.11.12.150 from May 1, 2012 00:00:00 to May 5, 2012 00:00:00 GMT:

curl -G \
  http://127.0.0.1:8888/Test_Cluster/metrics/10.11.12.150/Keyspace1/Users/cf-write-latency-op \
    -d 'function=max' \
    -d 'start=1335830400' \
    -d 'end=1336176000' \
    -d 'step=1440'

Output:

{
  "Keyspace1.Users": {
    "MAX": [
      [
        1335830400,
        102.28681945800781
      ],
      [
        1335916800,
        124.86614227294922
      ],
      [
        1336003200,
        null
      ],
      [
        1336089600,
        127.14733123779297
      ]
    ]
  }
}

GET /{cluster_id}/cluster-metrics/{dc}/{metric}¶

Generate a forecast for a metric aggregated across the cluster.

Path arguments:

cluster_id – A Cluster Config ID.
dc – The name of the data center for the nodes. Use the name `all` to aggregate a metric across all data centers.
metric – One of the Cluster Metrics Keys.

Query params:

parameters – The parameters listed in Controlling the Metric Data Output.

Example

Forecast the average write requests per second for the cluster over all data centers, from the current time to 4 weeks in the future. Show data points at 1-day (1440-minute) intervals.

# date -v+4w is BSD/macOS syntax; with GNU date, use: date -d '+4 weeks' +'%s'
curl -G \
  http://127.0.0.1:8888/Test_Cluster/cluster-metrics/all/write-ops \
    -d "step=1440" \
    -d "start=`date +'%s'`" \
    -d "end=`date -v+4w +'%s'`" \
    -d "forecast=1"

Output:

Data points at 1-day (86400-second) intervals show the forecasted write requests per second for the next 4 weeks. The results include the data used to generate the forecast. In this example, the forecast is based on 12 weeks of data, so the results begin 12 weeks in the past.

{
    "Total": {
        "AVERAGE": [
            [
                1376006400,
                172.18471918718131
            ],
            [
                1376092800,
                182.06741811718813
            ],
            [
                1376179200,
                159.14967219176917
            ],
            ...
            [
                1385769600,
                202.93040370941162
            ],
            [
                1385856000,
                202.78100836277008
            ],
            [
                1385942400,
                202.59301888942719
            ]
        ]
    }
}
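
Because the forecast response mixes the historical data with the projected points, a client may want to separate the two. The sketch below shows one way to do that under an assumption that this section does not state: points with a timestamp later than the request time are treated as projections. The helper `split_forecast` and the `now` value are hypothetical.

```python
def split_forecast(points, now):
    """Split [timestamp, value] pairs into (historical, projected) lists.

    Assumption: points at or before `now` are historical data and later
    points are projections.
    """
    historical = [point for point in points if point[0] <= now]
    projected = [point for point in points if point[0] > now]
    return historical, projected


# A few points copied from the sample output above.
points = [
    [1376006400, 172.18471918718131],
    [1376092800, 182.06741811718813],
    [1385856000, 202.78100836277008],
    [1385942400, 202.59301888942719],
]

# Hypothetical request time: 12 weeks after the first sample point.
now = 1376006400 + 12 * 7 * 86400

historical, projected = split_forecast(points, now)
print(len(historical), len(projected))  # 2 2
```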

GET /{cluster_id}/new-metrics¶

Retrieve metric data for one or more nodes, optionally aggregated across nodes.

Path arguments:

cluster_id – A Cluster Config ID.

Query params:

nodes – A comma separated list of nodes to fetch data for. Either this or node_group must be specified.
node_group – A convenient way of specifying a group of nodes to retrieve data for. Can be ‘all’ for all nodes, or the name of a datacenter for the nodes in that datacenter. Either this or nodes must be specified.
metrics – A comma separated list of keys from the Metrics Attribute Key Lists to fetch data for. When fetching multiple metrics, all metrics are fetched using the same nodes, start, end, and other parameters.
columnfamilies – A comma separated list of ‘<keyspace>.<columnfamily>’ strings indicating the column families to fetch the given metrics for. Required when fetching metrics that are specific to a certain column family.
devices – A comma separated list of device strings indicating the devices to fetch the given metrics for. Required when fetching metrics that are specific to a certain disk or network device.
node_aggregation – Indicates whether or not to aggregate the results across nodes. A ‘0’ value indicates false and a ‘1’ value indicates true.
parameters (additional) – The parameters listed in Controlling the Metric Data Output.

Returns metric data.
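
The query parameters listed above can be assembled as in the following sketch. It is an illustration only: `build_new_metrics_query` is a hypothetical helper, and rejecting a request that sets both `nodes` and `node_group` is an assumption, since the text only says that one of them must be specified.

```python
from urllib.parse import urlencode

# Steps for the new-metrics API are given in seconds.
VALID_STEPS_SECONDS = (60, 300, 7200, 86400)


def build_new_metrics_query(metrics, nodes=None, node_group=None,
                            columnfamilies=None, devices=None,
                            step=None, start=None, end=None,
                            node_aggregation=None):
    """Build the query string for GET /{cluster_id}/new-metrics (hypothetical helper)."""
    # Assumption: exactly one of nodes or node_group is given.
    if (nodes is None) == (node_group is None):
        raise ValueError("specify exactly one of nodes or node_group")
    if step is not None and step not in VALID_STEPS_SECONDS:
        raise ValueError("step must be 60, 300, 7200, or 86400 seconds")

    params = {"metrics": ",".join(metrics)}
    if nodes is not None:
        params["nodes"] = ",".join(nodes)
    else:
        params["node_group"] = node_group
    if columnfamilies:
        params["columnfamilies"] = ",".join(columnfamilies)
    if devices:
        params["devices"] = ",".join(devices)
    for name, value in (("step", step), ("start", start), ("end", end)):
        if value is not None:
            params[name] = value
    if node_aggregation is not None:
        # The API expects '1' for true and '0' for false.
        params["node_aggregation"] = 1 if node_aggregation else 0
    # Keep commas literal so lists look the same as in the curl examples.
    return urlencode(params, safe=",")


query = build_new_metrics_query(
    ["data-load"],
    nodes=["10.11.12.150", "10.11.12.151"],
    step=86400,
    start=1334880000,
    end=1335398400,
)
print(query)
```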

Examples

Get the daily average data load on cluster nodes 10.11.12.150, 10.11.12.151 from April 20, 2012 00:00:00 to April 26, 2012 00:00:00 GMT:

curl -G \
  http://127.0.0.1:8888/Test_Cluster/new-metrics \
    -d 'metrics=data-load' \
    -d 'nodes=10.11.12.150,10.11.12.151' \
    -d 'step=86400' \
    -d 'start=1334880000' \
    -d 'end=1335398400'

Output:

{
  "metrics": ["data-load"],
  "bounds": {"start": 1334880000, "end": 1335312000, "step": 86400},
  "aggregation_function": null,
  "nodes": ["10.11.12.150", "10.11.12.151"],
  "data": {
    "10.11.12.150": [
      {"metric": "data-load",
       "data-points":
          [
              [4353770496.0, 4353770496.0, 4353770496.0],
              [6353770496.0, 6353770496.0, 6353770496.0],
              [6560092672.0, 6560092672.0, 6560092672.0],
              [6019291136.0, 6019291136.0, 6019291136.0],
              [6149050880.0, 6149050880.0, 6149050880.0],
              [6271239680.0, 6271239680.0, 6271239680.0]
          ]
      }
    ],
    "10.11.12.151": [
      {"metric": "data-load",
       "data-points":
          [
              [4353770496.0, 4353770496.0, 4353770496.0],
              [6353770496.0, 6353770496.0, 6353770496.0],
              [6560092672.0, 6560092672.0, 6560092672.0],
              [6019291136.0, 6019291136.0, 6019291136.0],
              [6149050880.0, 6149050880.0, 6149050880.0],
              [6271239680.0, 6271239680.0, 6271239680.0]
          ]
      }
    ]
  }
}

Get the cluster-wide aggregates for data load and write latency from April 20, 2012 00:00:00 to April 26, 2012 00:00:00 GMT:

curl -G \
  http://127.0.0.1:8888/Test_Cluster/new-metrics \
    -d 'metrics=data-load,write-latency-op' \
    -d 'node_group=all' \
    -d 'step=86400' \
    -d 'start=1334880000' \
    -d 'end=1335398400' \
    -d 'node_aggregation=1'

Output:

{
  "metrics": ["data-load", "write-latency-op"],
  "bounds": {"start": 1334880000, "end": 1335312000, "step": 86400},
  "aggregation_function": {
    "data-load": "sum",
    "write-latency-op": "average"
  },
  "nodes": ["10.11.12.150", "10.11.12.151"],
  "data": {
    "aggregate": [
      {"metric": "data-load",
       "data-points":
          [
              [4353770496.0, 4353770496.0, 4353770496.0],
              [6353770496.0, 6353770496.0, 6353770496.0],
              [6560092672.0, 6560092672.0, 6560092672.0],
              [6019291136.0, 6019291136.0, 6019291136.0],
              [6149050880.0, 6149050880.0, 6149050880.0],
              [6271239680.0, 6271239680.0, 6271239680.0]
          ]
      },
      {"metric": "write-latency-op",
       "data-points":
          [
              [4353770496.0, 4353770496.0, 4353770496.0],
              [6353770496.0, 6353770496.0, 6353770496.0],
              [6560092672.0, 6560092672.0, 6560092672.0],
              [6019291136.0, 6019291136.0, 6019291136.0],
              [6149050880.0, 6149050880.0, 6149050880.0],
              [6271239680.0, 6271239680.0, 6271239680.0]
          ]
      }
    ]
  }
}

Get the write operations for multiple column families on all nodes from April 20, 2012 00:00:00 to April 26, 2012 00:00:00 GMT:

curl -G \
  http://127.0.0.1:8888/Test_Cluster/new-metrics \
    -d 'metrics=cf-write-ops' \
    -d 'node_group=all' \
    -d 'columnfamilies=OpsCenter.events,OpsCenter.settings' \
    -d 'step=86400' \
    -d 'start=1334880000' \
    -d 'end=1335398400'

Output:

{
  "metrics": ["cf-write-ops"],
  "bounds": {"start": 1334880000, "end": 1335312000, "step": 86400},
  "aggregation_function": null,
  "nodes": ["10.11.12.150", "10.11.12.151"],
  "columnfamilies": ["OpsCenter.events", "OpsCenter.settings"],
  "data": {
    "10.11.12.150": [
      {"metric": "cf-write-ops",
       "columnfamily": "OpsCenter.events",
       "data-points":
          [
              [4353770496.0, 4353770496.0, 4353770496.0],
              [6353770496.0, 6353770496.0, 6353770496.0],
              [6560092672.0, 6560092672.0, 6560092672.0],
              [6019291136.0, 6019291136.0, 6019291136.0],
              [6149050880.0, 6149050880.0, 6149050880.0],
              [6271239680.0, 6271239680.0, 6271239680.0]
          ]
      }
    ],
    "10.11.12.151": [
      {"metric": "cf-write-ops",
       "columnfamily": "OpsCenter.settings",
       "data-points":
          [
              [4353770496.0, 4353770496.0, 4353770496.0],
              [6353770496.0, 6353770496.0, 6353770496.0],
              [6560092672.0, 6560092672.0, 6560092672.0],
              [6019291136.0, 6019291136.0, 6019291136.0],
              [6149050880.0, 6149050880.0, 6149050880.0],
              [6271239680.0, 6271239680.0, 6271239680.0]
          ]
      }
    ]
  }
}
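
Unlike the other methods, the new-metrics output does not carry a timestamp with each data point. The sketch below pairs each point with a timestamp derived from `bounds`. It rests on an assumption inferred from the sample outputs rather than stated in this section: the first point falls on `bounds["start"]`, the last on `bounds["end"]`, and consecutive points are `bounds["step"]` seconds apart. The three values in each data point are kept as-is because their meaning is not described here. `timestamped_points` is a hypothetical helper.

```python
def timestamped_points(bounds, data_points):
    """Pair each data point with a timestamp derived from `bounds`.

    Assumes points are evenly spaced from bounds["start"] to bounds["end"]
    inclusive, bounds["step"] seconds apart.
    """
    expected = (bounds["end"] - bounds["start"]) // bounds["step"] + 1
    if len(data_points) != expected:
        raise ValueError(f"expected {expected} points, got {len(data_points)}")
    return [
        (bounds["start"] + index * bounds["step"], point)
        for index, point in enumerate(data_points)
    ]


# Values copied from the first sample output above.
bounds = {"start": 1334880000, "end": 1335312000, "step": 86400}
data_points = [
    [4353770496.0, 4353770496.0, 4353770496.0],
    [6353770496.0, 6353770496.0, 6353770496.0],
    [6560092672.0, 6560092672.0, 6560092672.0],
    [6019291136.0, 6019291136.0, 6019291136.0],
    [6149050880.0, 6149050880.0, 6149050880.0],
    [6271239680.0, 6271239680.0, 6271239680.0],
]

series = timestamped_points(bounds, data_points)
for timestamp, values in series:
    print(timestamp, values)
```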

Metrics Attribute Key Lists¶

This section contains these tables of metric keys to use with resources that retrieve OpsCenter performance data:

Cluster Metrics Keys
Thread Pool Metrics Keys
Column Family Metrics Keys
Operating System Metrics Keys

Cluster Metrics Keys¶

This list of keys corresponds to Cassandra metrics collected by OpsCenter:

Key	Units	Description
data-load	bytes	Size of the data on the node.
pending-compaction-tasks	–	Number of compaction operations queued and waiting to run.
pending-flush-sorter-tasks	–	Number of pending tasks related to the first step in flushing memtables to disk as SSTables.
read-latency-op	microseconds	Average response time to a client read request.
read-ops	–	The number of read requests per second.
write-latency-op	microseconds	The average response time to a client write request.
write-ops	–	The number of write requests per second.
key-cache-hits	–	The number of key cache hits per second. (This metric is per-column family before Cassandra 1.1)
key-cache-requests	–	The number of key cache requests per second. (This metric is per-column family before Cassandra 1.1)
key-cache-hit-rate	%	The percentage of key cache lookups that resulted in a hit. (This metric is per-column family before Cassandra 1.1)
row-cache-hits	–	The number of row cache hits per second. (This metric is per-column family before Cassandra 1.1)
row-cache-requests	–	The number of row cache requests per second. (This metric is per-column family before Cassandra 1.1)
row-cache-hit-rate	%	The percentage of row cache lookups that resulted in a hit. (This metric is per-column family before Cassandra 1.1)
total-compactions-completed	–	Number of compaction tasks completed.
total-bytes-compacted	bytes	Number of bytes compacted per second.
g1-old-collection-count	–	Number of G1 old generation garbage collections performed per second.
g1-old-collection-time	ms/sec	Average number of milliseconds spent performing G1 old generation garbage collections per second.
g1-young-collection-count	–	Number of G1 young generation garbage collections performed per second.
g1-young-collection-time	ms/sec	Average number of milliseconds spent performing G1 young generation garbage collections per second.
cms-collection-count	–	Number of concurrent mark sweep garbage collections performed per second.
cms-collection-time	ms/sec	Average number of milliseconds spent performing CMS garbage collections per second.
par-new-collection-count	–	Number of ParNew garbage collections performed per second.
par-new-collection-time	ms/sec	Average number of milliseconds spent performing ParNew garbage collections per second.
heap-committed	bytes	Allocated memory guaranteed for the Java heap.
heap-max	bytes	Maximum amount that the Java heap can grow.
heap-used	bytes	Average amount of Java heap memory used.
nonheap-committed	bytes	Allocated memory, guaranteed for Java nonheap.
nonheap-max	bytes	Maximum amount that the Java nonheap can grow.
nonheap-used	bytes	Average amount of Java nonheap memory used.

Thread Pool Metrics Keys¶

This list of keys corresponds to thread pool metrics collected by OpsCenter:

Key	Description
pending-flushes	Number of memtables queued for the flush process.
pending-gossip-stage	Number of gossip messages and acknowledgments queued and waiting to be sent or received.
pending-hinted-handoff	Number of hints in the queue waiting to be delivered after a failed node comes up.
pending-internal-response-stage	Number of pending tasks from internal tasks, such as nodes joining and leaving the cluster.
pending-memtable-post-flush	Number of pending tasks related to the last step in flushing memtables to disk as SSTables.
pending-migration-stage	Number of pending tasks from system methods that modified the schema.
pending-misc-stage	Number of pending tasks from infrequently run operations, not measured by another metric.
pending-read-stage	Number of read requests received by the cluster and waiting to be handled.
pending-read-repair-stage	Number of read repair operations in the queue waiting to run.
pending-anti-entropy-stage	Manual repair tasks pending, operations to be completed during anti-entropy repair of a node.
pending-repl-on-write-tasks	Pending tasks related to replication of data after an insert or update to a row.
pending-request-response-stage	Progress of streamed rows from the receiving node.
pending-mutation-stage	Number of write requests received by the cluster and waiting to be handled.
pending-counter-mutation-stage	Number of counter mutation requests received by the cluster and waiting to be handled.
pending-memory-meter	Pending tasks that will calculate the live ratio, which is used to estimate memtable size. The live ratio is the actual memory usage of a memtable, including JVM overhead, compared to the raw data size. There will be at most one pending task per table.
pending-validation-executor	Pending tasks to read data from SSTables and generate a Merkle tree for a repair.
pending-commitlog-archiver	Pending commitlog archiver tasks.
pending-compaction-executor	Pending compactions that are known. This may deviate from “pending compactions”, which includes an estimate of tasks that these pending tasks may create after completion.
pending-pending-range-calculator	Pending tasks to calculate the ranges according to bootstrapping and leaving nodes.
pending-native-transport-requests	Pending native transport requests.
active-flushes	Number of memtables actively being flushed. A flush sorts and writes the memtables to disk, which could block writes.
active-gossip-stage	Number of gossip messages and acknowledgments actively being sent or received.
active-hinted-handoff	Number of hints actively being delivered after a failed node comes up.
active-internal-response-stage	Number of active tasks from internal tasks, such as nodes joining and leaving the cluster.
active-anti-entropy-stage	Repair tasks active, such as handling the merkle tree transfer after the validation compaction.
active-memtable-post-flush	Number of active tasks related to the last step in flushing memtables to disk as SSTables. Includes removing unnecessary commitlog files and committing Solr-based secondary indexes.
active-migration-stage	Number of active tasks from system methods that modified the schema.
active-misc-stage	Number of active tasks from infrequently run operations, such as taking a snapshot or processing the notification of a completed replication.
active-read-stage	Number of active read requests. Read requests read data off of disk and deserialize cached data.
active-read-repair-stage	Number of read repair operations actively being run.
active-repl-on-write-tasks	Number of active counter increment tasks that will read then write on the replicas after a coordinator’s local write. Depending on consistency level used on writes, tasks may back up outside of the normal write path.
active-request-response-stage	Number of callbacks being executed after a task on a remote node is completed.
active-mutation-stage	Number of write requests received by the cluster and being handled.
active-counter-mutation-stage	Number of counter mutation requests received by the cluster and being handled.
active-memory-meter	Active tasks that calculate the live ratio, which is used to estimate memtable size. The live ratio is the actual memory usage of a memtable, including JVM overhead, compared to the raw data size. There will be at most one pending task per table.
active-validation-executor	Active tasks to read data from SSTables and generate a Merkle tree for a repair.
active-commitlog-archiver	Active commitlog archiver tasks.
active-compaction-executor	Active compactions that are known.
active-pending-range-calculator	Active tasks to calculate the ranges according to bootstrapping and leaving nodes.
active-native-transport-requests	Active native transport requests.
completed-flushes	Number of memtables recently flushed. A flush sorts and writes the memtables to disk, which could block writes.
completed-gossip-stage	Number of gossip messages and acknowledgments recently sent or received.
completed-hinted-handoff	Number of hints recently delivered after a failed node comes up.
completed-internal-response-stage	Number of recently completed tasks from internal tasks, such as nodes joining and leaving the cluster.
completed-anti-entropy-stage	Repair tasks recently completed, such as handling the merkle tree transfer after the validation compaction.
completed-memtable-post-flush	Number of completed tasks related to the last step in flushing memtables to disk as SSTables. Includes removing unnecessary commitlog files and committing Solr-based secondary indexes.
completed-migration-stage	Number of completed tasks from system methods that modified the schema.
completed-misc-stage	Number of completed tasks from infrequently run operations, such as taking a snapshot or processing the notification of a completed replication.
completed-read-stage	Number of completed read requests. Read requests read data off of disk and deserialize cached data.
completed-read-repair-stage	Number of read repair operations recently completed.
completed-repl-on-write-tasks	Number of completed counter increment tasks that read then write on the replicas after a coordinator’s local write. Depending on consistency level used on writes, tasks may back up outside of the normal write path.
completed-request-response-stage	Number of completed callbacks executed after a task on a remote node is completed.
completed-mutation-stage	Number of write requests received by the cluster that have been handled.
completed-counter-mutation-stage	Number of counter mutation requests received by the cluster that have been handled.
completed-memory-meter	Completed tasks that calculate the live ratio, which is used to estimate memtable size. The live ratio is the actual memory usage of a memtable, including JVM overhead, compared to the raw data size. There will be at most one pending task per table.
completed-validation-executor	Completed tasks to read data from SSTables and generate a Merkle tree for a repair.
completed-commitlog-archiver	Completed commitlog archiver tasks.
completed-compaction-executor	Completed compactions.
completed-pending-range-calculator	Completed tasks to calculate the ranges according to bootstrapping and leaving nodes.
completed-native-transport-requests	Completed native transport requests.
blocked-flushes	Number of memtable flushes blocked. A flush sorts and writes the memtables to disk, which could block writes.
blocked-gossip-stage	Number of gossip messages and acknowledgments blocked waiting to be sent or received.
blocked-hinted-handoff	Number of hints blocked waiting to be delivered after a failed node comes up.
blocked-internal-response-stage	Number of blocked tasks from internal tasks, such as nodes joining and leaving the cluster.
blocked-anti-entropy-stage	Repair tasks blocked, such as handling the merkle tree transfer after the validation compaction.
blocked-memtable-post-flush	Number of blocked tasks related to the last step in flushing memtables to disk as SSTables. Includes removing unnecessary commitlog files and committing Solr-based secondary indexes.
blocked-migration-stage	Number of blocked tasks from system methods that modified the schema.
blocked-misc-stage	Number of blocked tasks from infrequently run operations, such as taking a snapshot or processing the notification of a completed replication.
blocked-read-stage	Number of blocked read requests. Read requests read data off of disk and deserialize cached data.
blocked-read-repair-stage	Number of read repair operations blocked waiting to run.
blocked-repl-on-write-tasks	Number of blocked counter increment tasks that will read then write on the replicas after a coordinator’s local write. Depending on consistency level used on writes, tasks may back up outside of the normal write path.
blocked-request-response-stage	Number of blocked callbacks to be executed after a task on a remote node is completed.
blocked-mutation-stage	Number of write requests received by the cluster and blocked waiting to be handled.
blocked-counter-mutation-stage	Number of counter mutation requests received by the cluster that are blocked and waiting to be handled.
blocked-memory-meter	Blocked tasks that calculate the live ratio, which is used to estimate memtable size. The live ratio is the actual memory usage of a memtable, including JVM overhead, compared to the raw data size.
blocked-validation-executor	Blocked tasks to read data from sstables and generate a merkle tree for a repair.
blocked-commitlog-archiver	Commitlog archiver blocked tasks
blocked-compaction-executor	Blocked compactions
blocked-pending-range-calculator	Blocked tasks to calculate the ranges according to bootstrapping and leaving nodes.
blocked-native-transport-requests	Native transport requests blocked
total-blocked-flushes	Total number of memtable flush processes blocked. A flush sorts and writes the memtables to disk, which could block writes.
total-blocked-gossip-stage	Total number of gossip messages and acknowledgments blocked waiting to be sent or received.
total-blocked-hinted-handoff	Total number of hints blocked waiting to be delivered after a failed node comes up.
total-blocked-internal-response-stage	Total number of blocked tasks from internal tasks, such as nodes joining and leaving the cluster.
total-blocked-anti-entropy-stage	Total repair tasks blocked, such as handling the merkle tree transfer after the validation compaction.
total-blocked-memtable-post-flush	Total number of blocked tasks related to the last step in flushing memtables to disk as SSTables. Includes removing unnecessary commitlog files and committing Solr-based secondary indexes.
total-blocked-migration-stage	Total number of blocked tasks from system methods that modified the schema.
total-blocked-misc-stage	Total number of blocked tasks from infrequently run operations, such as taking a snapshot or processing the notification of a completed replication.
total-blocked-read-stage	Total number of blocked read requests. Read requests read data off of disk and deserialize cached data.
total-blocked-read-repair-stage	Total number of read repair operations blocked waiting to run.
total-blocked-repl-on-write-tasks	Total number of blocked counter increment tasks that will read then write on the replicas after a coordinator’s local write. Depending on consistency level used on writes, tasks may back up outside of the normal write path.
total-blocked-request-response-stage	Total number of blocked callbacks to be executed after a task on a remote node is completed.
total-blocked-mutation-stage	Total number of write requests received by the cluster and blocked waiting to be handled.
total-blocked-counter-mutation-stage	Total number of counter mutation requests received by the cluster that are blocked and waiting to be handled.
total-blocked-memory-meter	Total blocked tasks that calculate the live ratio, which is used to estimate memtable size. The live ratio is the actual memory usage of a memtable, including JVM overhead, compared to the raw data size.
total-blocked-validation-executor	Total blocked tasks to read data from sstables and generate a merkle tree for a repair.
total-blocked-commitlog-archiver	Commitlog archiver total blocked tasks
total-blocked-compaction-executor	Total blocked compactions
total-blocked-pending-range-calculator	Total blocked tasks to calculate the ranges according to bootstrapping and leaving nodes.
total-blocked-native-transport-requests	Native transport requests total blocked
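
Many thread pool keys follow a prefix-plus-pool naming pattern (for example `completed-`, `blocked-`, and `total-blocked-` combined with `compaction-executor`). The following is a minimal sketch, not part of the OpsCenter API itself, of how a client might generate those keys and build the node-level request path `GET /{cluster_id}/metrics/{node_ip}/{metric}` with the optional `start`, `end`, and `step` query parameters. The helper names, the cluster ID `Test_Cluster`, and the node IP `10.0.0.1` are assumptions made for illustration, and not every pool necessarily has a key under all three prefixes.

```python
from urllib.parse import quote, urlencode

# Prefixes that appear in the thread pool key list above.
PREFIXES = ("completed", "blocked", "total-blocked")


def thread_pool_keys(pool):
    """Return the metric keys for one pool, such as 'compaction-executor'."""
    return [f"{prefix}-{pool}" for prefix in PREFIXES]


def node_metric_path(cluster_id, node_ip, metric, **query):
    """Build the path for GET /{cluster_id}/metrics/{node_ip}/{metric}.

    Keyword arguments (start, end, step) become query parameters.
    """
    parts = (cluster_id, "metrics", node_ip, metric)
    path = "/" + "/".join(quote(str(p), safe="") for p in parts)
    return f"{path}?{urlencode(query)}" if query else path


# Hypothetical cluster ID and node IP, for illustration only.
paths = [
    node_metric_path("Test_Cluster", "10.0.0.1", key, step=5)
    for key in thread_pool_keys("compaction-executor")
]
```

The resulting paths would be appended to the base URL of the OpsCenter server and requested with any HTTP client.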

Column Family Metrics Keys¶

This list of keys corresponds to column family-specific metrics collected by OpsCenter:

Key	Units	Description
cf-keycache-hit-rate	%	Cache requests that resulted in a key cache hit. (This metric is global in Cassandra 1.1+.)
cf-keycache-hits	–	Number of read requests that resulted in the requested row key being found in the key cache. (This metric is global in Cassandra 1.1+.)
cf-keycache-requests	–	Total number of read requests on the key cache. (This metric is global in Cassandra 1.1+.)
cf-live-disk-used	bytes	Disk space used by a column family for readable data.
cf-live-sstables	–	Current number of SSTables for a column family.
cf-pending-tasks	–	Number of pending reads and writes on a column family.
cf-read-latency-op	microseconds	Internal response time to a successful request to read data from a column family.
cf-read-ops	–	Read requests per second on a column family.
cf-rowcache-hit-rate	%	Percentage of cache requests that resulted in a row cache hit. (This metric is global in Cassandra 1.1+.)
cf-rowcache-hits	–	Number of read requests on the row cache. (This metric is global in Cassandra 1.1+.)
cf-rowcache-requests	–	Total number of read requests on the row cache. (This metric is global in Cassandra 1.1+.)
cf-total-disk-used	–	Disk space used by a column family for both live and old (no longer live) data.
cf-write-latency-op	microseconds	Internal response time to a successful request to write data to a column family.
cf-write-ops	–	Write requests per second on a column family.
cf-bf-space-used	bytes	How large the bloom filter is.
cf-bf-false-positives	–	Number of bloom filter false positives per second.
cf-bf-false-ratio	%	Percentage of bloom filter lookups that resulted in a false positive.
cf-column-count	–	Histogram of cells per partition
cf-partition-size	bytes	Histogram of the size of partitions
solr-avg-time-per-req	milliseconds	Average time a search query takes in a DSE cluster using DSE search.
solr-errors	–	Errors per second that occur for a specific Solr core/index.
solr-requests	–	Requests per second made to a specific Solr core/index.
solr-timeouts	–	Timeouts per second on a specific Solr core/index.
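
Column family keys are passed to the column family form of the retrieval URL, `GET /{cluster_id}/metrics/{node_ip}/{ks_name}/{cf_name}/{metric}`. The sketch below shows one way to assemble that path and to convert a latency value from the microseconds listed in the table to milliseconds. The function names and the example values (`Test_Cluster`, `10.0.0.1`, `Keyspace1`, `Standard1`) are hypothetical and not taken from the API.

```python
from urllib.parse import quote


def cf_metric_path(cluster_id, node_ip, ks_name, cf_name, metric):
    """Build the path for a node-specific column family metric."""
    parts = (cluster_id, "metrics", node_ip, ks_name, cf_name, metric)
    # Percent-encode each segment so unusual names cannot break the path.
    return "/" + "/".join(quote(str(p), safe="") for p in parts)


def micros_to_millis(value):
    """Convert a latency reported in microseconds to milliseconds."""
    return value / 1000.0


# Hypothetical identifiers, for illustration only.
path = cf_metric_path(
    "Test_Cluster", "10.0.0.1", "Keyspace1", "Standard1", "cf-read-latency-op"
)
```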

Operating System Metrics Keys¶

This list of keys corresponds to operating system (OS) metrics collected by OpsCenter:

Key	OS	Units	Description
os-cpu-idle	all*	%	Time the CPU is idle.
os-cpu-iowait	Linux	%	Time the CPU devotes to waiting for I/O to complete.
os-cpu-nice	Linux	%	Time the CPU devotes to processing nice tasks.
os-cpu-privileged	Windows	%	Time the CPU devotes to processing privileged instructions.
os-cpu-steal	Linux	%	Time the CPU devotes to tasks stolen by virtual operating systems.
os-cpu-system	Linux, OSX	%	Time the CPU devotes to system processes.
os-cpu-user	all*	%	Time the CPU devotes to user processes.
os-disk-await	Linux, Windows	ms	Average completion time of each request to the disk.
os-disk-free	all*	GB	Free space on a specific disk partition.
os-disk-queue-size	Linux, Windows	–	Average number of requests queued due to disk latency issues.
os-disk-read-rate	Linux, Windows	–	Rate of reads per second to the disk.
os-disk-read-throughput	Linux, Windows	mb/sec	Average disk throughput for read operations.
os-disk-request-size	Linux	sectors	Average size of read requests issued to the disk.
os-disk-request-size-kb	Windows	KB	Average size of read requests issued to the disk.
os-disk-throughput	OSX	mb/sec	Average disk throughput for read and write operations.
os-disk-usage	all*	%	Disk space used by Cassandra at a given time.
os-disk-used	all*	GB	Disk space used by Cassandra at a given time.
os-disk-utilization	Linux, Windows	%	CPU time consumed by disk I/O.
os-disk-write-rate	Linux, Windows	–	Rate of writes per second to the disk.
os-disk-write-throughput	Linux, Windows	mb/sec	Average disk throughput for write operations.
os-load	all*	–	Operating system load average
os-memory-avail	Windows	MB	Available physical memory.
os-memory-buffers	Linux	MB	Total system memory currently buffered.
os-memory-cached	Linux	MB	Total system memory currently cached.
os-memory-committed	Windows	MB	Memory in use by the operating system.
os-memory-free	Linux, OSX	MB	Total system memory currently free.
os-memory-pool-nonpaged	Windows	MB	Allocated pool-nonpaged memory.
os-memory-pool-paged	Windows	MB	Allocated pool-paged-resident memory.
os-memory-sys-cache-resident	Windows	MB	Memory used by the file cache.
os-memory-used	Linux, OSX	MB	Total system memory currently used.
os-net-received	all*	kb/sec	Speed of data received from the network.
os-net-sent	all*	kb/sec	Speed of data sent across the network.

* all means Linux, OSX, and Windows operating systems.
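
Because many OS keys are only collected on certain platforms, a client may want to check availability before requesting a key. The sketch below copies a handful of rows from the table into a lookup and builds the device form of the node URL, `GET /{cluster_id}/metrics/{node_ip}/{metric}/{device}`. The lookup is deliberately partial, and the helper names, the device name `sda1`, and the assumption that `os-disk-free` accepts a device segment are illustrative rather than confirmed by the API.

```python
from urllib.parse import quote

ALL = frozenset({"Linux", "OSX", "Windows"})

# A partial copy of the OS column from the table above.
OS_SUPPORT = {
    "os-cpu-idle": ALL,
    "os-cpu-iowait": frozenset({"Linux"}),
    "os-cpu-privileged": frozenset({"Windows"}),
    "os-cpu-system": frozenset({"Linux", "OSX"}),
    "os-disk-await": frozenset({"Linux", "Windows"}),
    "os-disk-free": ALL,
    "os-disk-throughput": frozenset({"OSX"}),
    "os-memory-avail": frozenset({"Windows"}),
    "os-memory-free": frozenset({"Linux", "OSX"}),
}


def keys_for(os_name):
    """Return the known keys collected on the given operating system."""
    return sorted(k for k, systems in OS_SUPPORT.items() if os_name in systems)


def device_metric_path(cluster_id, node_ip, metric, device):
    """Build the path for a node-specific metric about one device."""
    parts = (cluster_id, "metrics", node_ip, metric, device)
    return "/" + "/".join(quote(str(p), safe="") for p in parts)


# Hypothetical identifiers, for illustration only.
path = device_metric_path("Test_Cluster", "10.0.0.1", "os-disk-free", "sda1")
```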
