Monitor streaming tenants

Because Astra Streaming is a managed SaaS offering, some Apache Pulsar™ metrics aren’t exposed for external integration. At a high level, Astra Streaming exposes only namespace-related metrics. Metrics that aren’t directly related to namespaces, such as BookKeeper ledger and journal metrics and ZooKeeper metrics, aren’t exposed externally.

Additionally, not all exposed metrics are recommended for external integration.

Pulsar raw metrics

For a complete Pulsar metrics reference, see the Apache Pulsar metrics documentation.

For a complete Astra Streaming metrics reference, see Grafana dashboards for Astra Streaming metrics.

Astra Streaming metrics

Namespace and topic metrics

Astra Streaming exposes both namespace-level and topic-level metrics. Namespace metrics can always be derived from the corresponding topic metrics by aggregation.

The following table lists recommended namespace and topic metrics as a starting point.

Metrics Name	Namespace and/or Topic Level	Metrics Type	Note
pulsar_topics_count	Namespace	Gauge	The number of Pulsar topics in a namespace.
pulsar_producers_count	Topic	Gauge	The number of active producers of a topic.
pulsar_consumers_count	Topic	Gauge	The number of active consumers of a topic.
pulsar_subscriptions_count	Topic	Gauge	The number of Pulsar subscriptions of a topic.
pulsar_rate_in	Topic	Gauge	The total message rate (messages per second) coming into a topic.
pulsar_rate_out	Topic	Gauge	The total message rate (messages per second) going out of a topic.
pulsar_throughput_in	Topic	Gauge	The total throughput (bytes per second) coming into a topic.
pulsar_throughput_out	Topic	Gauge	The total throughput (bytes per second) going out of a topic.
pulsar_msg_backlog	Topic	Gauge	The total number of messages in the backlog of a topic.
pulsar_storage_size	Topic	Gauge	The total storage size (in bytes) of a topic.
pulsar_storage_backlog_size	Topic	Gauge	The total backlog size (in bytes) of a topic.
pulsar_storage_offloaded_size	Topic	Gauge	The total amount of data (in bytes) of a topic offloaded to tiered storage.
pulsar_in_bytes_total	Topic	Counter	The total size (in bytes) of messages received by a topic.
pulsar_out_bytes_total	Topic	Counter	The total size (in bytes) of messages read from a topic.
pulsar_in_messages_total	Topic	Counter	The total number of messages received by a topic.
pulsar_out_messages_total	Topic	Counter	The total number of messages read from a topic.
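Because namespace metrics can be derived from topic metrics, a topic-level gauge can be rolled up per namespace by parsing the namespace out of each topic name. The following Python sketch illustrates that roll-up. It assumes the samples are already available as (topic, value) pairs and that topic names follow Pulsar's `persistent://tenant/namespace/topic` format; the function names and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def namespace_of(topic: str) -> str:
    """Return "tenant/namespace" for a name like "persistent://demo/testns/raw-partition-0"."""
    # Drop the domain prefix ("persistent://" or "non-persistent://").
    _, _, path = topic.partition("://")
    tenant, namespace, _ = path.split("/", 2)
    return f"{tenant}/{namespace}"


def sum_by_namespace(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum a topic-level gauge (for example, pulsar_msg_backlog) per namespace."""
    totals: Dict[str, float] = defaultdict(float)
    for topic, value in samples:
        totals[namespace_of(topic)] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    samples = [
        ("persistent://demo/testns/raw-partition-0", 120.0),
        ("persistent://demo/testns/raw-partition-1", 80.0),
        ("persistent://demo/other/orders", 5.0),
    ]
    print(sum_by_namespace(samples))  # {'demo/testns': 200.0, 'demo/other': 5.0}
```

Summing is appropriate for gauges such as backlog or storage size; for rate gauges, summing per-topic rates likewise gives a per-namespace total rate.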

Replication metrics

When geo-replication is enabled for a namespace, a subset of namespace metrics is available specifically for monitoring geo-replication. The following table lists recommended geo-replication metrics as a starting point.

Metrics Name	Namespace and/or Topic Level	Metrics Type	Note
pulsar_replication_rate_in	Namespace	Gauge	The total message rate (messages per second) of the namespace replicating from a remote cluster.
pulsar_replication_rate_out	Namespace	Gauge	The total message rate (messages per second) of the namespace replicating to a remote cluster.
pulsar_replication_throughput_in	Namespace	Gauge	The total throughput (bytes per second) of the namespace replicating from a remote cluster.
pulsar_replication_throughput_out	Namespace	Gauge	The total throughput (bytes per second) of the namespace replicating to a remote cluster.
pulsar_replication_backlog	Namespace	Gauge	The total message backlog of the namespace replicating to a remote cluster.

Subscription metrics

The following table lists recommended subscription metrics as a starting point.

Metrics Name	Metrics Type	Note
pulsar_subscription_back_log	Gauge	The total backlog (number of messages) for a subscription of a topic.
pulsar_subscription_delayed	Gauge	The total number of messages that are delayed for dispatch to a subscription of a topic.
pulsar_subscription_msg_rate_redeliver	Gauge	The total rate of messages (messages per second) being redelivered for a subscription of a topic.
pulsar_subscription_unacked_messages	Gauge	The total number of unacknowledged messages for a subscription of a topic.
pulsar_subscription_blocked_on_unacked_messages	Gauge	A binary indicator of whether a subscription of a topic is blocked on unacknowledged messages (1) or not (0).
pulsar_subscription_msg_rate_out	Gauge	The total message dispatch rate (messages per second) for a subscription of a topic.
pulsar_subscription_msg_throughput_out	Gauge	The total message dispatch throughput (bytes per second) for a subscription of a topic.
pulsar_subscription_msg_ack_rate	Gauge	The total message acknowledgment rate (messages per second) for a subscription of a topic.
pulsar_subscription_msg_rate_expired	Gauge	The total rate of messages (messages per second) expired on a subscription of a topic.
pulsar_subscription_total_msg_expired	Gauge	The total number of messages expired on a subscription of a topic.
pulsar_subscription_msg_drop_rate	Gauge	The rate of messages (messages per second) dropped on a subscription of a topic.
pulsar_subscription_consumers_count	Gauge	The number of connected consumers on a subscription of a topic.

Function metrics

The following table lists recommended function metrics as a starting point. These metrics are relevant only when Pulsar functions are deployed in Astra Streaming.

Metrics Name	Metrics Type	Note
pulsar_function_processed_successfully_total	Counter	The total number of messages processed successfully by a function.
pulsar_function_received_total	Counter	The total number of messages a function receives.
pulsar_function_process_latency_ms	Summary	The process latency (in milliseconds) of a function.
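Counter metrics, such as the function counters above, only increase until the process that reports them restarts and resets them to zero, so they are usually read as a rate rather than as raw values. The following Python sketch computes a counter's increase across evenly spaced samples, treating any drop in value as a reset. This is the same reset convention that Prometheus's `rate()` and `increase()` functions use, without their extrapolation at the window edges. The sample values and the 15-second interval are hypothetical.

```python
from typing import Sequence


def counter_increase(values: Sequence[float]) -> float:
    """Total increase across consecutive counter samples, tolerating resets."""
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        # Any drop means the counter restarted from zero, so the whole
        # current value counts as new increase since the reset.
        total += curr - prev if curr >= prev else curr
    return total


def counter_rate(values: Sequence[float], interval_seconds: float) -> float:
    """Average per-second rate for samples spaced interval_seconds apart."""
    if len(values) < 2:
        return 0.0
    return counter_increase(values) / (interval_seconds * (len(values) - 1))


if __name__ == "__main__":
    # Hypothetical pulsar_function_received_total samples, 15 seconds apart,
    # with a reset between the third and fourth samples.
    received = [100.0, 160.0, 220.0, 30.0, 90.0]
    print(counter_increase(received))    # 210.0
    print(counter_rate(received, 15.0))  # 3.5
```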

Source connector metrics

The following table lists recommended source connector metrics as a starting point. These metrics are relevant only when Pulsar source connectors are deployed in Astra Streaming.

Metrics Name	Metrics Type	Note
pulsar_source_written_total	Counter	The total number of messages processed by a source connector.
pulsar_source_received_total	Counter	The total number of messages received by a source connector.

Sink connector metrics

The following table lists recommended sink connector metrics as a starting point. These metrics are relevant only when Pulsar sink connectors are deployed in Astra Streaming.

Metrics Name	Metrics Type	Note
pulsar_sink_written_total	Counter	The total number of messages processed by a sink connector.
pulsar_sink_received_total	Counter	The total number of messages received by a sink connector.

Aggregate Astra Streaming metrics

Do not aggregate metrics on shared clusters because one cluster can be shared among multiple organizations. For more information, see Astra Streaming limits and Astra Streaming pricing.

Each externally exposed raw Astra Streaming metric is reported at a very low level: per individual server instance (the exported_instance label) and per topic partition (the topic label). The same raw metric can therefore come from multiple server instances. From an Astra Streaming user’s perspective, monitoring raw metrics directly isn’t very useful. Aggregate raw metrics first, for example by averaging or summing them over a period of time.

The following example shows some raw metrics for a Pulsar message backlog (pulsar_msg_backlog) scraped from an Astra Streaming cluster in the Google Cloud us-central1 region:

```
pulsar_msg_backlog{app="pulsar", cluster="pulsar-gcp-uscentral1", component="broker", controller_revision_hash="pulsar-gcp-uscentral1-broker-<hash>f", exported_instance="<ip>:<port>", exported_job="broker", helm_release_name="astraproduction-gcp-pulsar-uscentral1", instance="prometheus-gcp-uscentral1.streaming.datastax.com:443", job="astra-pulsar-metrics-demo", kubernetes_namespace="pulsar", kubernetes_pod_name="pulsar-gcp-uscentral1-broker-3", namespace="demo/testns", prometheus="pulsar/astraproduction-gcp-pulsar-prometheus", prometheus_replica="prometheus-astraproduction-gcp-pulsar-prometheus-0", pulsar_cluster_dns="gcp-uscentral1.streaming.datastax.com", release="astraproduction-gcp-pulsar-uscentral1", statefulset_kubernetes_io_pod_name="pulsar-gcp-uscentral1-broker-3", topic="persistent://demo/testns/raw-partition-0"}
```
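To see which labels drive aggregation, it can help to pull them out of a scraped sample line. The following Python sketch is a simplified parser for a single labeled line in the Prometheus text exposition format, written for illustration only: it keeps escaped characters in label values as-is rather than unescaping them, and it ignores timestamps. The shortened sample line and its value of 42 are made up.

```python
import re
from typing import Dict, Optional, Tuple

# metric_name{label="value", ...} followed by an optional sample value.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s*(?P<value>\S+)?'
)
# One label pair; escaped characters such as \" are kept as-is, not unescaped.
LABEL_RE = re.compile(r'(\w+)\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_sample(line: str) -> Tuple[str, Dict[str, str], Optional[float]]:
    """Split one labeled sample line into (metric name, labels, value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a labeled sample line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels")))
    value = match.group("value")
    return match.group("name"), labels, float(value) if value is not None else None


if __name__ == "__main__":
    # Shortened version of the sample above; the value 42 is made up.
    line = (
        'pulsar_msg_backlog{cluster="pulsar-gcp-uscentral1", '
        'exported_instance="<ip>:<port>", namespace="demo/testns", '
        'topic="persistent://demo/testns/raw-partition-0"} 42'
    )
    name, labels, value = parse_sample(line)
    print(name, labels["exported_instance"], labels["topic"], value)
```

In practice, a Prometheus-compatible query engine or an existing client library does this parsing; the sketch only shows that the topic label names a partition and exported_instance names the reporting server.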

To transform raw metrics into a usable state, DataStax recommends the following:

Aggregate metrics at the parent topic level, at minimum, instead of at the partition level. In Pulsar, end-user applications work with messages only at the parent topic level, but internally Pulsar processes messages at the partition level.
Exclude reported metrics that are associated with Astra Streaming’s system namespaces and topics, which are usually prefixed by two underscores, such as:
```
__kafka
__transaction_producer_state
```
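The two recommendations above can be sketched together in Python: map each partition to its parent topic, skip names that contain two underscores, and sum the rest. This is a minimal illustration, assuming samples arrive as (topic, value) pairs; it relies on Pulsar's `<parent>-partition-<N>` naming for partitions, and the function names and values are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Pulsar names each partition of a partitioned topic "<parent>-partition-<N>".
PARTITION_SUFFIX = re.compile(r"-partition-\d+$")


def parent_topic(topic: str) -> str:
    """Map a partition name to its parent topic; other names are returned unchanged."""
    return PARTITION_SUFFIX.sub("", topic)


def is_system_topic(topic: str) -> bool:
    """Heuristic: system namespaces and topics are usually prefixed with "__"."""
    return "__" in topic


def sum_by_parent_topic(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum a per-partition gauge per parent topic, skipping system topics."""
    totals: Dict[str, float] = defaultdict(float)
    for topic, value in samples:
        if is_system_topic(topic):
            continue
        totals[parent_topic(topic)] += value
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical pulsar_msg_backlog values.
    samples = [
        ("persistent://demo/testns/raw-partition-0", 120.0),
        ("persistent://demo/testns/raw-partition-1", 80.0),
        ("persistent://demo/testns/orders", 7.0),
        ("persistent://some_tenant/__kafka/__consumer_offsets_partition_0", 999.0),
    ]
    print(sum_by_parent_topic(samples))
```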

PromQL query patterns

PromQL is the Prometheus query language, which you can use to select and aggregate time series data in real time. For more information, see the PromQL documentation.

DataStax recommends the following PromQL query patterns for aggregating raw Astra Streaming metrics. The examples use the pulsar_msg_backlog raw metric to demonstrate the patterns. Following the recommendations in Aggregate Astra Streaming metrics, the example patterns aggregate metrics at the parent topic level or higher and exclude system topics.

Filter system topics

You can use the following expression to filter system topics:

{topic !~ ".*__.*"}

This expression excludes time series whose topic label contains two consecutive underscores. This works because Pulsar system topics and namespaces are usually prefixed by two underscores, such as:

persistent://some_tenant/__kafka/__consumer_offsets_partition_0

To use this expression, make sure that your applications' namespace and topic names don’t contain double underscores. Otherwise, this filter also excludes them.
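The filter's behavior, including the caveat about application names, can be checked outside Prometheus. Prometheus regular-expression matchers are fully anchored, so `topic !~ ".*__.*"` keeps a series only when the entire topic label fails to match `.*__.*`. The following Python sketch mimics that anchoring with `re.fullmatch`; the topic `persistent://demo/testns/orders__v2` is a hypothetical application topic.

```python
import re

# Prometheus anchors regex matchers to the whole label value; re.fullmatch
# reproduces that anchoring for the pattern used in the filter.
SYSTEM_TOPIC_PATTERN = re.compile(r".*__.*")


def kept_by_filter(topic: str) -> bool:
    """True if {topic !~ ".*__.*"} would keep a series with this topic label."""
    return SYSTEM_TOPIC_PATTERN.fullmatch(topic) is None


if __name__ == "__main__":
    for topic in [
        "persistent://demo/testns/raw-partition-0",
        "persistent://some_tenant/__kafka/__consumer_offsets_partition_0",
        "persistent://demo/testns/orders__v2",  # hypothetical application topic
    ]:
        print("kept    " if kept_by_filter(topic) else "excluded", topic)
```

The third topic is excluded even though it is not a system topic, which is why application names with double underscores need a different filter.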

Get the total message backlog of a specific parent topic, excluding system topics

$ptopic is a Grafana dashboard variable that represents a specific parent topic.

sum(pulsar_msg_backlog{topic=~"$ptopic", topic !~ ".*__.*"})

Get the total message backlog of a specific namespace, excluding system topics

$namespace is a Grafana dashboard variable that represents a specific namespace.

sum(pulsar_msg_backlog{namespace=~"$namespace", topic !~ ".*__.*"})

Get the total message backlog of a tenant, excluding system topics

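$tenant is a Grafana dashboard variable that represents a specific tenant. Because the namespace label has the form tenant/namespace, the pattern includes the slash so that it doesn't also match other tenants whose names start with the same characters.

sum(pulsar_msg_backlog{namespace=~"$tenant/.+", topic !~ ".*__.*"})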
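Because Prometheus anchors regex matchers to the whole label value, a tenant pattern is matched against the full namespace label, which has the form `tenant/namespace`. The following Python sketch uses `re.fullmatch` to mimic that anchoring and shows that a pattern without the slash, such as `demo.+`, also matches namespaces of a hypothetical tenant named `demo2`. The helper names and namespaces are illustrative assumptions.

```python
import re


def tenant_pattern(tenant: str, require_separator: bool = True) -> "re.Pattern[str]":
    """Build a namespace regex for a tenant, with or without the "/" separator."""
    suffix = "/.+" if require_separator else ".+"
    # re.escape guards against regex metacharacters in the tenant name.
    return re.compile(re.escape(tenant) + suffix)


def matching_namespaces(pattern: "re.Pattern[str]", namespaces: list) -> list:
    # fullmatch mimics Prometheus's fully anchored regex matching.
    return [ns for ns in namespaces if pattern.fullmatch(ns)]


if __name__ == "__main__":
    namespaces = ["demo/testns", "demo2/orders", "other/demo"]  # hypothetical
    print(matching_namespaces(tenant_pattern("demo", require_separator=False), namespaces))
    # ['demo/testns', 'demo2/orders']
    print(matching_namespaces(tenant_pattern("demo"), namespaces))
    # ['demo/testns']
```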

Get the total message backlog of each topic within a specific namespace, excluding system topics

sum by(topic) (pulsar_msg_backlog{namespace=~"$namespace", topic !~ ".*__.*"})

Get the top 10 message backlog by topic within a specific namespace, excluding system topics

topk (10, sum by(topic) (pulsar_msg_backlog{namespace=~"$namespace", topic !~ ".*__.*"}))
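`topk(10, sum by(topic) (...))` first adds together every series that shares a topic label, collapsing other labels such as exported_instance, and then keeps the 10 largest sums. The following Python sketch reproduces that two-step logic with `heapq.nlargest`. It simplifies the `$namespace` regex matcher to an exact comparison, and the labels, instance addresses, and values in the example are hypothetical.

```python
import heapq
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[Dict[str, str], float]  # (labels, value)


def topk_by_topic(samples: Iterable[Sample], namespace: str, k: int = 10) -> List[Tuple[str, float]]:
    """Sum values per topic label within one namespace, then keep the k largest."""
    totals: Dict[str, float] = defaultdict(float)
    for labels, value in samples:
        topic = labels.get("topic")
        # Simplification: exact namespace match instead of the =~ regex matcher;
        # topics containing "__" are skipped like the system-topic filter.
        if topic is None or labels.get("namespace") != namespace or "__" in topic:
            continue
        # Like sum by(topic): series that differ only in other labels
        # (for example, exported_instance) are added together.
        totals[topic] += value
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    samples = [
        ({"namespace": "demo/testns", "topic": "persistent://demo/testns/a", "exported_instance": "10.0.0.1:8080"}, 5.0),
        ({"namespace": "demo/testns", "topic": "persistent://demo/testns/a", "exported_instance": "10.0.0.2:8080"}, 3.0),
        ({"namespace": "demo/testns", "topic": "persistent://demo/testns/b"}, 20.0),
        ({"namespace": "demo/testns", "topic": "persistent://demo/testns/__c"}, 100.0),
        ({"namespace": "demo/other", "topic": "persistent://demo/other/d"}, 50.0),
    ]
    print(topk_by_topic(samples, "demo/testns", k=2))
```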

Metrics alerts

Most of the exposed Astra Streaming metrics reflect generic application workload characteristics, such as message rate or throughput, and they are for informational purposes only.

However, DataStax recommends that you monitor the following metrics for unexpected increases:

Metrics for alerting
Metrics Name	Aggregation Level	Metrics Type	Note
pulsar_storage_size	Topic	Gauge	The total storage size (in bytes) of a topic.
pulsar_storage_backlog_size	Topic	Gauge	The total backlog size (in bytes) of a topic.
pulsar_replication_backlog	Geo-replication	Gauge	The total message backlog of the namespace replicating to a remote cluster.
pulsar_subscription_back_log	Subscription	Gauge	The total backlog (number of messages) for a subscription of a topic.
pulsar_subscription_delayed	Subscription	Gauge	The total number of messages that are delayed for dispatch to a subscription of a topic.
pulsar_subscription_msg_drop_rate	Subscription	Gauge	The rate of messages (messages per second) dropped on a subscription of a topic.
pulsar_subscription_unacked_messages	Subscription	Gauge	The total number of unacknowledged messages for a subscription of a topic.

Alerting rules

Ideally, these metrics would always be 0. In practice, they increase when an application’s workload increases and return to normal when the workload decreases.

You can set an alert threshold to be notified when these metrics exceed normal capacity, but this can cause false alarms during expected workload spikes.

Alternatively, you can calculate the metrics' increase rate over a period of time, such as one hour, and then set a threshold based on the rate of increase. For example, if the average message backlog increase rate exceeds the given threshold, an alert is triggered.

Thresholds for these metrics depend on your application’s routine workloads and requirements. Generally, these values are large positive numbers, in the hundreds or thousands. If you receive too many false alarms, raise the alert threshold.
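The increase-rate approach to alerting can be sketched in Python as follows. This is a minimal illustration, not an Astra Streaming feature: it assumes you already collect timestamped samples of one gauge (for example, a topic's message backlog) sorted by time, and the `Sample` type, function names, sample values, and thresholds are hypothetical. In PromQL, the `delta()` and `deriv()` functions compute comparable changes over a range for gauges.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Sample:
    timestamp: float  # Unix time, in seconds
    value: float      # gauge value, for example a topic's message backlog


def average_increase_rate(samples: Sequence[Sample], window_seconds: float) -> float:
    """Average change per second over the trailing window; negative if the gauge fell.

    Assumes samples are sorted by timestamp.
    """
    if not samples:
        return 0.0
    start = samples[-1].timestamp - window_seconds
    in_window = [s for s in samples if s.timestamp >= start]
    first, last = in_window[0], in_window[-1]
    elapsed = last.timestamp - first.timestamp
    if elapsed <= 0:
        return 0.0
    return (last.value - first.value) / elapsed


def should_alert(samples: Sequence[Sample], threshold_per_second: float,
                 window_seconds: float = 3600.0) -> bool:
    """Alert when the average increase rate over the window exceeds the threshold."""
    return average_increase_rate(samples, window_seconds) > threshold_per_second


if __name__ == "__main__":
    # Hypothetical backlog samples; the first one falls outside the one-hour window.
    backlog = [Sample(-600, 0), Sample(0, 1000), Sample(1800, 4000), Sample(3600, 8200)]
    print(average_increase_rate(backlog, 3600))             # 2.0 messages per second
    print(should_alert(backlog, threshold_per_second=1.5))  # True
```

Alerting on the rate rather than the raw value tolerates a high but stable backlog during expected workload peaks, while still catching a backlog that keeps growing.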