Export metrics via Astra Portal
Enterprises depend on the ability to view database health metrics in centralized systems along with their other software metrics. The Astra DB Metrics feature lets you forward Astra DB database health metrics to an external third-party metrics system. We refer to the recipient of the exported metrics as the destination system.
Introduction
The functionality provided by the Astra DB Metrics feature is often referred to as:
-
Observability
-
External monitoring
-
Third-party metrics
-
Prometheus monitoring integration
Astra DB supports exporting health metrics from its serverless databases to:
-
Prometheus
-
Open-source Apache Kafka®
-
Confluent Kafka
-
Amazon CloudWatch
-
Splunk (via DevOps API)
-
Pulsar (via DevOps API)
-
Datadog (via DevOps API)
You can also use Grafana or Grafana Cloud to visualize the exported metrics.
Metrics UI and API options
You can configure the export of Astra DB metrics via Astra Portal (described in this topic), or via the DevOps API.
Using Splunk, Pulsar, or Datadog as the destination for exported Astra DB health metrics is supported via the DevOps API, but not in Astra Portal at this time. See Export metrics via DevOps API.
Feature availability
The Astra DB Metrics feature:
Benefits
The Astra DB Metrics feature allows you to take full control of forwarding Astra DB database health metrics to your preferred observability system. The functionality is intended for developers, site reliability engineers (SREs), IT managers, and product owners.
Ingesting database health metrics into your system gives you the ability to craft your own alerting actions and dashboards based on your service level objectives and retention requirements. While you can continue to view metrics displayed in Astra Portal via each database’s Health tab, forwarding metrics to a third-party app gives you a more complete view of all metrics being tracked, across all your products.
This enhanced capability can provide your team with broader insights into historical performance, issues, and areas for improvement.
The exported Astra DB health metrics are nearly real-time when consumed externally. You can find the source-of-truth view of your metric values in the Astra Portal's Health dashboard.
Prerequisites
-
If you haven’t already, create a serverless database using the Astra Portal.
-
Ensure that you have the permission required to view and use the Export Metrics UI, which is under Settings for each database. See Roles and permissions in this topic.
You'll need an existing destination system to receive the forwarded Astra DB metrics. Supported destinations are Prometheus, Apache Kafka, Confluent Kafka, Amazon CloudWatch, and Grafana / Grafana Cloud.
Pricing
With an Astra DB PAYG or Enterprise plan, there is no additional cost for using Astra DB Metrics beyond standard data transfer charges. Exporting third-party metrics is not available on the Astra DB Free Tier.
Metrics monitoring may incur costs at the destination system. Consult the destination system’s documentation for its pricing information.
Roles and permissions
The following Astra DB roles can export third-party metrics:
-
Organization Administrator (recommended)
-
Database Administrator
-
Service Account Administrator
-
User Administrator
The required db-manage-thirdpartymetrics permission is automatically assigned to those roles.
If you create a custom role in Astra DB, be sure to assign the db-manage-thirdpartymetrics permission to the custom role.
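The rule above can be expressed as a small check. The following Python sketch is hypothetical: the role names and the db-manage-thirdpartymetrics permission come from this topic, but the function name and the idea of passing a custom role's permissions as a list are illustrative assumptions, not part of any Astra DB API.

```python
# Hypothetical helper that mirrors the documented permission rule.
# It is not an Astra DB API; it only models the behavior described above.
REQUIRED_PERMISSION = "db-manage-thirdpartymetrics"

# Built-in roles that, per this topic, receive the permission automatically.
ROLES_WITH_PERMISSION = frozenset({
    "Organization Administrator",
    "Database Administrator",
    "Service Account Administrator",
    "User Administrator",
})


def can_export_metrics(role_name, custom_role_permissions=()):
    """Return True if the role may manage third-party metrics export."""
    if role_name in ROLES_WITH_PERMISSION:
        return True
    # A custom role qualifies only if the permission was assigned explicitly.
    return REQUIRED_PERMISSION in custom_role_permissions
```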
Database health metrics forwarded by Astra DB
Metrics are not alerts. Astra DB provides the ability to export metrics to an external destination system so that you can devise alerts based on the metrics' values. Also, note that when you use the Astra DB Metrics feature in conjunction with the private link feature, the exported metrics traffic does not make use of the private link connection. Metrics traffic flows over the public interfaces as it would without a private link.
The forwarded Astra DB health metrics are aggregated values, calculated once per minute. For each metric, Astra DB produces a rate of increase over both 1 minute and 5 minutes. The Astra DB Metrics feature forwards the following database health metrics:
-
astra_db_rate_limited_requests:rate1m and astra_db_rate_limited_requests:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for the number of operations that failed because of an Astra DB rate limit. You can request increased rate limits for your Astra DB databases. Using these rates, alert if the value is > 0.
-
astra_db_read_requests_failures:rate1m and astra_db_read_requests_failures:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for the number of failed reads. Cassandra drivers retry failed operations, but significant failures can be problematic. Using these rates, alert if the value is > 0. Warn: alert on low amounts. High: alert on larger amounts, potentially determined as a percentage of read throughput.
-
astra_db_read_requests_timeouts:rate1m and astra_db_read_requests_timeouts:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for read timeouts. Timeouts happen when operations against the database take longer than the server-side timeout. Using these rates, alert if the value is > 0.
-
astra_db_read_requests_unavailables:rate1m and astra_db_read_requests_unavailables:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for reads where the service is not available to complete a specific request. Using these rates, alert if the value is > 0.
-
astra_db_write_requests_failures:rate1m and astra_db_write_requests_failures:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for the number of failed writes. Cassandra drivers retry failed operations, but significant failures can be problematic. Using these rates, alert if the value is > 0. Warn: alert on low amounts. High: alert on larger amounts, potentially determined as a percentage of write throughput.
-
astra_db_write_requests_timeouts:rate1m and astra_db_write_requests_timeouts:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for write timeouts, which occur when operations take longer than the server-side timeout. Using these rates, compare with write_requests_failures.
-
astra_db_write_requests_unavailables:rate1m and astra_db_write_requests_unavailables:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for unavailable errors, which occur when the service is not available to serve a particular request. Using these rates, compare with write_requests_failures.
-
astra_db_range_requests_failures:rate1m and astra_db_range_requests_failures:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for the number of range reads that failed. Cassandra drivers retry failed operations, but significant failures can be problematic. Using these rates, alert if the value is > 0. Warn: alert on low amounts. High: alert on larger amounts, potentially determined as a percentage of read throughput.
-
astra_db_range_requests_timeouts:rate1m and astra_db_range_requests_timeouts:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for timeouts, which are a subset of total failures. Use this metric to understand whether failures are due to timeouts. Using these rates, compare with range_requests_failures.
-
astra_db_range_requests_unavailables:rate1m and astra_db_range_requests_unavailables:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for unavailable errors, which are a subset of total failures. Use this metric to understand whether failures are due to unavailable errors. Using these rates, compare with range_requests_failures.
-
astra_db_write_latency_seconds:rate1m and astra_db_write_latency_seconds:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for write throughput. Alert based on your application Service Level Objective (business requirement).
-
astra_db_write_latency_seconds_P$QUANTILE:rate1m and astra_db_write_latency_seconds_P$QUANTILE:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for write latency, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50 (for example, astra_db_write_latency_seconds_P99:rate1m). Alert based on your application Service Level Objective (business requirement).
-
astra_db_write_requests_mutation_size_bytesP$QUANTILE:rate1m and astra_db_write_requests_mutation_size_bytesP$QUANTILE:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for the size of writes over time, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_write_requests_mutation_size_bytesP99:rate1m.
-
astra_db_read_latency_seconds:rate1m and astra_db_read_latency_seconds:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for read throughput. Alert based on your application Service Level Objective (business requirement).
-
astra_db_read_latency_secondsP$QUANTILE:rate1m and astra_db_read_latency_secondsP$QUANTILE:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for read latency percentiles, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_read_latency_secondsP99:rate1m. Alert based on your application Service Level Objective (business requirement).
-
astra_db_range_latency_seconds:rate1m and astra_db_range_latency_seconds:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for range read throughput. Alert based on your application Service Level Objective (business requirement).
-
astra_db_range_latency_secondsP$QUANTILE:rate1m and astra_db_range_latency_secondsP$QUANTILE:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for range read latency, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_range_latency_secondsP99:rate1m. Alert based on your application Service Level Objective (business requirement).
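Because the metric names follow a fixed pattern, you can generate the full list, for example to pre-create dashboard panels or alert rules. The following Python sketch copies the names exactly as listed above and applies the documented alert guidance: any nonzero failure rate is at least a warning, and a failure rate that is large relative to throughput is high. The 1% threshold (HIGH_RATIO) and the helper names are assumptions for illustration; choose thresholds that match your own Service Level Objectives.

```python
# Sketch: enumerate the documented Astra DB health metric names and apply the
# documented "alert if > 0" guidance. HIGH_RATIO is an assumed threshold.
WINDOWS = ("rate1m", "rate5m")
QUANTILES = (99, 95, 90, 75, 50)
HIGH_RATIO = 0.01  # assumption: failures >= 1% of throughput are "high"

# Names without the ":rate1m" / ":rate5m" suffix, copied from the list above.
BASE_NAMES = (
    "astra_db_rate_limited_requests",
    "astra_db_read_requests_failures",
    "astra_db_read_requests_timeouts",
    "astra_db_read_requests_unavailables",
    "astra_db_write_requests_failures",
    "astra_db_write_requests_timeouts",
    "astra_db_write_requests_unavailables",
    "astra_db_range_requests_failures",
    "astra_db_range_requests_timeouts",
    "astra_db_range_requests_unavailables",
    "astra_db_write_latency_seconds",
    "astra_db_read_latency_seconds",
    "astra_db_range_latency_seconds",
)

# Templates copied verbatim from the list; $QUANTILE becomes 99, 95, 90, 75, or 50.
QUANTILE_TEMPLATES = (
    "astra_db_write_latency_seconds_P$QUANTILE",
    "astra_db_write_requests_mutation_size_bytesP$QUANTILE",
    "astra_db_read_latency_secondsP$QUANTILE",
    "astra_db_range_latency_secondsP$QUANTILE",
)


def all_metric_names():
    """Return every exported metric name, for both the 1m and 5m windows."""
    names = [f"{base}:{window}" for base in BASE_NAMES for window in WINDOWS]
    for template in QUANTILE_TEMPLATES:
        for quantile in QUANTILES:
            base = template.replace("$QUANTILE", str(quantile))
            names.extend(f"{base}:{window}" for window in WINDOWS)
    return names


def failure_severity(failure_rate, throughput_rate, high_ratio=HIGH_RATIO):
    """Classify a failure rate as "ok", "warn", or "high"."""
    if failure_rate <= 0:
        return "ok"  # documented guidance: alert only if the value is > 0
    if throughput_rate <= 0 or failure_rate / throughput_rate >= high_ratio:
        return "high"  # failures with no measured throughput, or a large share
    return "warn"
```

For example, with the assumed 1% threshold, failure_severity(0.5, 100.0) returns "warn" and failure_severity(5.0, 100.0) returns "high". Pair each failure metric with its matching throughput metric (for example, astra_db_read_requests_failures:rate1m with astra_db_read_latency_seconds:rate1m).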
Prometheus setup at the destination
For information about setting up Prometheus itself as the destination of the forwarded Astra DB database metrics, see the Prometheus Getting Started documentation.
For Prometheus,
After completing those steps in your Prometheus environment, verify the setup by sending a POST request to the remote write endpoint. Consider using the following example test client, which also verifies that ingress is set up properly: Prometheus Remote Write Client (promremote), written in Go.
For more information about Prometheus metric types, see this topic on prometheus.io.
Kafka setup at the destination
For information about setting up Kafka as a destination of the forwarded Astra DB database metrics, see:
-
Kafka metrics overview and Kafka Monitoring in the open-source Apache Kafka documentation.
-
Confluent Cloud Kafka documentation.
Astra DB Metrics configuration for Amazon CloudWatch
For information about setting up Amazon CloudWatch, see this topic on the AWS site.
In AWS, the secret key user must have the
Using Export Metrics in Astra Portal
The configuration steps depend on which destination you use. Astra Portal currently supports Prometheus remote_write, Kafka, and Amazon CloudWatch destinations.
To ensure that metrics are enabled for your destination app, provide the relevant properties.
Each update to the metrics configuration in Astra Portal (or in the DevOps API) replaces any existing configuration.
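The replace-on-update behavior matters when you change a single field: you must resend the complete configuration, because omitted fields are not merged from the previous one. The following Python sketch models that behavior with a hypothetical in-memory store keyed by database ID. It is not the Astra DB implementation, and the field names are placeholders.

```python
# Hypothetical model of "each update replaces the existing configuration".
# Not the Astra DB implementation; the field names are placeholders.
from copy import deepcopy


class MetricsConfigStore:
    """Holds at most one metrics destination configuration per database."""

    def __init__(self):
        self._configs = {}

    def update(self, database_id, config):
        # Full replacement: fields missing from `config` are dropped, not merged.
        self._configs[database_id] = deepcopy(config)

    def get(self, database_id):
        return deepcopy(self._configs.get(database_id))

    def delete(self, database_id):
        self._configs.pop(database_id, None)


store = MetricsConfigStore()
store.update("db-1", {"type": "kafka", "topic": "astra-metrics", "sasl_username": "svc"})
# Sending only the changed field loses sasl_username:
store.update("db-1", {"type": "kafka", "topic": "astra-metrics-v2"})
print(store.get("db-1"))  # {'type': 'kafka', 'topic': 'astra-metrics-v2'}
```

Because the store keeps one configuration per database, switching from Kafka to Prometheus in this model also discards the Kafka settings, which matches the one-destination-at-a-time rule described in this topic.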
-
After logging into Astra Portal, navigate to your serverless database in the dashboard or create a new one.
-
Ensure that the database is in Ready status, and then click the Settings tab.
-
Scroll down to the Export Metrics section.
-
Click Add Destination.
-
Select a destination: Kafka, Prometheus, or Amazon CloudWatch. (For a given database, you can export metrics to only one destination at a time.)
-
If you selected Kafka, the properties to enter on its form are:
-
SASL Mechanism - Your Kafka Simple Authentication and Security Layer (SASL) mechanism for authentication and data security. Possible values are GSSAPI, PLAIN, SCRAM-SHA-256, and SCRAM-SHA-512. For background information, see the Confluent Kafka - Authentication Methods Overview documentation.
-
SASL Username - Existing username for Kafka authentication.
-
SASL Password - Existing password for Kafka authentication.
-
Topic - Kafka topic to which Astra DB will export the metrics; you must create this topic on your server(s).
-
Bootstrap Servers - One or more Kafka Bootstrap Server entries. Example: pkc-9999e.us-east-1.aws.confluent.cloud:9092
-
(Optional) Kafka Security Protocol - Most Kafka installations do not require this setting for Astra DB Metrics to connect. However, users of hosted Kafka on Confluent Cloud may need to set this Security Protocol property to SASL_SSL. Valid options are:
-
SASL_PLAINTEXT - SASL authenticated, non-encrypted channel.
-
SASL_SSL - SASL authenticated, encrypted channel. The non-authenticated options (SSL and PLAINTEXT) are not supported.
Be sure to specify the appropriate, related SASL Mechanism property. For Confluent Cloud, you may only be able to use PLAIN. See the Confluent Cloud security tutorial. From the Confluent docs: "Confluent Cloud uses SASL/PLAIN (or PLAIN) over TLS v1.2 encryption for authentication because it offers broad client support while providing a good level of security. The usernames and passwords used in the SASL exchange are API keys and secrets that should be securely managed using a secrets store and rotated periodically."
-
When you’re ready with the Kafka form’s entries, click Add Destination.
-
If you selected Prometheus on the initial metrics destination page, the Prometheus properties you enter on its form depend first on whether you select Basic or Bearer as the Prometheus Strategy (the auth type).
-
If you chose Bearer from the menu, provide your Prometheus Token value and Prometheus Endpoint on the resulting form. Notice that the form does not display username/password properties for a Prometheus strategy of Bearer.
-
If you chose Basic from the form’s menu, provide your Prometheus Username, Password, and Endpoint on the resulting form. Notice that the form does not display a Token property for a Prometheus strategy of Basic.
When you’ve completed the Prometheus form’s entries, click Add Destination.
-
If you selected Amazon CloudWatch on the initial metrics destination page, enter your AWS:
-
Access Key
-
Secret Key
-
Region
You provide your AWS access keys so that AWS can verify your identity. Your access keys consist of an Access Key ID (for example, AKIAIOSFODNN7EXAMPLE) and a Secret Access Key (for example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY). You can create and manage these keys in the AWS console.
In the Region field, you can specify the same AWS region used by the Astra DB serverless database from which you’ll export metrics. However, you have the option of specifying a different AWS region; for example, you might use this option if your app is in another region and you want to see the metrics together.
-
When you’ve completed the CloudWatch form’s entries, click Add Destination.
-
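The form fields described in the steps above can be summarized as a pre-submission check. The following Python sketch validates a destination form before you enter it in Astra Portal. The allowed SASL mechanisms, security protocols, and per-strategy Prometheus fields come from this topic; the dictionary keys, the host:port pattern, and the function name are assumptions for illustration. Passing this check does not guarantee that Astra DB can connect to the destination.

```python
# Sketch: pre-submission checks for the Export Metrics destination forms.
# Dictionary keys and the host:port pattern are assumptions, not an Astra DB API.
import re

KAFKA_SASL_MECHANISMS = {"GSSAPI", "PLAIN", "SCRAM-SHA-256", "SCRAM-SHA-512"}
KAFKA_SECURITY_PROTOCOLS = {"SASL_PLAINTEXT", "SASL_SSL"}  # SSL/PLAINTEXT unsupported
HOST_PORT = re.compile(r"^[A-Za-z0-9.-]+:\d{1,5}$")


def _missing(form, fields):
    """List the required fields that are absent or empty."""
    return [f"missing {name}" for name in fields if not form.get(name)]


def validate_destination(kind, form):
    """Return a list of problems; an empty list means the form looks complete."""
    if kind == "kafka":
        problems = _missing(form, ("sasl_mechanism", "sasl_username",
                                   "sasl_password", "topic", "bootstrap_servers"))
        mechanism = form.get("sasl_mechanism")
        if mechanism and mechanism not in KAFKA_SASL_MECHANISMS:
            problems.append(f"unsupported SASL mechanism {mechanism!r}")
        protocol = form.get("security_protocol")  # optional field
        if protocol and protocol not in KAFKA_SECURITY_PROTOCOLS:
            problems.append(f"unsupported security protocol {protocol!r}")
        for server in form.get("bootstrap_servers") or []:
            if not HOST_PORT.match(server):
                problems.append(f"bootstrap server {server!r} is not host:port")
        return problems
    if kind == "prometheus":
        # The strategy decides which credentials the form asks for.
        required = {"bearer": ("token", "endpoint"),
                    "basic": ("username", "password", "endpoint")}
        strategy = form.get("strategy")
        if strategy not in required:
            return ["strategy must be 'basic' or 'bearer'"]
        return _missing(form, required[strategy])
    if kind == "cloudwatch":
        return _missing(form, ("access_key", "secret_key", "region"))
    return [f"unsupported destination {kind!r}"]
```

In this sketch, Splunk, Pulsar, and Datadog are reported as unsupported because this topic configures them only through the DevOps API, not in Astra Portal.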
After adding the metrics destination
After you add a Kafka, Prometheus, or Amazon CloudWatch destination, a confirmation message appears and the Export Metrics UI under Settings shows the destination.
If the configuration’s settings are valid, Astra DB exports the health metrics for the specified database. See the next section for an example of using Grafana Cloud to visualize the exported metrics.
If needed, click the three vertical dots for options to Modify the destination's configuration or Delete the destination. Modifying an existing destination lets you edit the configuration's properties. Deleting an existing destination's configuration lets you try again, or add a different type of destination, such as switching from Kafka to Prometheus or CloudWatch. For a given Astra DB database, you can configure the export of metrics to only one destination at a time.
If you decide to delete a metrics destination, Astra DB displays a message offering the alternative option to Update (rather than Delete) the destination's configuration.
Visualize exported Astra DB metrics with Grafana Cloud
You can configure Grafana Cloud to consume Astra DB serverless health metrics.
The detailed setup steps use Grafana Cloud and the DataStax DevOps v2 API. See the Grafana Cloud section of the "Export Metrics via DevOps API" topic.
Once configured, you can use your own Grafana Cloud instance to monitor the Astra DB database’s health via its metrics.
Using Grafana Cloud is optional. You can choose your favorite tool to visualize the Astra DB metrics that you exported to Kafka, Prometheus, Amazon CloudWatch, Splunk, Pulsar, or Datadog.
The examples use Prometheus as the destination system. You need a Grafana Cloud account; Grafana offers a Free plan with 14-day retention. See Grafana pricing.
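Grafana's Prometheus data source reads metrics through the Prometheus HTTP API, and you can do the same from a script, for example to spot-check that exported Astra DB metrics are arriving. The following Python sketch runs an instant query against /api/v1/query and parses a vector result. The base URL is a placeholder, and the sketch assumes an unauthenticated Prometheus; a hosted endpoint such as Grafana Cloud typically requires credentials and may use a different base path.

```python
# Sketch: spot-check exported Astra DB metrics through the Prometheus HTTP API
# (GET /api/v1/query). Assumes an unauthenticated Prometheus; base URL is a placeholder.
import json
import urllib.parse
import urllib.request


def instant_query_url(base_url, promql):
    """Build the instant-query URL for a PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body):
    """Return [(labels, value), ...] from an instant-query response with a vector result."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each vector element's "value" is [unix_timestamp, "value-as-string"].
    return [(item["metric"], float(item["value"][1])) for item in payload["data"]["result"]]


def query(base_url, promql, timeout=10):
    """Run the query and parse the vector result."""
    with urllib.request.urlopen(instant_query_url(base_url, promql), timeout=timeout) as response:
        return parse_vector(response.read())
```

For example, query("http://localhost:9090", "astra_db_read_requests_failures:rate1m > 0") returns only the series currently above zero, which matches the "alert if the value is > 0" guidance for that metric.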
What’s next?
See the following related topics.
Destination documentation
-
Getting Started with Prometheus
-
Splunk (via DevOps API)
-
Pulsar (via DevOps API)
-
Datadog (via DevOps API)