Export metrics via Astra Portal
Enterprises depend on the ability to view database health metrics in centralized systems along with their other software metrics. The Astra DB Export Metrics feature lets you forward Astra DB database health metrics to an external third-party metrics system. We refer to the recipient of the exported metrics as the destination system.
You can configure the export of Astra DB metrics via Astra Portal (described in this topic), or via the DevOps API.
Benefits
The Astra DB Export Metrics feature allows you to take full control of forwarding Astra DB database health metrics to your preferred observability system. The functionality is intended for developers, site reliability engineers (SREs), IT managers, and product owners.
Ingesting database health metrics into your system gives you the ability to craft your own alerting actions and dashboards based on your service level objectives and retention requirements. While you can continue to view metrics displayed in Astra Portal via each database’s Health tab, forwarding metrics to a third-party app gives you a more complete view of all metrics being tracked, across all your products.
This enhanced capability can provide your team with broader insights into historical performance, issues, and areas for improvement.
The exported Astra DB health metrics are nearly real-time when consumed externally. You can find the source-of-truth view of your metric values in the Astra Portal’s Health dashboard.
Prerequisites
- If you haven’t already, create a serverless database using the Astra Portal.
- Ensure you have a role with permission to view and use the Export Metrics UI, which is under Settings for each database. See Roles and permissions in this topic.
You’ll need an existing destination system to receive the forwarded Astra DB metrics. Supported destinations are Amazon CloudWatch, Apache Kafka, Confluent Kafka, Datadog, Prometheus, Pulsar/Streaming, and Splunk. You can also use Grafana / Grafana Cloud to visualize the exported metrics.
Pricing
With an Astra DB PAYG or Enterprise plan, there is no additional cost to using Astra DB Metrics, outside of standard data transfer charges. Exporting third-party metrics is not available on the Astra DB Free Tier.
Metrics monitoring may incur costs at the destination system. Consult the destination system’s documentation for its pricing information.
Roles and permissions
The following Astra DB roles can export third-party metrics:
- Organization Administrator (recommended)
- Database Administrator
- Service Account Administrator
- User Administrator
The required db-manage-thirdpartymetrics permission is automatically assigned to those roles.
If you create a custom role in Astra DB, be sure to assign the db-manage-thirdpartymetrics permission to the custom role.
Database health metrics forwarded by Astra DB
Metrics are not alerts. Astra DB provides the ability to export metrics to an external destination system so that you can devise alerts based on the metrics' values. Also, note that when you use the Astra DB Metrics feature in conjunction with the private link feature, the exported metrics traffic does not make use of the private link connection. Metrics traffic flows over the public interfaces as it would without a private link.
The forwarded Astra DB health metrics are aggregated values, calculated once every minute. For each metric, a rate of increase over both 1 minute and 5 minutes is produced. The Astra DB Metrics feature forwards the following database health metrics:
- astra_db_rate_limited_requests:rate1m and astra_db_rate_limited_requests:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for the number of failed operations due to an Astra DB rate limit. You can request that rate limits are increased for your Astra DB databases. Using these rates, alert if the value is > 0.
- astra_db_read_requests_failures:rate1m and astra_db_read_requests_failures:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for the number of failed reads. Cassandra drivers retry failed operations, but significant failures can be problematic. Using these rates, alert if the value is > 0. Warn: alert on a low amount. High: alert on larger amounts, potentially determined as a percentage of read throughput.
- astra_db_read_requests_timeouts:rate1m and astra_db_read_requests_timeouts:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for read timeouts. Timeouts happen when operations against the database take longer than the server-side timeout. Using these rates, alert if the value is > 0.
- astra_db_read_requests_unavailables:rate1m and astra_db_read_requests_unavailables:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for reads where the service is not available to complete a specific request. Using these rates, alert if the value is > 0.
- astra_db_write_requests_failures:rate1m and astra_db_write_requests_failures:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for the number of failed writes. Cassandra drivers retry failed operations, but significant failures can be problematic. Using these rates, alert if the value is > 0. Warn: alert on a low amount. High: alert on larger amounts, potentially determined as a percentage of write throughput.
- astra_db_write_requests_timeouts:rate1m and astra_db_write_requests_timeouts:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for timeouts, which occur when operations take longer than the server-side timeout. Using these rates, compare with write_requests_failures.
- astra_db_write_requests_unavailables:rate1m and astra_db_write_requests_unavailables:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for unavailable errors, which occur when the service is not available to service a particular request. Using these rates, compare with write_requests_failures.
- astra_db_range_requests_failures:rate1m and astra_db_range_requests_failures:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for the number of range reads that failed. Cassandra drivers retry failed operations, but significant failures can be problematic. Using these rates, alert if the value is > 0. Warn: alert on a low amount. High: alert on larger amounts, potentially determined as a percentage of read throughput.
- astra_db_range_requests_timeouts:rate1m and astra_db_range_requests_timeouts:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for timeouts, which are a subset of total failures. Use this metric to understand if failures are due to timeouts. Using these rates, compare with range_requests_failures.
- astra_db_range_requests_unavailables:rate1m and astra_db_range_requests_unavailables:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for unavailable errors, which are a subset of total failures. Use this metric to understand if failures are due to unavailable errors. Using these rates, compare with range_requests_failures.
- astra_db_write_latency_seconds:rate1m and astra_db_write_latency_seconds:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for write throughput. Alert based on your application Service Level Objective (business requirement).
- astra_db_write_latency_seconds_P$QUANTILE:rate1m and astra_db_write_latency_seconds_P$QUANTILE:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for write latency, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_write_latency_seconds_P99:rate1m. Alert based on your application Service Level Objective (business requirement).
- astra_db_write_requests_mutation_size_bytesP$QUANTILE:rate1m and astra_db_write_requests_mutation_size_bytesP$QUANTILE:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for how big writes are over time, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_write_requests_mutation_size_bytesP99:rate1m.
- astra_db_read_latency_seconds:rate1m and astra_db_read_latency_seconds:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for read throughput. Alert based on your application Service Level Objective (business requirement).
- astra_db_read_latency_secondsP$QUANTILE:rate1m and astra_db_read_latency_secondsP$QUANTILE:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for read latency, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_read_latency_secondsP99:rate1m. Alert based on your application Service Level Objective (business requirement).
- astra_db_range_latency_seconds:rate1m and astra_db_range_latency_seconds:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for range read throughput. Alert based on your application Service Level Objective (business requirement).
- astra_db_range_latency_secondsP$QUANTILE:rate1m and astra_db_range_latency_secondsP$QUANTILE:rate5m - A calculated rate of change (over 1 minute and 5 minutes, respectively) for range read latency, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_range_latency_secondsP99. Alert based on your application Service Level Objective (business requirement).
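The Warn and High guidance for the failure metrics can be sketched in code. The following Python function is only an illustration: the function name, the 1% default threshold, and the sample values are assumptions, not Astra DB recommendations. In practice you would express a rule like this in your destination system's own alerting tool.

```python
def classify_failure_alert(failure_rate, throughput_rate, high_pct=1.0):
    """Classify a failures rate such as astra_db_read_requests_failures:rate1m.

    throughput_rate is the matching throughput rate, for example
    astra_db_read_latency_seconds:rate1m (described above as read throughput).
    high_pct is a hypothetical threshold: the percentage of throughput at
    which a Warn alert is escalated to High.
    """
    if failure_rate <= 0:
        return "OK"
    if throughput_rate <= 0:
        # Failures were reported but no throughput was measured:
        # treat this as the most severe case.
        return "High"
    failure_pct = 100.0 * failure_rate / throughput_rate
    return "High" if failure_pct >= high_pct else "Warn"


print(classify_failure_alert(0.0, 250.0))  # OK
print(classify_failure_alert(0.5, 250.0))  # Warn (0.2% of throughput)
print(classify_failure_alert(5.0, 250.0))  # High (2% of throughput)
```

Choose the threshold from your own service level objectives; the 1% default here is arbitrary.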
Prometheus setup at the destination
For information about setting up Prometheus itself as the destination of the forwarded Astra DB database metrics, see the Prometheus Getting Started documentation.
For Prometheus,
After completing those steps in your Prometheus environment, verify the setup by sending a POST request to the remote write endpoint. Consider using the following example test client, which also verifies that ingress is set up properly: Prometheus Remote Write Client (promremote), written in Go.
For more information about Prometheus metric types, see this topic on prometheus.io.
Kafka setup at the destination
For information about setting up Kafka as a destination of the forwarded Astra DB database metrics, see:
- Kafka metrics overview and Kafka Monitoring in the open-source Apache Kafka documentation.
- Confluent Cloud Kafka documentation.
Amazon CloudWatch setup at the destination
For information about setting up Amazon CloudWatch, see this topic on the AWS site.
In AWS, the secret key user must have the
Splunk setup at the destination
For information about setting up Splunk, see Splunk observability in the Splunk documentation.
Pulsar setup at the destination
For information about setting up Pulsar, see Get started in the Pulsar documentation.
Datadog setup at the destination
For information about setting up Datadog, see these Datadog documentation topics:
For related tips about filling out the Datadog export metrics form that’s presented in Astra Portal, see Datadog settings below.
Using Export Metrics in Astra Portal
The configuration steps depend on which destination you’ll use. Currently we support Amazon CloudWatch, Apache Kafka, Confluent Kafka, Datadog, Prometheus, Pulsar/Streaming, and Splunk. You can also use Grafana / Grafana Cloud to visualize the exported metrics.
To ensure that metrics are enabled for your destination app, provide the relevant properties.
Each update to the metrics configuration in Astra Portal (and/or in the DevOps API) replaces any existing configuration.
- After logging into Astra Portal, navigate to your serverless database in the dashboard or create a new one.
- Ensure that your serverless database is in a Ready status and click the Settings tab.
- Scroll down to the Export Metrics section. The initial display:
- Click Add Destination.
- Select a destination. For a given database, you can export metrics to just one destination at a time.
Prometheus settings
If you selected Prometheus on the initial metrics destination page, the Prometheus properties you enter on its form depend first on whether you select Basic or Bearer as the Prometheus Strategy (the auth type). Example:
If you chose Bearer from the menu, provide your Prometheus Token value and Prometheus Endpoint on the resulting form. Notice that the form does not display username/password properties for a Prometheus strategy of Bearer.
If you chose Basic from the form’s menu, provide your Prometheus Username, Password, and Endpoint on the resulting form. Notice that the form does not display a Token property for a Prometheus strategy of Basic.
Here’s an example of a completed form in Astra Portal when your Prometheus Strategy is Bearer:
Example form when your Prometheus Strategy is Basic:
When you’ve completed the Prometheus form’s entries, click Add Destination.
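Before entering the values on the form, it can help to test the same credentials against your own Prometheus endpoint. The sketch below assumes that the Basic and Bearer strategies correspond to the standard HTTP Basic and Bearer authorization schemes; the function name and the sample credentials are hypothetical. It builds the header each strategy would need and rejects the property combinations that the form does not offer.

```python
import base64


def prometheus_auth_header(strategy, token=None, username=None, password=None):
    """Build an HTTP Authorization header for a Prometheus Strategy.

    Assumes the standard HTTP schemes: "Bearer <token>" for Bearer and
    "Basic <base64(username:password)>" for Basic.
    """
    strategy = strategy.lower()
    if strategy == "bearer":
        if not token:
            raise ValueError("The Bearer strategy requires a token")
        return {"Authorization": f"Bearer {token}"}
    if strategy == "basic":
        if username is None or password is None:
            raise ValueError("The Basic strategy requires a username and password")
        credentials = f"{username}:{password}".encode("utf-8")
        encoded = base64.b64encode(credentials).decode("ascii")
        return {"Authorization": f"Basic {encoded}"}
    raise ValueError(f"Unknown strategy: {strategy!r}")


# Hypothetical credentials, for illustration only.
print(prometheus_auth_header("Bearer", token="example-token"))
print(prometheus_auth_header("Basic", username="user", password="pass"))
```

You can pass the resulting header to whichever HTTP client you use to test the remote write endpoint.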
Kafka settings
If you selected Kafka, the properties to enter on its form are:
- SASL Mechanism - Your Kafka Simple Authentication and Security Layer (SASL) mechanism for authentication and data security. Possible values: GSSAPI, PLAIN, SCRAM-SHA-256, or SCRAM-SHA-512. For background information, see the Confluent Kafka - Authentication Methods Overview documentation.
- SASL Username - Existing username for Kafka authentication.
- SASL Password - Existing password for Kafka authentication.
- Topic - Kafka topic to which Astra DB will export the metrics; you must create this topic on your server(s).
- Bootstrap Servers - One or more Kafka Bootstrap Server entries. Example: pkc-9999e.us-east-1.aws.confluent.cloud:9092
- (Optional) Kafka Security Protocol - Most Kafka installations will not require this setting for Astra DB Metrics to connect. Users of hosted Kafka on Confluent Cloud, though, may need to set SASL_SSL in this Security Protocol property. Valid options are:
  - SASL_PLAINTEXT - SASL authenticated, non-encrypted channel.
  - SASL_SSL - SASL authenticated, encrypted channel.
  Non-authenticated options (SSL and PLAINTEXT) are not supported.
Be sure to specify the appropriate, related SASL Mechanism property. For Confluent Cloud, you may only be able to use PLAIN. See the Confluent Cloud security tutorial. From the Confluent docs: "Confluent Cloud uses SASL/PLAIN (or PLAIN) over TLS v1.2 encryption for authentication because it offers broad client support while providing a good level of security. The usernames and passwords used in the SASL exchange are API keys and secrets that should be securely managed using a secrets store and rotated periodically."
Here’s an example of a completed form in Astra Portal:
When you’re ready with the Kafka form’s entries, click Add Destination.
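The Kafka property rules above can be checked before you fill in the form. This is a minimal sketch: the dictionary keys are hypothetical names chosen for illustration (they are not DevOps API field names), and the check covers only the constraints listed in this section.

```python
SASL_MECHANISMS = {"GSSAPI", "PLAIN", "SCRAM-SHA-256", "SCRAM-SHA-512"}
SECURITY_PROTOCOLS = {"SASL_PLAINTEXT", "SASL_SSL"}
REQUIRED_KEYS = (
    "sasl_mechanism",
    "sasl_username",
    "sasl_password",
    "topic",
    "bootstrap_servers",
)


def check_kafka_settings(settings):
    """Return a list of problems; an empty list means no problem was found."""
    problems = [f"missing {key}" for key in REQUIRED_KEYS if not settings.get(key)]

    mechanism = settings.get("sasl_mechanism")
    if mechanism and mechanism not in SASL_MECHANISMS:
        problems.append(f"unsupported SASL mechanism: {mechanism}")

    # The security protocol is optional; SSL and PLAINTEXT are not supported.
    protocol = settings.get("security_protocol")
    if protocol is not None and protocol not in SECURITY_PROTOCOLS:
        problems.append(f"unsupported security protocol: {protocol}")

    return problems


# Hypothetical values modeled on the Confluent Cloud case described above.
example = {
    "sasl_mechanism": "PLAIN",
    "sasl_username": "example-api-key",
    "sasl_password": "example-api-secret",
    "topic": "astra_metrics",
    "bootstrap_servers": ["pkc-9999e.us-east-1.aws.confluent.cloud:9092"],
    "security_protocol": "SASL_SSL",
}
print(check_kafka_settings(example))  # []
```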
Amazon CloudWatch settings
If you selected Amazon CloudWatch on the initial metrics destination page, enter your AWS access keys so that AWS can verify your identity in programmatic calls. Your access keys consist of an access key ID and a secret access key.
- Access Key: Your AWS access key. For example, AKIAIOSFODNN7EXAMPLE. Get the value from your account in the AWS console.
- Secret Key: Your AWS secret key. For example, wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY. Get the value from your account in the AWS console.
- Region: You can enter the same AWS region used by the Astra DB serverless database from which you’ll export metrics. However, you have the option of specifying a different AWS region; for example, you might use this option if your app is in another region and you want to see the metrics together.
Here’s an example of a completed form in Astra Portal.
When you’ve completed the CloudWatch form’s entries, click Add Destination.
Splunk settings
If you selected Splunk on the initial metrics destination page, enter:
- Endpoint: The full HTTP address and path for the Splunk HTTP Event Collector (HEC) endpoint. If you are unsure of this address, please contact your Splunk Administrator.
- Index: The Splunk index to which you want to write metrics. The identified index must be set so the Splunk token has permission to write to it.
- Token: The Splunk HTTP Event Collector (HEC) token for Splunk authentication.
- Source: Optional. You can enter the source of events sent to this sink. If unset, Astra Portal sets this field to a default value: astradb.
- Source Type: Optional. You can enter the source type of events sent to this sink. If unset, the API sets it to a default value: astradb-metrics.
Here’s an example of a completed form in Astra Portal.
When you’ve completed the Splunk form’s entries, click Add Destination.
Pulsar/Streaming settings
If you selected Pulsar/Streaming on the initial metrics destination page, enter:
- Endpoint: The URL of your Pulsar Broker.
- Topic: The Pulsar topic to which you’ll publish telemetry.
- Auth Name: The authentication name.
- Auth Strategy: The authentication strategy used by your Pulsar broker. From the drop-down menu, choose token or oauth2.
  - If the Auth Strategy is token, provide the token for Pulsar authentication.
  - Or if the Auth Strategy is oauth2, provide the required Oauth2 Credentials URL and the Oauth2 Issuer URL properties. You may also provide (optionally) the Oauth2 Audience and Oauth2 Scope.
For related information, see Authentication using OAuth 2.0 access tokens.
Here’s an example of a completed form in Astra Portal when your Auth Strategy is token:
Example form when your Auth Strategy is oauth2:
When you’ve completed the Pulsar/Streaming form’s entries, click Add Destination.
Datadog settings
If you selected Datadog on the initial metrics destination page, enter values for:
- API Key: The required API key so that your Astra DB metrics export operation can successfully authenticate into the Datadog API. For details, see this Authentication topic in the Datadog documentation.
  Before entering the API Key value on the Astra Portal Datadog form, you should validate that it’s correct by using the Validate API key curl command that’s described in the Datadog documentation.
- Site: The Datadog site to which the exported Astra DB health metrics will be sent. For details, including the correct format to specify on the Astra Portal Datadog form, see this Getting Started with Datadog Sites topic in the Datadog documentation.
  Datadog sites are named in different ways. See the Datadog documentation for important details. Summary:
  - If you’ll send Astra DB health metrics to a Datadog site prefixed with "app", remove both the "https://" protocol and the "app" prefix from the Site parameter that you specify on the Astra Portal Datadog form.
  - If you’ll send Astra DB health metrics to a Datadog site prefixed with a subdomain such as "us5", remove only the "https://" protocol from the Site parameter that you specify on the Astra Portal Datadog form.
  - Other Datadog Site parameters are possible. See the table in the Datadog documentation for guidance on the appropriate Site parameter format.
When you’ve completed the Datadog form’s entries, click Add Destination.
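The two Site rules summarized above can be expressed as a small helper. This is a sketch under the assumption that you start from the URL you use to sign in to Datadog; the function name is hypothetical, and it covers only the "app" and subdomain cases described in this topic, so check the table in the Datadog documentation for other site formats.

```python
def datadog_site_parameter(site_url):
    """Derive the Site value for the Datadog form from a Datadog site URL."""
    site = site_url.strip().rstrip("/")

    # Both rules remove the "https://" protocol.
    if site.startswith("https://"):
        site = site[len("https://"):]

    # Only the "app" prefix is removed; subdomains such as "us5" are kept.
    if site.startswith("app."):
        site = site[len("app."):]

    return site


print(datadog_site_parameter("https://app.datadoghq.com"))  # datadoghq.com
print(datadog_site_parameter("https://us5.datadoghq.com"))  # us5.datadoghq.com
```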
After adding the metrics destination
After you add a Kafka, Prometheus, or Amazon CloudWatch destination, a confirmation message appears and the Export Metrics UI under Settings shows the destination. Example:
If the configuration’s settings are valid, Astra DB exports the health metrics for the specified database. See the next section for an example of using Grafana Cloud to visualize the exported metrics.
If needed, you can click the three vertical dots for options to Modify the destination’s configuration, or Delete the destination. Example:
Modifying an existing destination allows you to edit the configuration’s properties, if necessary. Deleting an existing destination’s configuration in Astra DB would then allow you to try again, or to add a new type of destination, such as switching from Kafka to Prometheus or CloudWatch.
For a given Astra DB database, you can only configure the export of metrics to one destination at a time.
If you decide to delete a metrics destination, Astra DB displays a message with an alternative option to Update (rather than Delete) the destination’s configuration. Example:
Visualize exported Astra DB metrics with Grafana Cloud
You can configure Grafana Cloud to consume Astra DB serverless health metrics.
The detailed steps involve setup using both Grafana Cloud and the DataStax DevOps v2 API. See this Grafana Cloud section of the "Export Metrics via DevOps API" topic.
Once configured, you can use your own Grafana Cloud instance to monitor the Astra DB database’s health via its metrics.
Using Grafana Cloud is optional. You can choose your favorite tool to visualize the Astra DB metrics that you exported to Kafka, Prometheus, Amazon CloudWatch, Splunk, Pulsar, or Datadog.
We’ll use Prometheus as the destination system in the examples. You’ll need a Grafana Cloud account. Grafana offers a Free plan with 14-day retention. See Grafana pricing.
What’s next?
See the following related topics.