View and export metrics

You can view and export health metrics from Astra. These metrics provide insights into the performance of your databases and how workloads are distributed.

View metrics in the Astra Portal

You can view metrics for an individual database with the Astra Portal. The following metrics are available:

Total Latency by Percentile

A p50 quantile indicates 50% of the database requests were processed faster than this value, and a p99 indicates the same for 99% of requests.

Total Throughput

The number of requests the database processes in a given amount of time. The Key Metrics graphs show read and write throughput separately, along with the average and the total of both. This metric is measured in requests per second.
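
To make the percentile definitions above concrete, the following sketch computes p50 and p99 from a small set of made-up latency samples using the nearest-rank method. The sample values and the function names are hypothetical, and Astra may calculate its percentiles differently.

```shell
# Nearest-rank percentile: print the smallest sample such that at least
# P percent of all samples are less than or equal to it.
# Usage: percentile P   (reads one number per line on standard input)
percentile() {
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
      if (rank < 1) rank = 1
      print v[rank]
    }'
}

# Hypothetical request latencies in milliseconds, one per line.
sample_latencies() {
  printf '%s\n' 12 15 11 250 14 13 16 18 17 19
}

sample_latencies | percentile 50   # prints 15
sample_latencies | percentile 99   # prints 250
```

In this sample, one slow request (250 ms) does not affect p50 but determines p99, which is why the two percentiles are usually read together.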

The way you access these metrics depends on whether you’re using a Serverless (Vector) database or a Serverless (Non-Vector) database.

  • Serverless (Vector) databases

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to open the database details.

  3. Optional: Select a time period from the Time Picker.

  4. Hover over the Key Metrics graphs to display the metrics for a particular time.

If you have more than one region for this database, use the region drop-down to see the metrics for a specific region.

  • Serverless (Non-Vector) databases

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to open the database details.

  3. Go to Health.

  4. Optional: Select a time period from the Time Picker.

  5. Hover over the Astra Cluster Condensed graph to display the metrics for a particular time.

Gaps in the read/write metrics are normal; they indicate periods when no requests are happening.

Export metrics to third-party services

You can forward Astra DB database health metrics to a third-party observability service.

When you use the Export Metrics feature in conjunction with Private Endpoints, the exported metrics traffic does not make use of the private connection.

Export metrics to Prometheus

You can export metrics to Prometheus using the Astra Portal or the DevOps API.

  • Astra Portal

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Ensure that your database is in an Active status, and then select Settings from the dashboard.

  4. Scroll down to the Export Metrics section, and then select Add Destination.

  5. Select Prometheus and then select your Prometheus Strategy: Bearer or Basic.

    1. If you select Bearer, provide your Prometheus Token and Prometheus Endpoint.

    2. If you select Basic, provide your Prometheus Username, Prometheus Password, and Prometheus Endpoint.

  6. Select Add Destination.

  7. The destination appears in the Export Metrics section.

  • DevOps API

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Copy the Database ID.

  4. If you don’t already have a token with the db-manage-thirdpartymetrics permission, select Generate Token to generate one.

  5. If you haven’t already set up Prometheus, see Prometheus Getting Started.

    1. You must use Prometheus v2.25 or greater. DataStax recommends Prometheus v2.33 or greater.

    2. You must enable remote-write-receiver in the destination app. For more details, see Remote Write Receiver, Remote storage integrations, and <remote_write>.

    3. Test the destination app by sending it a POST request using a Prometheus Remote Write Client.

    4. For more details about Prometheus metric types, see Metric types.

  6. Export metrics to Prometheus using a POST request. Each POST replaces any existing configuration.

    1. Export metrics using a Prometheus token:

      curl --request POST \
        --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
        --header 'Accept: application/json' \
        --header 'Authorization: Bearer DATABASE_TOKEN' \
        --include \
        --data '{
                  "prometheus_remote":  {
                    "endpoint": "https://prometheus.example.com/api/prom/push",
                    "auth_strategy": "bearer",
                    "token": "PROMETHEUS_TOKEN"
                  }
                }'
    2. Export metrics using a Prometheus username and password:

      curl --request POST \
        --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
        --header 'Accept: application/json' \
        --header 'Authorization: Bearer DATABASE_TOKEN' \
        --include \
        --data '{
                  "prometheus_remote":  {
                    "endpoint": "https://prometheus.example.com/api/prom/push",
                    "auth_strategy": "basic",
                    "password": "PROMETHEUS_PASSWORD",
                    "user": "PROMETHEUS_USERNAME"
                  }
                }'

      The response may have one of the following status codes:

      202 Accepted
      400 Bad request.
      401 Unauthorized.
      403 The user is forbidden to perform the operation.
      404 The specified resource was not found.
      409 The request could not be processed because of conflict.
      5XX A server error occurred.

      Here is an example response:

      {
        "errors": [
          {
            "description": "The name of the environment must be provided",
            "internalCode": "a1012",
            "internalTxId": "103B-A018-3898-0ABF"
          }
        ]
      }
  7. Optional: Verify that Prometheus is configured correctly:

    curl --request GET \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include

    The response may have one of the following status codes:

    200 OK
    400 Bad request.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    500 A server error occurred.

    Here are some example responses:

    {
      "prometheus_remote": {
        "endpoint": "https://prometheus.example.com/api/prom/push",
        "auth_strategy": "basic",
        "user": "PROMETHEUS_USERNAME",
        "password": "PROMETHEUS_PASSWORD"
      }
    }
    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "103B-A018-3898-0ABF"
        }
      ]
    }
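
The version requirements in step 5 above likely relate to how the remote write receiver is enabled in Prometheus. As a sketch, the following helper prints the startup flag for a given Prometheus version. It assumes the flag names from the Prometheus documentation (a feature flag from v2.25 and a dedicated flag from v2.33), so verify them against the release you run. The function name is hypothetical.

```shell
# Print the Prometheus startup flag that enables the remote write receiver
# for a version string such as "2.33.1" or "v2.25.0".
# Assumption: flag names as described in the Prometheus documentation.
remote_write_receiver_flag() {
  version=${1#v}            # drop an optional leading "v"
  major=${version%%.*}
  rest=${version#*.}
  minor=${rest%%.*}
  if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 33 ]; }; then
    echo "--web.enable-remote-write-receiver"
  elif [ "$major" -eq 2 ] && [ "$minor" -ge 25 ]; then
    echo "--enable-feature=remote-write-receiver"
  else
    echo "unsupported: Prometheus $1 predates the remote write receiver" >&2
    return 1
  fi
}

remote_write_receiver_flag 2.33.1   # prints --web.enable-remote-write-receiver
```

With the receiver enabled, Prometheus typically accepts remote write requests at the /api/v1/write path of the server, which would then be the endpoint value in the POST request.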

Export metrics to Kafka

You can export metrics to Kafka using the Astra Portal or the DevOps API.

  • Astra Portal

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Ensure that your database is in an Active status, and then select Settings from the dashboard.

  4. Scroll down to the Export Metrics section, and then select Add Destination.

  5. Select Kafka.

  6. Optional: Select a Kafka Security Protocol. Most Kafka installations do not require this setting to export metrics. Users of hosted Kafka on Confluent Cloud may need to set SASL_SSL in this Security Protocol property.

    Valid options are:

    SASL_PLAINTEXT - SASL authenticated, non-encrypted channel.

    SASL_SSL - SASL authenticated, encrypted channel.

    Non-authenticated options (SSL and PLAINTEXT) are not supported. Specify the appropriate, related SASL Mechanism property. For more information, see the Confluent Cloud security tutorial.

  7. Choose a SASL Mechanism. This is your Kafka Simple Authentication and Security Layer (SASL) mechanism for authentication and data security. Possible values include: GSSAPI, PLAIN, SCRAM-SHA-256, or SCRAM-SHA-512. For background information, see the Confluent Kafka - Authentication Methods Overview documentation.

  8. Provide your SASL Username and SASL Password to authenticate to Kafka.

  9. Provide a Topic. This is the Kafka topic to which Astra DB exports the metrics. You must create this topic on your server(s).

  10. Provide one or more Bootstrap Servers, such as pkc-9999e.us-east-1.aws.confluent.cloud:9092.

  11. Select Add Destination.

  12. The destination appears in the Export Metrics section.

  • DevOps API

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Copy the Database ID.

  4. If you don’t already have a token with the db-manage-thirdpartymetrics permission, select Generate Token to generate one.

  5. If you haven’t already set up Kafka, see Kafka metrics overview, Kafka Monitoring, and Kafka on Confluent Cloud.

  6. Export metrics to Kafka using a POST request.

    curl --request POST \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include \
      --data '{
          "kafka": {
              "bootstrap_servers": [
                  "BOOTSTRAP_SERVER_URL"
              ],
              "topic": "astra_metrics_events",
              "sasl_mechanism": "PLAIN",
              "sasl_username": "KAFKA_USER",
              "sasl_password": "KAFKA_PASSWORD",
              "security_protocol": "SASL_PLAINTEXT"
          }
        }'

    The security_protocol property is an advanced option, and is not required. Most Kafka installations will not require this setting to export metrics. Users of hosted Kafka on Confluent Cloud, though, may need to set 'SASL_SSL' in the security_protocol property. Valid options are:

    • SASL_PLAINTEXT - SASL authenticated, non-encrypted channel.

    • SASL_SSL - SASL authenticated, encrypted channel.

    Non-authenticated options (SSL and PLAINTEXT) are not supported.

    Be sure to specify the appropriate, related sasl_mechanism property. For Confluent Cloud, you may only be able to use PLAIN. See the Confluent Cloud security tutorial. From the Confluent docs: "Confluent Cloud uses SASL/PLAIN (or PLAIN) over TLS v1.2 encryption for authentication because it offers broad client support while providing a good level of security. The usernames and passwords used in the SASL exchange are API keys and secrets that should be securely managed using a secrets store and rotated periodically."

    The response may have one of the following status codes:

    202 Accepted
    400 Bad request.
    401 Unauthorized.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    409 The request could not be processed because of conflict.
    5XX A server error occurred.

    Here is an example response:

    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "103B-A018-3898-0ABF"
        }
      ]
    }
  7. Optional: Verify that Kafka is configured correctly:

    curl --request GET \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include

    The response may have one of the following status codes:

    200 OK
    400 Bad request.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    500 A server error occurred.

    Here are some example responses:

    {
      "kafka": {
        "bootstrap_servers": [
            "BOOTSTRAP_SERVER_URL"
        ],
        "topic": "astra_metrics_events",
        "sasl_mechanism": "PLAIN",
        "sasl_username": "KAFKA_USERNAME",
        "sasl_password": "KAFKA_PASSWORD",
        "security_protocol": "SASL_PLAINTEXT"
      }
    }
    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "103B-A018-3898-0ABF"
        }
      ]
    }
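
When you have several bootstrap servers, building the Kafka request body by hand is error-prone. The following sketch assembles the body shown in step 6 from arguments. The function name is hypothetical, the SASL mechanism and security protocol are fixed to PLAIN and SASL_SSL for illustration, and the values are assumed not to contain characters that need JSON escaping.

```shell
# Build the JSON body for a Kafka destination.
# Usage: kafka_payload TOPIC USERNAME PASSWORD SERVER [SERVER...]
# Assumption: values contain no double quotes or backslashes,
# because no JSON escaping is performed.
kafka_payload() {
  topic=$1
  username=$2
  password=$3
  shift 3
  servers=""
  for server in "$@"; do
    # Append each server as a quoted, comma-separated array element.
    servers="${servers:+$servers, }\"$server\""
  done
  printf '{"kafka": {"bootstrap_servers": [%s], "topic": "%s", "sasl_mechanism": "PLAIN", "sasl_username": "%s", "sasl_password": "%s", "security_protocol": "SASL_SSL"}}\n' \
    "$servers" "$topic" "$username" "$password"
}

# Example usage (makes a network call, so it is left commented out):
# curl --request POST \
#   --url "https://api.astra.datastax.com/v2/databases/$DATABASE_ID/telemetry/metrics" \
#   --header 'Accept: application/json' \
#   --header "Authorization: Bearer $DATABASE_TOKEN" \
#   --data "$(kafka_payload astra_metrics_events "$KAFKA_USERNAME" "$KAFKA_PASSWORD" broker1:9092 broker2:9092)"
```

The environment variable names and broker addresses in the usage comment are placeholders.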

Export metrics to Amazon CloudWatch

You can export metrics to Amazon CloudWatch using the Astra Portal or the DevOps API.

  • Astra Portal

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Ensure that your database is in an Active status, and then select Settings from the dashboard.

  4. Scroll down to the Export Metrics section, and then select Add Destination.

  5. Select Amazon CloudWatch.

  6. Provide your Access Key (e.g. AKIAIOSFODNN7EXAMPLE) and Secret Key (e.g. wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY) to authenticate to AWS.

  7. Select a Region. You do not have to select the same region that you chose for your Astra DB serverless database.

  8. Select Add Destination.

  9. The destination appears in the Export Metrics section.

  • DevOps API

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Copy the Database ID.

  4. If you don’t already have a token with the db-manage-thirdpartymetrics permission, select Generate Token to generate one.

  5. If you haven’t already set up Amazon CloudWatch, see Getting set up with Amazon CloudWatch and Amazon CloudWatch permissions reference.

    In AWS, the IAM user that owns the secret key must have the PutMetricData action as its minimum required permission. For example, in AWS Identity and Access Management (IAM), define a policy such as the following, and attach it to the user account that will receive the Astra DB exported metrics in CloudWatch.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "VisualEditor0",
          "Effect": "Allow",
          "Action": "cloudwatch:PutMetricData",
          "Resource": "*"
        }
      ]
    }
  6. Export metrics to Amazon CloudWatch using a POST request.

    Example POST payload for CloudWatch:
    
    curl --request POST \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include \
      --verbose \
      --data '{
      "cloudwatch": {
        "access_key": "AWS_ACCESS_KEY",
        "secret_key": "AWS_SECRET_KEY",
        "region": "AWS_REGION"
      }
    }'

    The response may have one of the following status codes:

    202 Accepted
    400 Bad request.
    401 Unauthorized.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    409 The request could not be processed because of conflict.
    5XX A server error occurred.

    Here is an example response:

    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "103B-A018-3898-0ABF"
        }
      ]
    }
  7. Optional: Verify that Amazon CloudWatch is configured correctly:

    curl --request GET \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include \
      --verbose

    The response may have one of the following status codes:

    200 OK
    400 Bad request.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    500 A server error occurred.

    Here are some example responses:

    {
      "cloudwatch": {
        "access_key": "AWS_ACCESS_KEY",
        "secret_key": "AWS_SECRET_KEY",
        "region": "AWS_REGION"
      }
    }
    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "103B-A018-3898-0ABF"
        }
      ]
    }
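
If you script the POST request from step 6, you can branch on the status codes listed there instead of reading the response by eye. The following is a minimal sketch with a hypothetical function name. The descriptions follow the status code list above, and only 202 is treated as success.

```shell
# Map the HTTP status code of a POST to the telemetry endpoint to a short
# description. Returns 0 only when the configuration was accepted (202).
describe_post_status() {
  case "$1" in
    202) echo "accepted"; return 0 ;;
    400) echo "bad request" ;;
    401) echo "unauthorized" ;;
    403) echo "forbidden" ;;
    404) echo "not found" ;;
    409) echo "conflict" ;;
    5??) echo "server error" ;;
    *)   echo "unexpected status $1" ;;
  esac
  return 1
}

# Example usage (makes a network call, so it is left commented out).
# PAYLOAD is assumed to hold the JSON body for your destination.
# status=$(curl --silent --output /dev/null --write-out '%{http_code}' \
#   --request POST \
#   --url "https://api.astra.datastax.com/v2/databases/$DATABASE_ID/telemetry/metrics" \
#   --header 'Accept: application/json' \
#   --header "Authorization: Bearer $DATABASE_TOKEN" \
#   --data "$PAYLOAD")
# describe_post_status "$status" || echo "export configuration was not accepted" >&2
```

The same helper applies to any destination in this document, because every POST request uses the same endpoint and status codes.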

Export metrics to Splunk

You can export metrics to Splunk using the Astra Portal or the DevOps API.

  • Astra Portal

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Ensure that your database is in an Active status, and then select Settings from the dashboard.

  4. Scroll down to the Export Metrics section, and then select Add Destination.

  5. Select Splunk.

  6. Provide an Endpoint. This is the full HTTP address and path for the Splunk HTTP Event Collector (HEC) endpoint.

  7. Provide an Index. This is the Splunk index to which you want to write metrics.

  8. Provide a Token. This is the Splunk HTTP Event Collector (HEC) token for Splunk authentication.

  9. Provide a Source. This is the source of events sent to the sink. If you do not provide a source, Astra sets it to astradb by default.

  10. Provide a Source Type. This is the type of events sent to this sink. If you do not provide a source type, Astra sets it to astradb-metrics by default.

  11. Select Add Destination.

  12. The destination appears in the Export Metrics section.

  • DevOps API

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Copy the Database ID.

  4. If you don’t already have a token with the db-manage-thirdpartymetrics permission, select Generate Token to generate one.

  5. Get Splunk connection information from your administrator:

    1. endpoint: The full HTTP address and path for the Splunk HTTP Event Collector (HEC) endpoint. If you are unsure of this address, contact your Splunk administrator.

    2. index: The Splunk index to which you want to write metrics. The identified index must be set so the Splunk token has permission to write to it.

    3. token: The Splunk HEC token for Splunk authentication.

    4. source: The source of events sent to this sink. If unset, the API sets it to a default value: astradb.

    5. sourcetype: The source type of events sent to this sink. If unset, the API sets it to a default value: astradb-metrics.

  6. Export metrics to Splunk using a POST request.

    curl --request POST \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include \
      --verbose \
      --data '{
      "splunk": {
        "endpoint": "https://http-inputs-YOUR_COMPANY.splunkcloud.com",
        "index": "astra_third_party_metrics_test",
        "token": "SPLUNK_TOKEN",
        "source": "SPLUNK_SOURCE",
        "sourcetype": "SPLUNK_SOURCETYPE"
      }
    }'

    The response may have one of the following status codes:

    202 Accepted
    400 Bad request.
    401 Unauthorized.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    409 The request could not be processed because of conflict.
    5XX A server error occurred.

    Here is an example response:

    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "999B-A099-9999-0ABF"
        }
      ]
    }
  7. Optional: Verify that Splunk is configured correctly:

    curl --request GET \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include \
      --verbose

    The response may have one of the following status codes:

    200 OK
    400 Bad request.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    500 A server error occurred.

    Here are some example responses:

    {
      "splunk": {
        "endpoint": "https://http-inputs-YOUR_COMPANY.splunkcloud.com",
        "index": "astra_third_party_metrics_test",
        "token": "SPLUNK_TOKEN",
        "source": "SPLUNK_SOURCE",
        "sourcetype": "SPLUNK_SOURCETYPE"
      }
    }
    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "999B-A999-9999-0ABF"
        }
      ]
    }

Export metrics to Pulsar

You can export metrics to Pulsar using the Astra Portal or the DevOps API.

  • Astra Portal

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Ensure that your database is in an Active status, and then select Settings from the dashboard.

  4. Scroll down to the Export Metrics section, and then select Add Destination.

  5. Select Pulsar.

  6. Provide an Endpoint. This is the URL of your Pulsar Broker.

  7. Provide a Topic. This is the Pulsar topic that you publish telemetry to.

  8. Provide an Auth Name.

  9. Select an Auth Strategy. This is the authentication strategy used by your Pulsar broker.

    1. If you select token, provide your Token to authenticate to Pulsar.

    2. If you select oauth2, provide your OAuth2 Credentials URL and OAuth2 Issuer URL. You may also provide your OAuth2 Audience and OAuth2 Scope, but these are optional. For more details, see Authentication using OAuth 2.0 access tokens.

  10. Select Add Destination.

  11. The destination appears in the Export Metrics section.

  • DevOps API

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Copy the Database ID.

  4. If you don’t already have a token with the db-manage-thirdpartymetrics permission, select Generate Token to generate one.

  5. Get Pulsar connection information from your administrator:

    1. endpoint: The URL of your Pulsar Broker.

    2. topic: The Pulsar topic to which you’ll publish telemetry.

    3. auth_name: The authentication name, such as my-auth.

    4. auth_strategy: Provide the authentication strategy used by your Pulsar broker. The value should be token or oauth2.

      1. If token, you need a Pulsar auth token.

      2. If oauth2, you need the Pulsar oauth2_credentials_url and oauth2_issuer_url properties. You may also provide the oauth_audience and oauth2_scope properties, but these are optional.

  6. Export metrics to Pulsar using a POST request.

    1. Export metrics using a token:

      curl --request POST \
        --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
        --header 'Accept: application/json' \
        --header 'Authorization: Bearer DATABASE_TOKEN' \
        --include \
        --verbose \
        --data '{
        "pulsar": {
          "endpoint": "PULSAR_ENDPOINT",
          "topic": "PULSAR_TOPIC",
          "auth_strategy": "token",
          "token": "PULSAR_TOKEN",
          "auth_name": "PULSAR_AUTH_NAME"
        }
      }'
    2. Export metrics using oauth2:

      curl --request POST \
        --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
        --header 'Accept: application/json' \
        --header 'Authorization: Bearer DATABASE_TOKEN' \
        --include \
        --verbose \
        --data '{
        "pulsar": {
          "endpoint": "PULSAR_ENDPOINT",
          "topic": "PULSAR_TOPIC",
          "auth_strategy": "oauth2",
          "oauth2_credentials_url": "PULSAR_OAUTH2_CREDENTIALS_URL",
          "oauth2_issuer_url": "PULSAR_OAUTH2_ISSUER_URL"
        }
      }'

      The response may have one of the following status codes:

      202 Accepted
      400 Bad request.
      401 Unauthorized.
      403 The user is forbidden to perform the operation.
      404 The specified resource was not found.
      409 The request could not be processed because of conflict.
      5XX A server error occurred.

      Here is an example response:

      {
        "errors": [
          {
            "description": "The name of the environment must be provided",
            "internalCode": "a1012",
            "internalTxId": "103B-A018-3898-0ABF"
          }
        ]
      }
  7. Optional: Verify that Pulsar is configured correctly:

    curl --request GET \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include \
      --verbose

    The response may have one of the following status codes:

    200 OK
    400 Bad request.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    500 A server error occurred.

    Here are some example responses:

    {
      "pulsar": {
        "endpoint": "PULSAR_ENDPOINT_URL",
        "topic": "PULSAR_TOPIC",
        "auth_strategy": "token",
        "token": "PULSAR_TOKEN",
        "auth_name": "PULSAR_AUTH_NAME"
      }
    }
    {
      "pulsar": {
        "endpoint": "PULSAR_ENDPOINT_URL",
        "topic": "PULSAR_TOPIC",
        "auth_strategy": "oauth2",
        "oauth2_credentials_url": "PULSAR_OAUTH2_CREDENTIALS_URL",
        "oauth2_issuer_url": "PULSAR_OAUTH2_ISSUER_URL"
      }
    }
    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "103B-A018-3898-0ABF"
        }
      ]
    }

Export metrics to Datadog

You can export metrics to Datadog using the Astra Portal or the DevOps API.

  • Astra Portal

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Ensure that your database is in an Active status, and then select Settings from the dashboard.

  4. Scroll down to the Export Metrics section, and then select Add Destination.

  5. Select Datadog.

  6. Provide your API Key to authenticate to Datadog.

  7. Optional: Provide a Site. This is the Datadog site to which Astra DB metrics are sent.

  8. Select Add Destination.

  9. The destination appears in the Export Metrics section.

For more details, see the Authentication topic in the Datadog documentation.

  • DevOps API

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to view its details.

  3. Copy the Database ID.

  4. If you don’t already have a token with the db-manage-thirdpartymetrics permission, select Generate Token to generate one.

  5. Get Datadog connection information from your administrator:

    1. api_key: The required API key so that your Astra DB metrics export operation can successfully authenticate into the Datadog API. For details, see Authentication in the Datadog documentation.

      Before submitting a DevOps call with the api_key value, you should validate that it’s correct by using the Validate API key curl command that’s described in the Datadog documentation.

    2. site: The Datadog site to which the exported Astra DB health metrics will be sent. For details, including the correct format to specify in the DevOps call, see the Getting Started with Datadog Sites topic in the Datadog documentation.

      Datadog sites are named in different ways. See Access the Datadog site for important details. Summary:

      • If you’ll send Astra DB health metrics to a Datadog site prefixed with "app", remove both the "https://" protocol and the "app" prefix from the site parameter that you specify in the DevOps call.

      • If you’ll send Astra DB health metrics to a Datadog site that is prefixed with a subdomain such as "us5", remove only the "https://" protocol from the site parameter that you specify in the DevOps call.

      • Other Datadog site parameters are possible. See the table in Access the Datadog site for guidance on the appropriate site parameter format.

  6. Export metrics to Datadog using a POST request.

    Example POST payload for Datadog:
    
    curl --request POST \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include \
      --verbose \
      --data '{
      "datadog": {
        "api_key": "DATADOG_API_KEY",
        "site": "DATADOG_SITE"
      }
    }'

    The response may have one of the following status codes:

    202 Accepted
    400 Bad request.
    401 Unauthorized.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    409 The request could not be processed because of conflict.
    5XX A server error occurred.

    Here is an example response:

    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "103B-A018-3898-0ABF"
        }
      ]
    }
  7. Optional: Verify that Datadog is configured correctly:

    curl --request GET \
      --url 'https://api.astra.datastax.com/v2/databases/DATABASE_ID/telemetry/metrics' \
      --header 'Accept: application/json' \
      --header 'Authorization: Bearer DATABASE_TOKEN' \
      --include \
      --verbose

    The response may have one of the following status codes:

    200 OK
    400 Bad request.
    403 The user is forbidden to perform the operation.
    404 The specified resource was not found.
    500 A server error occurred.

    Here are some example responses:

    {
      "datadog": {
        "api_key": "DATADOG_API_KEY",
        "site": "DATADOG_SITE"
      }
    }
    {
      "errors": [
        {
          "description": "The name of the environment must be provided",
          "internalCode": "a1012",
          "internalTxId": "103B-A018-3898-0ABF"
        }
      ]
    }
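
The site formatting rules in step 5 above can be expressed as a small helper. This is a sketch with a hypothetical function name. It only covers the two cases summarized in step 5 (an "app" prefix and a subdomain prefix such as "us5"), so check the table in Access the Datadog site for other formats.

```shell
# Convert a Datadog site URL into the value for the "site" parameter:
# drop the "https://" protocol and any path, then drop a leading "app." prefix.
datadog_site() {
  site=${1#https://}   # remove the protocol
  site=${site%%/*}     # remove a trailing slash or path, if any
  site=${site#app.}    # remove the "app" prefix; other subdomains are kept
  echo "$site"
}

datadog_site https://app.datadoghq.com   # prints datadoghq.com
datadog_site https://us5.datadoghq.com   # prints us5.datadoghq.com
```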

Manage an export destination

You can edit or delete an export destination.

  1. Open the Astra Portal and select Databases in the main navigation.

  2. Select a database name to open the database details.

  3. Go to Settings, and then scroll to the Export Metrics section.

  4. To edit a destination:

    1. Select More, and then select Edit.

    2. Make your changes as needed and select Update Destination.

  5. To delete a destination:

    1. Select More, and then select Delete.

    2. In the Delete Destination dialog, select Delete Destination to confirm that you want to delete the destination.

Metric definitions

The following metrics are forwarded by the Export Metrics feature. Each metric is an aggregated value, calculated once per minute.

Every metric has two variants:

METRIC:rate1m

The rate of increase over a 1 minute interval.

METRIC:rate5m

The rate of increase over a 5 minute interval.

coordinator_rate_limited_requests_total:rate1m
coordinator_rate_limited_requests_total:rate5m

A calculated rate of change for the number of failed operations due to an Astra DB rate limit. Using these rates, alert if the value is greater than 0 for more than 30 minutes.

coordinator_read_requests_failures_total:rate1m
coordinator_read_requests_failures_total:rate5m

A calculated rate of change for the number of failed reads. Using these rates, alert if the value is greater than 0. Consider a warning-level alert for small values and a high-severity alert for larger values, possibly defined as a percentage of read throughput.

coordinator_read_requests_timeouts_total:rate1m
coordinator_read_requests_timeouts_total:rate5m

A calculated rate of change for read timeouts. Timeouts happen when operations against the database take longer than the server side timeout. Using these rates, alert if the value is greater than 0.

coordinator_read_requests_unavailables_total:rate1m
coordinator_read_requests_unavailables_total:rate5m

A calculated rate of change for reads where there were not enough data service replicas available to complete the request. Using these rates, alert if the value is greater than 0.

coordinator_write_requests_failures_total:rate1m
coordinator_write_requests_failures_total:rate5m

A calculated rate of change for the number of failed writes. Cassandra drivers retry failed operations, but significant failures can be problematic. Using these rates, alert if the value is greater than 0. Raise a warning-level alert for a low failure rate and a high-severity alert for larger rates, potentially determined as a percentage of write throughput.

coordinator_write_requests_timeouts_total:rate1m
coordinator_write_requests_timeouts_total:rate5m

A calculated rate of change for timeouts, which occur when operations take longer than the server-side timeout. Using these rates, compare with write_requests_failures.

coordinator_write_requests_unavailables_total:rate1m
coordinator_write_requests_unavailables_total:rate5m

A calculated rate of change for unavailable errors, which occur when the service is not available to service a particular request. Using these rates, compare with write_requests_failures.

coordinator_range_requests_failures_total:rate1m
coordinator_range_requests_failures_total:rate5m

A calculated rate of change for the number of range reads that failed. Cassandra drivers retry failed operations, but significant failures can be problematic. Using these rates, alert if the value is greater than 0. Raise a warning-level alert for a low failure rate and a high-severity alert for larger rates, potentially determined as a percentage of read throughput.

coordinator_range_requests_timeouts_total:rate1m
coordinator_range_requests_timeouts_total:rate5m

A calculated rate of change for timeouts, which are a subset of total failures. Use this metric to understand if failures are due to timeouts. Using these rates, compare with range_requests_failures.

coordinator_range_requests_unavailables_total:rate1m
coordinator_range_requests_unavailables_total:rate5m

A calculated rate of change for unavailable errors, which are a subset of total failures. Use this metric to understand if failures are due to unavailable errors. Using these rates, compare with range_requests_failures.
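
To compare the timeout and unavailable rates with the failure rate, one option is to express each as a share of total failures. The following is a minimal sketch; the function name and return format are hypothetical, and it relies on the statement above that timeouts and unavailable errors are subsets of total failures.

```python
def failure_breakdown(failures, timeouts, unavailables):
    """Return the share of failures attributed to timeouts, unavailable
    errors, and other causes, given three rates from the same window."""
    if failures <= 0:
        return {"timeouts": 0.0, "unavailables": 0.0, "other": 0.0}
    # Clamp each share to 1.0 in case the rates were sampled slightly out of sync.
    timeout_share = min(timeouts / failures, 1.0)
    unavailable_share = min(unavailables / failures, 1.0)
    other_share = max(1.0 - timeout_share - unavailable_share, 0.0)
    return {
        "timeouts": timeout_share,
        "unavailables": unavailable_share,
        "other": other_share,
    }


# Example: rates read from the range_requests failures, timeouts, and
# unavailables metrics for the same interval.
breakdown = failure_breakdown(failures=4.0, timeouts=1.0, unavailables=1.0)
```

A large "other" share suggests that the failures have a cause other than timeouts or unavailable replicas.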

coordinator_write_latency_seconds_count:rate1m
coordinator_write_latency_seconds_count:rate5m

A calculated rate of change for write throughput. Alert based on your application service level objective (SLO).

coordinator_write_latency_seconds_bucket:rate1m
coordinator_write_latency_seconds_bucket:rate5m

A calculated rate of change for write latency, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_write_latency_seconds_P99:rate1m. Alert based on your application SLO.
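
If your observability service receives the bucket rates rather than precomputed quantiles, a quantile such as P99 can be estimated from them. The sketch below assumes Prometheus-style cumulative buckets, where each bucket holds the rate of requests at or below its upper bound, and uses linear interpolation within a bucket. The function name and input format are hypothetical; many observability tools provide an equivalent built-in function.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile (0 <= q <= 1) from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_rate), sorted by upper
    bound; the last upper bound may be math.inf.
    """
    if not buckets:
        raise ValueError("at least one bucket is required")
    total = buckets[-1][1]
    if total <= 0:
        return math.nan
    target = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= target:
            if math.isinf(bound):
                # The quantile falls in the open-ended bucket; report the
                # highest finite bound instead.
                return prev_bound
            if count == prev_count:
                return bound
            # Interpolate linearly inside the bucket.
            fraction = (target - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical write latency bucket rates (requests per second).
buckets = [(0.005, 50.0), (0.01, 90.0), (0.05, 100.0), (math.inf, 100.0)]
p99 = histogram_quantile(0.99, buckets)
```

The estimate is only as precise as the bucket boundaries allow, so treat it as an approximation when comparing against an SLO.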

coordinator_write_requests_mutation_size_bytes_bucket:rate1m
coordinator_write_requests_mutation_size_bytes_bucket:rate5m

A calculated rate of change for how big writes are over time, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_write_requests_mutation_size_bytesP99:rate5m.

coordinator_read_latency_seconds_count:rate1m
coordinator_read_latency_seconds_count:rate5m

A calculated rate of change for read throughput. Alert based on your application SLO.

coordinator_read_latency_seconds_bucket:rate1m
coordinator_read_latency_seconds_bucket:rate5m

A calculated rate of change for read latency, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_read_latency_secondsP99:rate1m. Alert based on your application SLO.

coordinator_range_latency_seconds_count:rate1m
coordinator_range_latency_seconds_count:rate5m

A calculated rate of change for range read throughput. Alert based on your application SLO.

coordinator_range_latency_seconds_bucket:rate1m
coordinator_range_latency_seconds_bucket:rate5m

A calculated rate of change for range read latency, where $QUANTILE is a histogram quantile of 99, 95, 90, 75, or 50. For example, astra_db_range_latency_secondsP99. Alert based on your application SLO.

table_tombstone_read_counter_total:rate1m
table_tombstone_read_counter_total:rate5m

A calculated rate of change for the total number of tombstone reads. Tombstones are markers of deleted records or certain updates (for example, collection updates). Monitoring the rate of tombstone reads can help identify potential performance impacts. Using these rates, alert if the value shows significant growth.
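
"Significant growth" depends on your workload. One possible interpretation, sketched below, compares the average of the most recent samples with the average of the earlier samples. The function name, the five-sample window, and the growth factor of 2 are hypothetical choices for illustration.

```python
from statistics import mean


def significant_growth(history, recent_points=5, factor=2.0):
    """Return True if the mean of the most recent samples is at least
    `factor` times the mean of the earlier samples.

    history: rate samples, oldest first.
    """
    if len(history) <= recent_points:
        # Not enough data to form a baseline.
        return False
    baseline = mean(history[:-recent_points])
    recent = mean(history[-recent_points:])
    if baseline == 0:
        # Any tombstone reads after a period with none count as growth.
        return recent > 0
    return recent / baseline >= factor


# Twenty steady samples followed by five elevated samples.
growing = significant_growth([10.0] * 20 + [25.0] * 5)
```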

table_tombstone_read_failures_total:rate1m
table_tombstone_read_failures_total:rate5m

A calculated rate of change for the total number of read operations that failed due to hitting the tombstone guardrail failure threshold. This metric is critical for identifying issues potentially leading to performance degradation or timeouts. Alert if the value is greater than 0.

table_tombstone_read_warnings_total:rate1m
table_tombstone_read_warnings_total:rate5m

A calculated rate of change for the total number of warnings generated due to getting close to the tombstone guardrail failure threshold. This metric helps in identifying scenarios where read operations are slowed or at risk of slowing. Alert on a significant increase as it may indicate potential read performance issues.

coordinator_cas_write_latency_seconds_count:rate1m
coordinator_cas_write_latency_seconds_count:rate5m

A calculated rate of change for the count of CAS (Compare-And-Swap) write operations in Lightweight Transactions (LWTs), measuring the throughput of CAS writes. CAS operations are used for atomic read-modify-write operations. Monitoring the rate of these operations helps in understanding the load and performance characteristics of CAS writes. Alert if the rate significantly deviates from expected patterns, indicating potential concurrency or contention issues.

coordinator_cas_write_latency_seconds_bucket:rate1m
coordinator_cas_write_latency_seconds_bucket:rate5m

A calculated rate of change for CAS write latency distributions in LWTs across predefined latency buckets. This metric provides insights into the latency characteristics of CAS write operations, helping identify latency spikes or trends over time. Alert based on application SLOs, particularly if high-latency buckets show increased counts.
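
To watch for growth in the high-latency buckets, one approach is to track the share of requests slower than a latency bound taken from your SLO. This sketch assumes cumulative buckets, where each bucket holds the rate of requests at or below its upper bound; the function name and the example bound are hypothetical.

```python
def share_slower_than(buckets, slo_seconds):
    """Return the fraction of requests slower than slo_seconds.

    buckets: list of (upper_bound_seconds, cumulative_rate), sorted by upper
    bound, with the last bucket covering all requests.
    """
    total = buckets[-1][1]
    if total <= 0:
        return 0.0
    within = 0.0
    # Find the largest bucket whose upper bound does not exceed the SLO bound.
    for bound, count in buckets:
        if bound <= slo_seconds:
            within = count
        else:
            break
    return 1.0 - within / total


# Hypothetical CAS write latency bucket rates (requests per second).
cas_buckets = [(0.01, 80.0), (0.1, 95.0), (1.0, 100.0), (float("inf"), 100.0)]
slow_share = share_slower_than(cas_buckets, slo_seconds=0.1)
```

Here roughly 5% of CAS writes took longer than 100 ms. A rising share over time indicates that the distribution is shifting toward the higher buckets.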

coordinator_cas_read_latency_seconds_count:rate1m
coordinator_cas_read_latency_seconds_count:rate5m

A calculated rate of change for the count of CAS read operations in LWTs, measuring the throughput of CAS reads. Monitoring this rate is important for understanding the load and performance of read operations that involve conditional checks. Alert on unusual changes, which could signal issues with data access patterns or performance bottlenecks.

coordinator_cas_read_latency_seconds_bucket:rate1m
coordinator_cas_read_latency_seconds_bucket:rate5m

A calculated rate of change for CAS read latency distributions in LWTs across predefined latency buckets. This metric aids in identifying the latency performance of CAS reads, essential for diagnosing potential issues in read performance or understanding the distribution of read operation latencies. Alert if latency distribution shifts towards higher buckets, indicating potential performance issues.

coordinator_cas_write_unfinished_commit_total:rate1m
coordinator_cas_write_unfinished_commit_total:rate5m

A calculated rate of change for the total number of CAS write operations in LWTs that did not finish committing. This metric is crucial for detecting issues in the atomicity of write operations, potentially caused by network or node failures. Alert if there’s an increase, as it could impact data consistency.

coordinator_cas_write_contention_total_bucket:rate1m
coordinator_cas_write_contention_total_bucket:rate5m

A calculated rate of change for the distribution of CAS write contention in LWTs across predefined buckets. Contention during CAS write operations can significantly impact performance. This metric helps in understanding and diagnosing the levels of contention affecting CAS writes. Alert on significant increases in higher contention buckets.

coordinator_cas_read_unfinished_commit_total:rate1m
coordinator_cas_read_unfinished_commit_total:rate5m

A calculated rate of change for the total number of CAS read operations that encountered unfinished commits. Monitoring this metric is important for identifying issues with read consistency and potential data visibility problems. Alert if there’s an increase, indicating problems with the completion of write operations.

coordinator_cas_read_contention_total_bucket:rate1m
coordinator_cas_read_contention_total_bucket:rate5m

A calculated rate of change for the distribution of CAS read contention in LWTs across predefined buckets. Contention during CAS reads can indicate performance issues or high levels of concurrent access to the same data. Alert on shifts towards higher contention buckets, indicating a need for investigation and potential optimization.

coordinator_cas_read_requests_failures_total:rate1m
coordinator_cas_read_requests_failures_total:rate5m

A calculated rate of change for the total number of CAS read operations in LWTs that failed. Failures in CAS reads can signal issues with data access or consistency problems. Alert if the rate increases, indicating potential issues affecting the reliability of CAS reads.

coordinator_cas_read_requests_timeouts_total:rate1m
coordinator_cas_read_requests_timeouts_total:rate5m

A calculated rate of change for the number of CAS read operations in LWTs that timed out. Timeouts can indicate system overload or issues with data access patterns. Monitoring this metric helps in identifying and addressing potential bottlenecks. Alert if there’s an increase in timeouts.

coordinator_cas_read_requests_unavailables_total:rate1m
coordinator_cas_read_requests_unavailables_total:rate5m

A calculated rate of change for CAS read operations in LWTs that were unavailable. This metric is vital for understanding the availability of the system to handle CAS reads. An increase in unavailability can indicate cluster health issues. Alert if the rate increases.

coordinator_cas_write_requests_failures_total:rate1m
coordinator_cas_write_requests_failures_total:rate5m

A calculated rate of change for the total number of CAS write operations in LWTs that failed. Failure rates for CAS writes are critical for assessing the reliability and performance of write operations. Alert if there’s a significant increase in failures.

coordinator_cas_write_requests_timeouts_total:rate1m
coordinator_cas_write_requests_timeouts_total:rate5m

A calculated rate of change for the number of CAS write operations in LWTs that timed out. Write timeouts can significantly impact application performance and user experience. Monitoring this rate is crucial for maintaining system performance. Alert on an upward trend in timeouts.

coordinator_cas_write_requests_unavailables_total:rate1m
coordinator_cas_write_requests_unavailables_total:rate5m

A calculated rate of change for CAS write operations in LWTs that were unavailable. Increases in this metric can indicate problems with cluster capacity or health, impacting the ability to perform write operations. Alert if there’s an increase, as it could signal critical availability issues.

© 2024 DataStax | Privacy policy | Terms of use