Managing Events and Alerts

Using these methods, you can get information about log events, such as node compactions and repairs triggered through OpsCenter, and configure alert thresholds for a number of database metrics.

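Event and Alert Methods

  • GET /{cluster_id}/events: Retrieve OpsCenter events.

  • GET /{cluster_id}/alert-rules: Retrieve configured alert rules.

  • GET /{cluster_id}/alert-rules/{alert_id}: Retrieve a specific alert rule.

  • POST /{cluster_id}/alert-rules: Create a new alert rule.

  • PUT /{cluster_id}/alert-rules/{alert_id}: Update an alert rule.

  • DELETE /{cluster_id}/alert-rules/{alert_id}: Delete an alert rule.

  • GET /{cluster_id}/alerts/fired: Retrieve active alerts.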

Event Methods

GET /{cluster_id}/events

Retrieve historical events logged by OpsCenter.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

Query params:

  • count: The number of events to return. Defaults to 10.

  • timestamp: The point in time from which to start retrieving events, specified as a Unix timestamp in microseconds. Defaults to the current time.

  • reverse: A boolean (0 or 1) indicating whether to retrieve events in reverse order. Defaults to 1 (true). Events are retrieved starting from the time specified by timestamp and going backward in time until count events are found or there are no more events to retrieve.

Returns a list of dictionaries where each dictionary represents an event. An event dictionary contains properties describing that event.
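The query parameters above can be assembled programmatically. The following is a minimal Python sketch, not part of OpsCenter: the function name events_url and its argument names are hypothetical, and the base URL is the same local address used in the examples on this page.

```python
from urllib.parse import quote, urlencode


def events_url(base_url, cluster_id, count=10, timestamp_us=None, reverse=True):
    """Build a GET /{cluster_id}/events URL from the documented query params."""
    # reverse is sent as 0 or 1, as the parameter description requires.
    params = {"count": count, "reverse": 1 if reverse else 0}
    if timestamp_us is not None:
        # The API expects a Unix timestamp in microseconds.
        params["timestamp"] = int(timestamp_us)
    path = "/" + quote(cluster_id, safe="") + "/events"
    return base_url.rstrip("/") + path + "?" + urlencode(params)


print(events_url("http://127.0.0.1:8888", "Test_Cluster", count=1))
```

Leaving timestamp_us unset omits the parameter, so the server falls back to its documented default of the current time.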

Example:

 curl "http://127.0.0.1:8888/Test_Cluster/events?count=1"

Output:

    {
      "action": 28,
      "api_source_ip": "192.168.1.12",
      "event_source": "OpsCenter",
      "level": 1,
      "level_str": "INFO",
      "message": "Restarting node 192.168.100.3",
      "source_node": "192.168.100.3",
      "success": null,
      "target_node": null,
      "time": "1334768517145625",
      "user": "joe"
    }
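The time property in this output is a string of digits. The sketch below assumes it uses the same unit as the timestamp query parameter (microseconds since the Unix epoch), which is consistent with the 16-digit sample value but is not stated explicitly here. The helper name event_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def event_time(event):
    """Convert an event's "time" field to a timezone-aware UTC datetime.

    Assumption: the value is microseconds since the Unix epoch.
    """
    # Integer arithmetic avoids the rounding a float division could introduce.
    return EPOCH + timedelta(microseconds=int(event["time"]))


event = {
    "level_str": "INFO",
    "message": "Restarting node 192.168.100.3",
    "time": "1334768517145625",
}
print(event_time(event).isoformat())
```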

Alert Methods

GET /{cluster_id}/alert-rules

Retrieve a list of configured alert rules in OpsCenter.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

Returns a list of AlertRule objects.

AlertRule

    {
      "id": <value>,
      "type": <value>,
      "threshold": <value>,
      "comparator": <value>,
      "duration": <value>,
      "notify_interval": <value>,
      "enabled": <value>,
      "metric": <value>,
      "cf": <value>,
      "item": <value>,
      "dc": <value>
    }

The properties of an AlertRule object are listed below. Each entry gives the property name, its type, and a description of its values:

  • id (String): A unique ID that references an alert rule. Use only for retrieving alert rules.

  • type (String): The event or metric aggregation that triggers an alert. Accepted values include rolling-avg, cluster-balance, and node-down. This field is not editable.

  • threshold (Float): The metric boundary that triggers an alert when the threshold is crossed. Applicable only when the type is rolling-avg.

  • comparator (String, optional): Values are < or >.

  • duration (Int): How long (in minutes) the problem continues before firing the alert.

  • notify_interval (Int): How often (in minutes) to repeat the alert. Use 0 for a single notification.

  • enabled (Int): The state of the alert. Values are 0 (disabled) or 1 (enabled).

  • metric (String): A key from the list of metrics. This field is only valid if the type is rolling-avg.

  • cf (String, optional): The table to monitor if the metric property is one of the cf-keys.

  • item (String, optional): The device to monitor if the metric is one of the os-keys.

  • dc (String, optional): The name of the data center that contains nodes to be monitored. If omitted, all nodes are monitored.
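The constraints described for these properties can be checked on the client before a rule is sent. The following is a rough sketch only: the function alert_rule_problems is hypothetical, it covers just the constraints stated above, and the rule that duration and notify_interval must not be negative is an assumption. OpsCenter may apply further validation of its own.

```python
def alert_rule_problems(rule):
    """Return a list of problems found in an AlertRule-style dictionary."""
    problems = []
    if "comparator" in rule and rule["comparator"] not in ("<", ">"):
        problems.append("comparator must be '<' or '>'")
    if "enabled" in rule and rule["enabled"] not in (0, 1):
        problems.append("enabled must be 0 or 1")
    # Assumption: negative minute values are never meaningful.
    for field in ("duration", "notify_interval"):
        if field in rule and rule[field] < 0:
            problems.append(field + " must not be negative")
    # threshold and metric are documented as applying to rolling-avg only.
    rule_type = rule.get("type")
    if rule_type is not None and rule_type != "rolling-avg":
        for field in ("threshold", "metric"):
            if field in rule:
                problems.append(field + " applies only to rolling-avg rules")
    return problems


good_rule = {
    "comparator": ">",
    "duration": 1.0,
    "enabled": 1,
    "metric": "write-latency",
    "notify_interval": 1.0,
    "threshold": 10000.0,
    "type": "rolling-avg",
}
print(alert_rule_problems(good_rule))
```

An empty list means none of the checked constraints were violated. It does not guarantee that the server accepts the rule.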

Example:

 curl http://127.0.0.1:8888/Test_Cluster/alert-rules

Output:

    [
      {
        "comparator": ">",
        "dc": "us-east",
        "duration": 1.0,
        "enabled": 1,
        "id": "e0c356c7-62ff-4aa8-9b17-e305f101b69a",
        "metric": "write-latency",
        "notify_interval": 1.0,
        "threshold": 10000.0,
        "type": "rolling-avg"
      },
      ...
    ]

GET /{cluster_id}/alert-rules/{alert_id}

Retrieve a specific alert rule.

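Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

  • alert_id: A UUID that identifies a specific alert rule and has the value of an id property returned by GET /{cluster_id}/alert-rules.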

Returns an AlertRule.

Example:

 curl http://127.0.0.1:8888/Test_Cluster/alert-rules/e0c356c7-62ff-4aa8-9b17-e305f101b69a

Output:

    {
      "comparator": ">",
      "dc": "us-east",
      "duration": 1.0,
      "enabled": 1,
      "id": "e0c356c7-62ff-4aa8-9b17-e305f101b69a",
      "metric": "write-latency",
      "notify_interval": 1.0,
      "threshold": 10000.0,
      "type": "rolling-avg"
    }

POST /{cluster_id}/alert-rules

Create a new alert rule.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

Body: A dictionary in the format of an AlertRule describing the alert to create.

Response: 201: Alert rule was created successfully.

Returns the ID of the newly created alert.
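For callers that prefer a script to the command line, the request can be built with the Python standard library. This is a minimal sketch under stated assumptions: create_rule_request is a hypothetical helper, the address is the local one used throughout this page, and no Content-Type header is set, so urllib falls back to the same form-encoded default that curl -d uses. The request is only constructed here because sending it needs a reachable OpsCenter instance.

```python
import json
import urllib.request


def create_rule_request(base_url, cluster_id, rule):
    """Build (but do not send) the POST request that creates an alert rule."""
    url = base_url.rstrip("/") + "/" + cluster_id + "/alert-rules"
    body = json.dumps(rule).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


rule = {
    "comparator": ">",
    "dc": "",
    "duration": 60.0,
    "enabled": 1,
    "metric": "heap-used",
    "notify_interval": 5.0,
    "threshold": 6291456000.0,
    "type": "rolling-avg",
}
request = create_rule_request("http://127.0.0.1:8888", "Test_Cluster", rule)
print(request.get_method(), request.full_url)

# To send it against a running OpsCenter instance:
# with urllib.request.urlopen(request) as response:
#     new_rule_id = json.loads(response.read())
```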

Example:

 curl -X POST \
   http://127.0.0.1:8888/Test_Cluster/alert-rules \
   -d '{
     "comparator": ">",
     "dc": "",
     "duration": 60.0,
     "enabled": 1,
     "metric": "heap-used",
     "notify_interval": 5.0,
     "threshold": 6291456000.0,
     "type": "rolling-avg"
   }'

Output:

  "b375fd3e-3908-4be5-ae37-d8f3b8699a9f"

PUT /{cluster_id}/alert-rules/{alert_id}

Update an existing alert rule.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

  • alert_id: A UUID that identifies a specific alert rule and has the value of an id property returned by GET /{cluster_id}/alert-rules.

Body: A dictionary of AlertRule fields to update.

Response: 200: Alert rule updated successfully.

Example:

 curl -X PUT \
   http://127.0.0.1:8888/Test_Cluster/alert-rules/b375fd3e-3908-4be5-ae37-d8f3b8699a9f \
   -d '{"duration": 120.0}'
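Because the body carries only the fields to change, a client can compute it by comparing the stored rule with the desired one. The sketch below is illustrative: update_body and READ_ONLY are hypothetical names, and treating id as read-only alongside type is an interpretation of the property descriptions (type is documented as not editable, and id as used only for retrieval).

```python
# Fields that are skipped when building an update body (see the assumption above).
READ_ONLY = {"id", "type"}


def update_body(current, desired):
    """Return only the editable fields whose values differ from the current rule."""
    return {
        key: value
        for key, value in desired.items()
        if key not in READ_ONLY and current.get(key) != value
    }


current = {
    "id": "b375fd3e-3908-4be5-ae37-d8f3b8699a9f",
    "type": "rolling-avg",
    "duration": 60.0,
    "notify_interval": 5.0,
}
desired = dict(current, duration=120.0)
print(update_body(current, desired))
```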

DELETE /{cluster_id}/alert-rules/{alert_id}

Delete an existing alert rule.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

  • alert_id: A UUID that identifies a specific alert rule and has the value of an id property returned by GET /{cluster_id}/alert-rules.

Response: 200: Alert rule removed successfully.

Example:

 curl -X DELETE \
   http://127.0.0.1:8888/Test_Cluster/alert-rules/b375fd3e-3908-4be5-ae37-d8f3b8699a9f

GET /{cluster_id}/alerts/fired

Retrieve all alerts that are currently fired.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

Returns a list of alerts that have been triggered. Each item in the list is a dictionary describing the triggered alert.

Example:

 curl http://127.0.0.1:8888/Test_Cluster/alerts/fired

Output:

  [
    {
      "alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
      "current_value": 31676303.333333332,
      "first_fired": 1336669233,
      "node": "10.11.12.150"
    },
    {
      "alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
      "current_value": 28380117.5,
      "first_fired": 1336669233,
      "node": "10.11.12.152"
    }
  ]
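A response like this one lists the same rule once per affected node, so grouping by alert_rule_id is a common first step. The following Python sketch uses hypothetical helper names, and it assumes that first_fired is a Unix timestamp in whole seconds. The 10-digit sample values suggest this, but the unit is not stated here.

```python
from collections import defaultdict
from datetime import datetime, timezone


def nodes_by_rule(fired_alerts):
    """Map each alert_rule_id to the list of nodes on which it has fired."""
    grouped = defaultdict(list)
    for alert in fired_alerts:
        grouped[alert["alert_rule_id"]].append(alert["node"])
    return dict(grouped)


def first_fired_utc(alert):
    """Convert first_fired to a UTC datetime (assumes seconds since the epoch)."""
    return datetime.fromtimestamp(alert["first_fired"], tz=timezone.utc)


fired = [
    {"alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
     "current_value": 31676303.333333332,
     "first_fired": 1336669233,
     "node": "10.11.12.150"},
    {"alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
     "current_value": 28380117.5,
     "first_fired": 1336669233,
     "node": "10.11.12.152"},
]
print(nodes_by_rule(fired))
print(first_fired_utc(fired[0]).isoformat())
```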

