Managing Events and Alerts

Use these methods to retrieve logged events, such as node compactions and repairs triggered through OpsCenter, and to configure alert thresholds for a number of database metrics.

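Event and Alert Methods            URL

Retrieve OpsCenter events.         GET /{cluster_id}/events
Retrieve configured alert rules.   GET /{cluster_id}/alert-rules
Retrieve a specific alert rule.    GET /{cluster_id}/alert-rules/{alert_id}
Create a new alert rule.           POST /{cluster_id}/alert-rules
Update an alert rule.              PUT /{cluster_id}/alert-rules/{alert_id}
Delete an alert rule.              DELETE /{cluster_id}/alert-rules/{alert_id}
Retrieve active alerts.            GET /{cluster_id}/alerts/fired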

Event Methods

GET /{cluster_id}/events

Retrieve historical events logged by OpsCenter.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

Query params:

  • count: The number of events to return. Defaults to 10.

  • timestamp: A timestamp specifying the point in time to start retrieving events. Specified as a Unix timestamp in microseconds. Defaults to the current time.

  • reverse: A boolean (0 or 1) indicating whether to retrieve events in reverse order. Defaults to 1 (true). Events are retrieved starting from the time specified by the timestamp and going backward in time until count events are found or there are no more events to retrieve.

Returns a list of dictionaries where each dictionary represents an event. An event dictionary contains properties describing that event.
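The query parameters above can be assembled programmatically. The following Python sketch is illustrative only: the helper name events_url and its arguments are hypothetical and not part of the OpsCenter API. It builds the request URL and converts a timezone-aware datetime into the microsecond Unix timestamp that the timestamp parameter expects.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def events_url(base_url, cluster_id, count=None, start=None, reverse=None):
    """Build a GET /{cluster_id}/events URL (hypothetical helper).

    Parameters left as None are omitted so the server defaults apply.
    `start` must be a timezone-aware datetime.
    """
    params = {}
    if count is not None:
        params["count"] = int(count)
    if start is not None:
        # The API expects a Unix timestamp in microseconds; integer
        # arithmetic avoids floating-point rounding.
        delta = start - EPOCH
        seconds = delta.days * 86400 + delta.seconds
        params["timestamp"] = seconds * 1_000_000 + delta.microseconds
    if reverse is not None:
        params["reverse"] = 1 if reverse else 0
    url = f"{base_url.rstrip('/')}/{quote(cluster_id, safe='')}/events"
    return f"{url}?{urlencode(params)}" if params else url


print(events_url("http://127.0.0.1:8888", "Test_Cluster", count=1))
```

The resulting string can be passed to curl or to any HTTP client.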

Example:

 curl "http://127.0.0.1:8888/Test_Cluster/events?count=1"

Output:

    {
      "action": 28,
      "api_source_ip": "192.168.1.12",
      "event_source": "OpsCenter",
      "level": 1,
      "level_str": "INFO",
      "message": "Restarting node 192.168.100.3",
      "source_node": "192.168.100.3",
      "success": null,
      "target_node": null,
      "time": "1334768517145625",
      "user": "joe"
    }
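The time property in this output is a string of digits. The Python sketch below assumes it is a Unix timestamp in microseconds, like the timestamp query parameter. That is an inference from the sample value, not something the text above states. The helper name event_time is hypothetical.

```python
from datetime import datetime, timezone


def event_time(event):
    """Convert an event's "time" string to a UTC datetime.

    Assumption: the value is a Unix timestamp in microseconds.
    """
    seconds, micros = divmod(int(event["time"]), 1_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=micros)


# A subset of the fields from the example output above.
event = {
    "level_str": "INFO",
    "message": "Restarting node 192.168.100.3",
    "time": "1334768517145625",
}
summary = f"{event_time(event).isoformat()} [{event['level_str']}] {event['message']}"
print(summary)
```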

Alert Methods

GET /{cluster_id}/alert-rules

Retrieve a list of configured alert rules in OpsCenter.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

Returns a list of AlertRule objects.

AlertRule

    {
      "id": <value>,
      "type": <value>,
      "threshold": <value>,
      "comparator": <value>,
      "duration": <value>,
      "notify_interval": <value>,
      "enabled": <value>,
      "metric": <value>,
      "cf": <value>,
      "item": <value>,
      "dc": <value>
    }

The properties of an AlertRule object are:

  • id (String): A unique ID that references an alert rule. Use only for retrieving alert rules.

  • type (String): The event or metric aggregation that triggers an alert. Accepted values include rolling-avg, cluster-balance, and node-down. This field is not editable.

  • threshold (Float): The metric boundary that triggers an alert when the threshold is crossed. Applicable only when the type is rolling-avg.

  • comparator (String, optional): Values are < or >.

  • duration (Int): How long (in minutes) the problem continues before firing the alert.

  • notify_interval (Int): How often (in minutes) to repeat the alert. Use 0 for a single notification.

  • enabled (Int): The state of the alert. Values are 0 (disabled) or 1 (enabled).

  • metric (String): A key from the list of metrics keys. This field is only valid if the type is rolling-avg.

  • cf (String, optional): The table to monitor if the metric property is one of the column family metric keys (cf-keys).

  • item (String, optional): The device to monitor if the metric is one of the operating system metric keys (os-keys).

  • dc (String, optional): The name of the data center that contains nodes to be monitored. If omitted, all nodes are monitored.
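A client can check a rule against the property descriptions above before sending it. The Python sketch below is a client-side illustration, not OpsCenter's own validation. The function name check_alert_rule is hypothetical, it expects a complete rule that includes type, and the requirement that duration and notify_interval be non-negative is an assumption. The set of accepted type values is not checked, because the description says only that accepted values "include" the three listed.

```python
def check_alert_rule(rule):
    """Return a list of problems found in an AlertRule-style dictionary."""
    problems = []
    if "comparator" in rule and rule["comparator"] not in ("<", ">"):
        problems.append("comparator must be '<' or '>'")
    if "enabled" in rule and rule["enabled"] not in (0, 1):
        problems.append("enabled must be 0 or 1")
    for key in ("duration", "notify_interval"):
        value = rule.get(key)
        # Assumption: minute counts are never negative.
        if value is not None and (not isinstance(value, (int, float)) or value < 0):
            problems.append(f"{key} must be a non-negative number of minutes")
    if rule.get("type") != "rolling-avg":
        # threshold and metric are described as applying only to rolling-avg.
        for key in ("metric", "threshold"):
            if key in rule:
                problems.append(f"{key} applies only when type is rolling-avg")
    return problems


rule = {
    "comparator": ">",
    "duration": 60.0,
    "enabled": 1,
    "metric": "heap-used",
    "notify_interval": 5.0,
    "threshold": 6291456000.0,
    "type": "rolling-avg",
}
print(check_alert_rule(rule))
```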

Example:

 curl http://127.0.0.1:8888/Test_Cluster/alert-rules

Output:

    [
      {
        "comparator": ">",
        "dc": "us-east",
        "duration": 1.0,
        "enabled": 1,
        "id": "e0c356c7-62ff-4aa8-9b17-e305f101b69a",
        "metric": "write-latency",
        "notify_interval": 1.0,
        "threshold": 10000.0,
        "type": "rolling-avg"
      },
      ...
    ]

GET /{cluster_id}/alert-rules/{alert_id}

Retrieve a specific alert rule.

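Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

  • alert_id: A UUID that identifies a specific alert rule and has the value of an id property returned by GET /{cluster_id}/alert-rules.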

Returns an AlertRule.

Example:

 curl http://127.0.0.1:8888/Test_Cluster/alert-rules/e0c356c7-62ff-4aa8-9b17-e305f101b69a

Output:

    {
      "comparator": ">",
      "dc": "us-east",
      "duration": 1.0,
      "enabled": 1,
      "id": "e0c356c7-62ff-4aa8-9b17-e305f101b69a",
      "metric": "write-latency",
      "notify_interval": 1.0,
      "threshold": 10000.0,
      "type": "rolling-avg"
    }

POST /{cluster_id}/alert-rules

Create a new alert rule.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

Body: A dictionary in the format of an AlertRule describing the alert to create.

Response: 201: Alert rule was created successfully.

Returns the ID of the newly created alert.

Example:

 curl -X POST \
   http://127.0.0.1:8888/Test_Cluster/alert-rules \
   -d '{
     "comparator": ">",
     "dc": "",
     "duration": 60.0,
     "enabled": 1,
     "metric": "heap-used",
     "notify_interval": 5.0,
     "threshold": 6291456000.0,
     "type": "rolling-avg"
   }'

Output:

  "b375fd3e-3908-4be5-ae37-d8f3b8699a9f"

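The same POST can be prepared from Python with the standard library. This is a minimal sketch: create_rule_request is a hypothetical helper, and the Content-Type header is an assumption, since the curl example does not set one. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request


def create_rule_request(base_url, cluster_id, rule):
    """Build (but do not send) a POST /{cluster_id}/alert-rules request."""
    return Request(
        f"{base_url.rstrip('/')}/{cluster_id}/alert-rules",
        data=json.dumps(rule).encode("utf-8"),
        # Assumption: the server accepts a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


new_rule = {
    "comparator": ">",
    "dc": "",
    "duration": 60.0,
    "enabled": 1,
    "metric": "heap-used",
    "notify_interval": 5.0,
    "threshold": 6291456000.0,
    "type": "rolling-avg",
}
request = create_rule_request("http://127.0.0.1:8888", "Test_Cluster", new_rule)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it. On success, the response body should contain the ID of the new rule, as in the output above.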
PUT /{cluster_id}/alert-rules/{alert_id}

Update an existing alert rule.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

  • alert_id: A UUID that identifies a specific alert rule and has the value of an id property returned by GET /{cluster_id}/alert-rules.

Body: A dictionary of AlertRule fields to update.

Response: 200: Alert rule updated successfully.

Example:

 curl -X PUT \
   http://127.0.0.1:8888/Test_Cluster/alert-rules/b375fd3e-3908-4be5-ae37-d8f3b8699a9f \
   -d '{"duration": 120.0}'
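An update body contains only the fields to change. The Python sketch below builds such a request and leaves out id and type, which the AlertRule description marks as retrieval-only and not editable. Dropping them on the client side is a design choice of this sketch, not documented server behavior. The helper name update_rule_request is hypothetical.

```python
import json
from urllib.request import Request

# Fields the AlertRule description marks as retrieval-only or not editable.
READ_ONLY_FIELDS = ("id", "type")


def update_rule_request(base_url, cluster_id, alert_id, changes):
    """Build (but do not send) a PUT /{cluster_id}/alert-rules/{alert_id} request."""
    editable = {k: v for k, v in changes.items() if k not in READ_ONLY_FIELDS}
    if not editable:
        raise ValueError("no editable fields to update")
    return Request(
        f"{base_url.rstrip('/')}/{cluster_id}/alert-rules/{alert_id}",
        data=json.dumps(editable).encode("utf-8"),
        method="PUT",
    )


update = update_rule_request(
    "http://127.0.0.1:8888",
    "Test_Cluster",
    "b375fd3e-3908-4be5-ae37-d8f3b8699a9f",
    {"duration": 120.0, "type": "rolling-avg"},
)
print(update.get_method(), update.full_url)
```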

DELETE /{cluster_id}/alert-rules/{alert_id}

Delete an existing alert rule.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

  • alert_id: A UUID that identifies a specific alert rule and has the value of an id property returned by GET /{cluster_id}/alert-rules.

Response: 200: Alert rule removed successfully.

Example:

 curl -X DELETE \
   http://127.0.0.1:8888/Test_Cluster/alert-rules/b375fd3e-3908-4be5-ae37-d8f3b8699a9f

GET /{cluster_id}/alerts/fired

Retrieve all alerts that are currently active.

Path arguments:

  • cluster_id: The ID of a cluster returned from GET /cluster-configs.

Returns a list of alerts that have been triggered. Each item in the list is a dictionary describing the triggered alert.

Example:

 curl http://127.0.0.1:8888/Test_Cluster/alerts/fired

Output:

  [
    {
      "alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
      "current_value": 31676303.333333332,
      "first_fired": 1336669233,
      "node": "10.11.12.150"
    },
    {
      "alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
      "current_value": 28380117.5,
      "first_fired": 1336669233,
      "node": "10.11.12.152"
    }
  ]
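Because one rule can fire on several nodes, it is often useful to group the response by rule. The Python sketch below does that with the sample output above. It also converts first_fired to a UTC datetime on the assumption that the value is a Unix timestamp in seconds, which is inferred from the sample's magnitude and not stated in the text. The helper names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def nodes_by_rule(fired):
    """Map each alert_rule_id to the list of nodes on which it has fired."""
    grouped = defaultdict(list)
    for alert in fired:
        grouped[alert["alert_rule_id"]].append(alert["node"])
    return dict(grouped)


def first_fired_utc(alert):
    """Assumption: first_fired is a Unix timestamp in seconds."""
    return datetime.fromtimestamp(alert["first_fired"], tz=timezone.utc)


fired = [
    {
        "alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
        "current_value": 31676303.333333332,
        "first_fired": 1336669233,
        "node": "10.11.12.150",
    },
    {
        "alert_rule_id": "ca4cf071-03bd-486a-a8be-428e6cd7218a",
        "current_value": 28380117.5,
        "first_fired": 1336669233,
        "node": "10.11.12.152",
    },
]
for rule_id, nodes in nodes_by_rule(fired).items():
    print(rule_id, nodes)
```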