Alerts
Each alert contains information about the captured event.
Optionally, you can configure Mission Control to send alerts for selected levels of events or specific clusters.
Embedded alert plugins
Mission Control provides support for routing alerts to Slack channels.
Default alerts
A severity label marks the criticality of an alert.
It takes one of three values:
- critical: Requires immediate action.
- warning: Requires eventual but not urgent action.
- info: Marks something out of the ordinary that doesn’t necessarily require action.
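As a rough illustration of how the three severity labels could be ordered and used to select which alerts to send, here is a minimal Python sketch. The `Severity` enum, the `should_send` helper, and the shape of the alert dictionary are hypothetical names chosen for this example; they are not part of Mission Control's configuration or API.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ordering of the three severity labels (higher = more urgent)."""
    INFO = 1
    WARNING = 2
    CRITICAL = 3


def should_send(alert, min_severity, clusters=None):
    """Return True if the alert passes the severity and cluster filters.

    `alert` is an assumed dict with "severity" and "cluster" keys.
    """
    severity = Severity[alert["severity"].upper()]
    if severity < min_severity:
        return False
    # An empty or missing cluster filter is treated as "all clusters".
    return not clusters or alert["cluster"] in clusters
```

For example, `should_send({"severity": "info", "cluster": "c1"}, Severity.WARNING)` rejects the alert because `info` ranks below `warning`.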
Description | Severity | Database type [1] | Details |
---|---|---|---|
Node down for more than 10 minutes | Sev 2 - Warning | All | Source metric: |
Node down for 30 minutes | Sev 1 - Error | All | Source metric: |
Nodes down in different racks of the same datacenter | | All | Two nodes down across rack boundaries can lead to LOCAL_QUORUM CL errors in applications. Source metric: |
CPU above 80% for 5 minutes | | All | An error that, if triggered too often, indicates low disk space and that the cluster should be scaled. Source metric: |
Used disk space above 50% for one minute | | All | |
Used disk space above 75% for one minute | Sev 1 - Error | All | A signal to expand the cluster before it gets into a state where cleanups are impossible due to insufficient disk space. Source metric: |
Used disk space above 50% for one minute | Sev 2 - Warning | All | A signal to expand the cluster before it gets into a state where cleanups are impossible due to insufficient disk space. Source metric: |
Load average above 20 for 5 minutes | Sev 2 - Warning | All | A good indicator of performance issues, the root cause of which can vary. Source metric: |
Load average above 32 for 5 minutes | Sev 1 - Error | All | A good indicator of performance issues, the root cause of which can vary. Source metric: |
Dropped messages over 5 minutes | Sev 1 - Error for >= 10,000; Sev 2 - Warning for < 10,000 | All | Thread pools cannot keep up with the pace of queries entering and being processed within the cluster. This leads to errors within the application stack and potentially incorrect replicas. Source metric: |
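The paired warning and error thresholds in the table above can be read as a simple two-level classification. The following Python sketch shows that reading for disk usage, load average, and dropped messages. The metric keys and function names are hypothetical, the sketch ignores the duration conditions ("for one minute", "for 5 minutes"), and it assumes that zero dropped messages raises no alert; none of this reflects how Mission Control evaluates its rules internally.

```python
# Thresholds taken from the table; each entry is (error_level, warning_level).
# The dictionary keys are illustrative names, not real metric identifiers.
THRESHOLDS = {
    "disk_used_percent": (75, 50),
    "load_average": (32, 20),
}


def classify(metric, value):
    """Map a measured value to the severity listed in the table, or None."""
    error_level, warning_level = THRESHOLDS[metric]
    if value > error_level:
        return "Sev 1 - Error"
    if value > warning_level:
        return "Sev 2 - Warning"
    return None


def classify_dropped_messages(count):
    """Classify the number of dropped messages in a 5-minute window."""
    if count >= 10_000:
        return "Sev 1 - Error"
    # Assumption: a window with no dropped messages raises no alert.
    if count > 0:
        return "Sev 2 - Warning"
    return None
```

Under these assumptions, a disk at 60% usage is classified as a warning and one at 80% as an error.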