Prometheus Alert Rule Generator

Build a Prometheus alerting rules file from ready-made presets: instance down, high CPU, memory pressure, low disk, request latency, error rate, and pod crash-looping. Toggle the alerts you want and tune each one's threshold, for duration, and severity label; the tool writes a valid rules YAML with PromQL expressions and templated annotations. Save the output as a rules file and reference it from your Prometheus config. Everything is generated in your browser.

Expressions assume the common node_exporter and HTTP/Kubernetes metric names. Confirm the metric names match your exporters, then load the file via a rule_files entry in prometheus.yml and route the severities in Alertmanager.

How to use the Prometheus Alert Rule Generator

Tick the alerts you want in the table. For each, set the threshold (a percentage for resource alerts, seconds for latency), the for duration — how long the condition must hold before firing, which suppresses brief spikes — and a severity label that Alertmanager uses for routing. The YAML regenerates live; save it to a file such as alerts.yml and reference it from the rule_files list in prometheus.yml.

Check the metric names against your setup before relying on the rules. The CPU, memory, and disk expressions use node_exporter metrics; the latency and error-rate alerts assume an HTTP histogram named http_request_duration_seconds and a counter http_requests_total with a status label; the crash-loop alert uses kube-state-metrics. Rename them to match your exporters and instrumentation. After loading, verify the rules under Status → Rules in the Prometheus UI and test routing by triggering a low-severity alert.
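
The application-level alerts are the ones most likely to need renaming, so here is a minimal sketch of what latency and error-rate rules can look like with the metric names mentioned above. The generator's exact expressions may differ. The group name, the job grouping, and the 0.5 s, 5%, and for values are illustrative assumptions, not the tool's defaults.

```yaml
groups:
  - name: http-example            # hypothetical group name
    rules:
      - alert: HighRequestLatency
        # 95th-percentile request duration over the last 5 minutes, per job.
        # Assumes a histogram exposing http_request_duration_seconds_bucket.
        expr: |
          histogram_quantile(0.95,
            sum by (le, job) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "p95 latency above 0.5s for {{ $labels.job }}"
          description: "Current p95 latency is {{ $value | printf \"%.2f\" }}s."

      - alert: HighErrorRate
        # Share of requests with a 5xx status, per job.
        # Assumes http_requests_total carries a label named status.
        expr: |
          sum by (job) (rate(http_requests_total{status=~"5.."}[5m]))
            /
          sum by (job) (rate(http_requests_total[5m]))
            > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "5xx error rate above 5% for {{ $labels.job }}"
          description: "Current error ratio is {{ $value | printf \"%.3f\" }}."
```

If your instrumentation uses a different label for the status code (for example code), change the selector in HighErrorRate to match.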

How Prometheus alerting rules work

Prometheus separates detecting a problem from notifying someone about it. Alerting rules, defined in a YAML file that Prometheus loads, are evaluated at a regular interval as PromQL expressions against your metrics; each time series an expression returns becomes an active alert. A companion service, Alertmanager, receives those alerts and handles grouping, silencing, deduplication, and delivery to email, Slack, PagerDuty, and the like. This generator produces the rules half; Alertmanager configuration is separate.

Each rule has four important parts. The expr is the PromQL condition — for example CPU utilisation above a threshold or up == 0 for a target that has stopped responding. The for clause requires the condition to stay true for a duration before the alert actually fires, which filters out momentary spikes that would otherwise page someone needlessly. labels attach metadata, most importantly a severity, that Alertmanager routes on. annotations carry human-readable text, and they can use Go templating like {{ $labels.instance }} and {{ $value }} to embed the offending instance and the current metric value directly in the message.
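
To make the four parts concrete, here is a minimal sketch of a rules file containing a single instance-down alert. The group name and the 5m duration are illustrative assumptions; up, the instance label, and the job label are standard in Prometheus.

```yaml
groups:
  - name: example                 # hypothetical group name
    rules:
      - alert: InstanceDown
        # expr: the PromQL condition. up is 0 when a scrape target fails.
        expr: up == 0
        # for: the condition must hold this long before the alert fires.
        for: 5m
        # labels: metadata attached to the alert; Alertmanager routes on these.
        labels:
          severity: critical
        # annotations: human-readable text, rendered with Go templating.
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been unreachable for more than 5 minutes."
```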

Good alerts are symptom-based and actionable. Alerting on user-visible symptoms — high error rate, slow requests, a down instance — rather than on every internal fluctuation keeps noise low and signal high, and a sensible for duration prevents flapping. The presets here follow that philosophy: thresholds that indicate a real problem, durations long enough to ignore transient blips, and severities that distinguish a page-now critical from a look-soon warning. Start from them, tune the numbers to your environment's normal behaviour, and expand with service-specific rules over time.

Common use cases

  • Bootstrapping monitoring. Get a solid baseline of infrastructure alerts without writing PromQL from scratch.
  • Standard thresholds. Apply consistent CPU, memory, and disk alerts across many hosts.
  • Learning alerting rules. See how expr, for, labels, and annotations combine in valid syntax.
  • Reducing noise. Add for-durations to flappy alerts that currently fire on brief spikes.

Frequently asked questions

How do I load the rules into Prometheus?

Save the YAML to a file and add its path to the rule_files list in prometheus.yml, then reload Prometheus: either send the process a SIGHUP or send an HTTP POST to the /-/reload endpoint, which is only available when Prometheus is started with --web.enable-lifecycle. Confirm the rules loaded under Status → Rules in the web UI, where you can also see which are currently firing.
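
A minimal sketch of the relevant prometheus.yml fragment is below. The file path, the evaluation interval, and the Alertmanager address are assumptions for illustration; substitute your own.

```yaml
# prometheus.yml (fragment)
global:
  evaluation_interval: 30s        # how often alerting rules are evaluated

rule_files:
  - /etc/prometheus/alerts.yml    # assumed location of the generated file

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]   # assumed Alertmanager address

# Before reloading, the rules file can be validated with:
#   promtool check rules /etc/prometheus/alerts.yml
```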

What does the for clause do?

It requires the alert expression to remain true at every evaluation for the given duration before the alert transitions from pending to firing. This suppresses short-lived spikes (a CPU blip that clears in seconds will not page anyone) at the cost of delaying genuine alerts by the same amount. Because that delay adds to your detection time, keep the duration well below how quickly you need to respond.
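
As a rough sketch of where for sits in a rule, here is a CPU alert built on node_exporter's node_cpu_seconds_total. The 80% threshold and 10m duration are illustrative, not the tool's defaults.

```yaml
groups:
  - name: cpu-example             # hypothetical group name
    rules:
      - alert: HighCpuUsage
        # Busy CPU percentage per instance: 100 minus the idle share.
        expr: |
          100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        # Pending as soon as the expression returns a result;
        # firing only after it has kept returning one for 10 minutes.
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | printf \"%.1f\" }}% (threshold 80%)."
```

If usage drops below the threshold at any evaluation during those 10 minutes, the pending alert is cleared and the timer starts over on the next breach.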

Will these rules send notifications by themselves?

No. Prometheus only evaluates the rules, marks alerts as firing, and forwards them to any Alertmanager instances listed in its configuration. Actual notifications are sent by Alertmanager, which you configure separately to route alerts by their labels (such as severity) to channels like email, Slack, or PagerDuty. This tool generates the rules, not the Alertmanager config.
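
For orientation, a minimal sketch of severity-based routing in alertmanager.yml might look like the following. The receiver names are hypothetical and carry no notification settings here; a real config would add email, Slack, or PagerDuty settings under each receiver.

```yaml
# alertmanager.yml (fragment)
route:
  receiver: default               # fallback when no child route matches
  group_by: ["alertname", "instance"]
  routes:
    - matchers:
        - 'severity="critical"'
      receiver: oncall-pager      # hypothetical receiver name
    - matchers:
        - 'severity="warning"'
      receiver: team-chat         # hypothetical receiver name

receivers:
  - name: default
  - name: oncall-pager
  - name: team-chat
```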

Do the metric names match my setup?

The presets use widely used names — node_exporter for host metrics, kube-state-metrics for Kubernetes, and conventional http_requests_total / http_request_duration_seconds for application metrics. If your exporters or instrumentation use different names or labels, edit the expressions accordingly before relying on them.
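
For reference, this is a sketch of how host and Kubernetes alerts are commonly written with those exporters' metric names, so you can compare them against what your targets actually expose. The generator's own expressions may differ; the thresholds, durations, and the fstype filter are illustrative assumptions.

```yaml
groups:
  - name: host-and-k8s-example    # hypothetical group name
    rules:
      - alert: HighMemoryUsage
        # node_exporter: used memory as a percentage of total.
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 90% on {{ $labels.instance }}"

      - alert: LowDiskSpace
        # node_exporter: free space as a percentage of filesystem size.
        expr: |
          (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
            / node_filesystem_size_bytes{fstype!~"tmpfs|overlay"}) * 100 < 10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% disk left on {{ $labels.instance }} ({{ $labels.mountpoint }})"

      - alert: PodCrashLooping
        # kube-state-metrics: container restarts within the last 15 minutes.
        expr: increase(kube_pod_container_status_restarts_total[15m]) > 3
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting repeatedly"
```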

How should I pick thresholds?

Base them on your environment's normal behaviour, not generic numbers. Look at historical graphs to see typical and peak values, then set the threshold where a sustained breach indicates a real problem. The defaults here are reasonable starting points; tune them so the alert fires when action is genuinely needed and stays quiet otherwise.