Production-Ready Template

Effective PostgreSQL Monitoring with Prometheus Alert Templates

PostgreSQL is a powerful open-source relational database used across a wide range of applications. For SREs and DevOps teams, ensuring its availability, performance, and reliability is critical. Prometheus, combined with well-designed alerting rules, can help detect production issues in PostgreSQL before they cause major incidents. This guide introduces a set of community-curated Prometheus alert templates for PostgreSQL from the DrDroidLab GitHub repository, walks through individual alert rules, and offers guidance on adapting them for real-world production environments.

Core Alert Rules

PostgreSQL is down

pg_up == 0

Why this matters

This alert fires when the exporter is running but reports that it cannot connect to PostgreSQL ('pg_up' is 0), indicating the instance may be down or unreachable.

Tuning tips

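Ensure scrape targets are healthy and not flapping, and add a short 'for' duration so a brief restart does not page anyone. Because 'pg_up == 0' returns nothing when the metric is missing entirely, pair this rule with an 'absent()' check for the case where no 'pg_up' metric is emitted.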
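As a rough illustration, the expression above could be wrapped in a Prometheus rule file along these lines. The group name, alert name, 'for' duration, and severity label are assumptions chosen for the sketch, not values taken from the template:

```yaml
groups:
  - name: postgresql-availability        # hypothetical group name
    rules:
      - alert: PostgresqlDown            # hypothetical alert name
        # pg_up is 0 when the exporter's last attempt to reach PostgreSQL failed
        expr: pg_up == 0
        # assumed hold time; filters out brief restarts and flapping targets
        for: 1m
        labels:
          severity: critical             # assumed label; match your Alertmanager routing
        annotations:
          summary: "PostgreSQL is down on {{ $labels.instance }}"
```

A longer 'for' value trades slower detection for fewer false pages, so it is worth matching to how long a planned restart normally takes.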

PostgreSQL exporter is down

absent(pg_up)

Why this matters

This alert triggers when the pg_up metric itself is missing, indicating the exporter may not be emitting metrics, or Prometheus is not scraping it.

Tuning tips

Check the Prometheus scrape configuration and the exporter's startup logs. A missing metric often points to a misconfigured scrape target or a crashed exporter container.
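One possible way to cover both failure modes is sketched below: the first rule catches a missing 'pg_up' series, and the second uses Prometheus's built-in 'up' metric to catch a failed scrape. The job label "postgres" follows the other rules in this guide; the group name, alert names, durations, and severities are assumptions:

```yaml
groups:
  - name: postgresql-exporter            # hypothetical group name
    rules:
      - alert: PostgresqlExporterMetricMissing
        # absent() returns 1 only when no matching pg_up series exists
        expr: absent(pg_up{job="postgres"})
        for: 5m                          # assumed hold time
        labels:
          severity: warning              # assumed label
        annotations:
          summary: "No pg_up metric found for job {{ $labels.job }}"
      - alert: PostgresqlExporterScrapeFailing
        # 'up' is set by Prometheus itself: 0 means the scrape of this target failed
        expr: up{job="postgres"} == 0
        for: 5m                          # assumed hold time
        labels:
          severity: warning              # assumed label
        annotations:
          summary: "Prometheus cannot scrape {{ $labels.instance }}"
```

The 'absent()' result only carries labels that appear as equality matchers in the selector, which is why the first annotation refers to the job rather than an instance.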

PostgreSQL is in recovery state

pg_is_in_recovery{job="postgres"} == 1

Why this matters

This alert fires when a PostgreSQL instance is running in standby or recovery mode. That state is normal for replicas, so the alert is only meaningful on nodes that are expected to act as the primary.

Tuning tips

Apply this alert only to primary nodes. Use label matchers to exclude replicas that are intentionally in recovery.
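A minimal sketch of scoping the rule with a label matcher follows. It assumes each scrape target carries a 'role' label set to "primary" or "replica"; that label is hypothetical and would have to be attached through your scrape configuration or relabeling. The group name, alert name, duration, and severity are also assumptions:

```yaml
groups:
  - name: postgresql-recovery            # hypothetical group name
    rules:
      - alert: PostgresqlPrimaryInRecovery
        # role="primary" is a hypothetical target label; replicas are
        # excluded because they never match the selector
        expr: pg_is_in_recovery{job="postgres", role="primary"} == 1
        for: 2m                          # assumed hold time
        labels:
          severity: critical             # assumed label
        annotations:
          summary: "Primary {{ $labels.instance }} is in recovery mode"
```

If roles change after a failover, the label has to be updated as well, otherwise the rule keeps watching the old primary.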

PostgreSQL High Rollback Rate

increase(pg_stat_database_xact_rollback{job="postgres"}[5m]) / ignoring(db) group_right increase(pg_stat_database_xact_commit{job="postgres"}[5m]) > 0.02

Why this matters

This rule fires when rollbacks exceed 2% of commits over a 5-minute window, which can indicate excessive failed transactions or application bugs.

Tuning tips

Tune the rollback/commit ratio (0.02) and window (5m) based on normal application behavior. Some workloads may have naturally high rollback rates.
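The sketch below shows one way to keep the two tunables (ratio and window) easy to find in a rule file. It is a variant, not the template's exact expression: it aggregates both sides per instance and database with 'sum by', and it assumes the exporter labels per-database series with 'datname', as postgres_exporter does. The group name, alert name, 'for' duration, and severity are assumptions:

```yaml
groups:
  - name: postgresql-transactions        # hypothetical group name
    rules:
      - alert: PostgresqlHighRollbackRate
        # ratio of rollbacks to commits per database;
        # tune the 5m window and the 0.02 threshold to your workload
        expr: |
          sum by (instance, datname) (
            increase(pg_stat_database_xact_rollback{job="postgres"}[5m])
          )
            /
          sum by (instance, datname) (
            increase(pg_stat_database_xact_commit{job="postgres"}[5m])
          )
            > 0.02
        for: 5m                          # assumed hold time
        labels:
          severity: warning              # assumed label
        annotations:
          summary: "High rollback ratio on {{ $labels.instance }} ({{ $labels.datname }})"
```

Aggregating both sides to the same label set lets the division match series one-to-one without extra matching modifiers.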

PostgreSQL Saturated by Too Many Connections

pg_stat_activity_count{job="postgres"} >= pg_settings_max_connections{job="postgres"} - 10

Why this matters

This alert fires when the number of active connections comes within 10 of 'max_connections', leaving so little headroom that new client connections may be refused.

Tuning tips

Adjust the headroom (10) to suit your 'max_connections' setting and typical connection usage. Consider connection pooling (e.g., PgBouncer) if this alert fires often.
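One caveat worth checking in your environment: postgres_exporter reports 'pg_stat_activity_count' as several series per instance (split by labels such as database and state), while 'pg_settings_max_connections' is a single series, so the two sides may not match unless the connection counts are summed first. The following is a sketch under that assumption; the group name, alert name, duration, and severity are hypothetical:

```yaml
groups:
  - name: postgresql-connections         # hypothetical group name
    rules:
      - alert: PostgresqlTooManyConnections
        # sum all per-database, per-state counts into one total per instance,
        # then compare against max_connections minus a headroom of 10
        expr: |
          sum by (instance) (pg_stat_activity_count{job="postgres"})
            >= on (instance)
          (pg_settings_max_connections{job="postgres"} - 10)
        for: 2m                          # assumed hold time
        labels:
          severity: warning              # assumed label
        annotations:
          summary: "{{ $labels.instance }} is close to max_connections"
```

On servers with a large 'max_connections', a proportional threshold such as multiplying the limit by 0.9 may scale better than a fixed headroom of 10.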

Ready to Get Started?

Get started by cloning the PostgreSQL alert templates from the DrDroidLab GitHub repository. Enhance your observability and catch PostgreSQL issues before they impact your users.
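Once the rules are saved locally, a Prometheus configuration along these lines would load them and scrape the exporter. The rule file path is hypothetical, and the target assumes postgres_exporter is running on the same host on its default port, 9187:

```yaml
# prometheus.yml (fragment)
rule_files:
  - "rules/postgresql-alerts.yml"        # hypothetical path to the alert rules

scrape_configs:
  - job_name: postgres                   # matches job="postgres" in the rules above
    static_configs:
      - targets: ["localhost:9187"]      # assumed exporter address
```

The job name matters because several of the rules select series with job="postgres"; a different name would cause those rules to match nothing.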
