Production-Ready Template

Effective PostgreSQL Monitoring with Prometheus Alert Templates

PostgreSQL is a powerful open-source relational database used across a wide range of applications. For SREs and DevOps teams, ensuring its availability, performance, and reliability is critical. Prometheus, combined with well-designed alerting rules, can help detect production issues in PostgreSQL before they cause major incidents. This guide introduces a set of community-curated Prometheus alert templates for PostgreSQL from the DrDroidLab GitHub repository, walks through individual alert rules, and offers guidance on adapting them for real-world production environments.

Core Alert Rules

PostgreSQL is down
pg_up == 0
Why this matters
This alert fires when the exporter is running but reports that it cannot connect to PostgreSQL, indicating the instance may be down or unreachable.
Tuning tips
Ensure scrape targets are healthy and not flapping. Consider using an 'absent()' check if no 'pg_up' metric is emitted.
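As a rough sketch, the expression above could be wrapped in a Prometheus rule file like the one below. The group name, alert name, 'for' duration, and 'severity' label are assumptions chosen for illustration, not values taken from the template repository.

```yaml
groups:
  - name: postgresql-availability        # hypothetical group name
    rules:
      - alert: PostgresqlDown            # hypothetical alert name
        expr: pg_up == 0
        # Assumed hold time: requires the condition to persist before firing,
        # which helps ride out brief scrape flaps.
        for: 1m
        labels:
          severity: critical             # assumed label convention
        annotations:
          summary: "PostgreSQL is down (instance {{ $labels.instance }})"
          description: "pg_up has been 0 for at least 1 minute."
```

A longer 'for' value trades slower detection for fewer false pages; the right balance depends on your scrape interval.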
PostgreSQL exporter is down
absent(pg_up)
Why this matters
This alert triggers when the pg_up metric itself is missing, indicating the exporter may not be emitting metrics, or Prometheus is not scraping it.
Tuning tips
Check the Prometheus scrape configuration and the exporter's startup logs. A missing metric often points to a misconfiguration or a container crash.
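A minimal sketch of this check as a rule file follows. The names, duration, and severity are assumptions. Note that 'absent()' returns a series carrying only the labels from equality matchers in its argument, so with a bare 'absent(pg_up)' the annotations cannot reference an instance label.

```yaml
groups:
  - name: postgresql-exporter            # hypothetical group name
    rules:
      - alert: PostgresqlExporterDown    # hypothetical alert name
        expr: absent(pg_up)
        # Assumed hold time so a single missed scrape does not fire the alert.
        for: 5m
        labels:
          severity: warning              # assumed label convention
        annotations:
          # No $labels.instance here: absent(pg_up) yields a label-less series.
          summary: "No pg_up metric found"
          description: "pg_up has been missing for 5 minutes; check the exporter and the scrape config."
```

Because this expression only fires when no 'pg_up' series exists at all, it will not catch a single missing exporter among several healthy ones.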
PostgreSQL is in recovery state
pg_is_in_recovery{job="postgres"} == 1
Why this matters
This alert fires when a PostgreSQL instance is running in standby or recovery mode, which is normally expected only on replicas.
Tuning tips
Apply this alert only to primary nodes or production roles. Use label matchers to exclude replicas that are intentionally in recovery.
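One way to scope the rule with a label matcher is sketched below. The 'role="primary"' label is hypothetical: it assumes you attach such a label to your targets (for example through scrape configuration or relabeling), and it is not something the exporter provides by itself.

```yaml
groups:
  - name: postgresql-replication         # hypothetical group name
    rules:
      - alert: PostgresqlPrimaryInRecovery   # hypothetical alert name
        # role="primary" is an assumed target label; replicas without it
        # are excluded from this alert.
        expr: pg_is_in_recovery{job="postgres", role="primary"} == 1
        for: 2m                          # assumed hold time
        labels:
          severity: critical             # assumed label convention
        annotations:
          summary: "Primary in recovery (instance {{ $labels.instance }})"
          description: "A node labelled as primary reports pg_is_in_recovery == 1."
```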
PostgreSQL High Rollback Rate
increase(pg_stat_database_xact_rollback{job="postgres"}[5m]) / ignoring(db) group_right increase(pg_stat_database_xact_commit{job="postgres"}[5m]) > 0.02
Why this matters
This rule fires when rollbacks exceed 2% of commits over a 5-minute window, which can indicate excessive failed transactions or application bugs.
Tuning tips
Tune the rollback/commit ratio (0.02) and window (5m) based on normal application behavior. Some workloads may have naturally high rollback rates.
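The sketch below places the expression in a rule file so the two tunable values are easy to find. The group name, alert name, 'for' duration, and severity are assumptions; the expression itself is copied unchanged.

```yaml
groups:
  - name: postgresql-transactions        # hypothetical group name
    rules:
      - alert: PostgresqlHighRollbackRate    # hypothetical alert name
        # Tunables: the [5m] window on both sides and the 0.02 threshold.
        expr: |
          increase(pg_stat_database_xact_rollback{job="postgres"}[5m])
            / ignoring(db) group_right
          increase(pg_stat_database_xact_commit{job="postgres"}[5m])
            > 0.02
        for: 5m                          # assumed hold time
        labels:
          severity: warning              # assumed label convention
        annotations:
          summary: "High rollback rate (instance {{ $labels.instance }})"
          description: "Rollback to commit ratio is {{ $value | humanizePercentage }}."
```

If you widen the window, keep both range selectors the same so the ratio compares like with like.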
PostgreSQL Saturated by Too Many Connections
pg_stat_activity_count{job="postgres"} >= pg_settings_max_connections{job="postgres"} - 10
Why this matters
This alert fires when the connection count comes within 10 of 'max_connections', at which point new client connections may soon be refused.
Tuning tips
Adjust the headroom (10) to match your 'max_connections' setting and typical connection usage. Consider connection pooling (e.g., PgBouncer) if this alert fires regularly.
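If your exporter reports 'pg_stat_activity_count' as several series per instance (for example, one per database and state), the two sides of the comparison will not share a label set. The sketch below is one possible variant, not the template's own rule: it sums connections per instance before comparing. The per-series labelling, the names, and the severity are assumptions to verify against your own metrics.

```yaml
groups:
  - name: postgresql-connections         # hypothetical group name
    rules:
      - alert: PostgresqlTooManyConnections  # hypothetical alert name
        # Sum all activity series per instance, then match on the instance
        # label only. The headroom of 10 comes from the original expression.
        expr: |
          sum by (instance) (pg_stat_activity_count{job="postgres"})
            >= on (instance)
          (pg_settings_max_connections{job="postgres"} - 10)
        for: 2m                          # assumed hold time
        labels:
          severity: warning              # assumed label convention
        annotations:
          summary: "Connections near limit (instance {{ $labels.instance }})"
          description: "{{ $value }} connections in use, within 10 of max_connections."
```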

Frequently Asked Questions

What does 'pg_up == 0' mean?
When should I alert on recovery mode?
How do I adjust the rollback alert for my application?

Ready to Get Started?

Get started by cloning the PostgreSQL alert templates from the DrDroidLab GitHub repository. Enhance your observability and catch PostgreSQL issues before they impact your users.


Doctor Droid