Prometheus Label cardinality explosion
Too many unique label combinations inflating the number of stored time series.
What Is Prometheus Label Cardinality Explosion?
Understanding Prometheus and Its Purpose
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It records real-time metrics in a time series database built on a multi-dimensional data model, and it provides a flexible query language, PromQL, for working with that model.
Recognizing the Symptom: Label Cardinality Explosion
One of the common issues encountered when using Prometheus is a label cardinality explosion. This problem manifests as high memory usage and slow query performance. Users may notice that their Prometheus server is consuming an excessive amount of resources, or queries are taking longer than expected to execute.
What is Observed?
When label cardinality explosion occurs, you might observe the following symptoms:
- Increased memory consumption by the Prometheus server.
- Slower query execution times.
- Potential out-of-memory (OOM) errors.
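Prometheus exposes metrics about itself that make these symptoms measurable. Assuming your Prometheus server scrapes its own /metrics endpoint (the default in most setups), the following queries show the active series count and the rate of series churn:

```promql
# Number of active series currently in the TSDB head block
prometheus_tsdb_head_series

# Rate at which new series are being created (high values indicate churn)
rate(prometheus_tsdb_head_series_created_total[5m])
```

A steadily climbing series count, or a churn rate that never settles, is a strong hint that some label is producing unbounded values.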
Explaining the Issue: High Cardinality
Label cardinality explosion happens when there are too many unique label combinations in your metrics. Prometheus stores each unique combination of labels as a separate time series. If you have labels that generate a large number of unique combinations, it can lead to high cardinality, which in turn causes performance issues.
Root Cause Analysis
The root cause of this issue is usually a label with a large or unbounded set of values, such as user IDs, request IDs, or other identifiers that change frequently. Because Prometheus creates one time series per unique label combination, each additional high-cardinality label multiplies the total number of series stored.
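The multiplicative effect is easy to underestimate. This short sketch (with hypothetical label counts) computes the worst-case series count for a single metric as the product of each label's distinct-value count:

```python
# Hypothetical example: worst-case series count for one metric is the
# product of the number of distinct values each label can take.
from math import prod

label_cardinalities = {
    "method": 7,        # GET, POST, PUT, ... (bounded)
    "status": 6,        # small bounded set of status codes/classes
    "user_id": 50_000,  # unbounded identifier -- the problem label
}

series = prod(label_cardinalities.values())
print(series)  # 2,100,000 potential series from a single metric
```

Dropping the `user_id` label alone reduces the worst case from 2.1 million series to 42.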
Steps to Fix the Issue
To resolve the label cardinality explosion, follow these steps:
1. Identify High-Cardinality Labels
Use the following query to identify metrics with high cardinality:
count by (__name__)({__name__=~".+"})
This query will help you identify which metrics have a large number of time series.
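Two follow-up queries can narrow the search further. The metric and label names below (`http_requests_total`, `user_id`) are placeholders; substitute your own:

```promql
# The ten metric names backed by the most time series
topk(10, count by (__name__)({__name__=~".+"}))

# Number of distinct values a suspect label takes on one metric
count(count by (user_id)(http_requests_total))
```

Note the `.+` matcher: Prometheus rejects selectors whose matchers could all match an empty string, so `.*` on its own would return an error.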
2. Reduce Unique Labels
Once you've identified the problematic metrics, consider reducing the number of unique labels. Avoid using labels with high cardinality such as user IDs or request IDs. Instead, use labels that have a limited set of possible values.
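One common fix at instrumentation time is to normalize unbounded values into a bounded set before using them as a label. The helper below is a hypothetical sketch: it collapses numeric IDs and UUIDs in a URL path into fixed placeholders so a `path` label cannot explode:

```python
import re

# Hypothetical helper: collapse unbounded path segments into bounded
# placeholders, e.g. /users/12345 -> /users/:id
def normalize_path(path: str) -> str:
    # Replace UUID segments first, since a UUID may begin with digits
    path = re.sub(
        r"/[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}(?=/|$)",
        "/:uuid",
        path,
    )
    # Replace purely numeric segments
    path = re.sub(r"/\d+(?=/|$)", "/:id", path)
    return path

print(normalize_path("/users/12345/orders/987"))  # /users/:id/orders/:id
```

Apply this (or similar bucketing, such as recording HTTP status classes like "5xx" instead of raw codes) before setting the label value in your instrumentation code.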
3. Use Relabeling Rules
Implement relabeling rules in your prometheus.yml configuration file to drop or modify series that contribute to high cardinality. Note that rules matching on the metric name must go under metric_relabel_configs, which runs after each scrape; relabel_configs runs before the scrape, when metric names are not yet available. For example, to drop an entire metric:
metric_relabel_configs:
  - source_labels: [__name__]
    regex: "high_cardinality_metric"
    action: drop
For more information on relabeling, refer to the Prometheus documentation.
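Instead of dropping a whole metric, you can also remove just the offending label with the labeldrop action. The label name below is a placeholder:

```yaml
metric_relabel_configs:
  - regex: "request_id"   # label name to remove (hypothetical)
    action: labeldrop
```

Be aware that removing a label can leave two series with identical remaining label sets, which Prometheus treats as duplicates; only drop a label if the rest of the labels still uniquely identify each series you want to keep.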
4. Monitor and Adjust
After making changes, monitor your Prometheus server's performance and adjust your configuration as needed. Continuously review your metrics and labels to ensure they remain efficient.
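An alerting rule can make this ongoing review automatic. The sketch below fires when the active series count stays above a threshold; the threshold and alert name are assumptions you should tune to your server's capacity:

```yaml
groups:
  - name: cardinality
    rules:
      - alert: TooManyActiveSeries
        expr: prometheus_tsdb_head_series > 1000000   # tune to your capacity
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Prometheus active series count is unusually high"
```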
Conclusion
By understanding and addressing label cardinality explosion, you can significantly improve the performance and reliability of your Prometheus monitoring setup. Regularly review your metrics and labels, and make use of Prometheus's powerful configuration options to maintain an efficient monitoring system.