Seldon Core Metrics not being collected

Prometheus not scraping the model metrics endpoint.

Understanding Seldon Core

Seldon Core is an open-source platform designed to deploy machine learning models on Kubernetes. It provides a robust framework for scaling, monitoring, and managing machine learning models in production environments. One of its key features is the ability to integrate with monitoring tools like Prometheus to collect metrics from deployed models.

Identifying the Symptom

When using Seldon Core, you might encounter a situation where metrics are not being collected as expected. This can manifest as missing data in your monitoring dashboards or alerts not being triggered due to the absence of metrics.

Common Observations

  • Empty or incomplete metrics data in Prometheus.
  • Grafana dashboards showing no data or gaps in data.
  • Alerts based on metrics not firing.
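To confirm that these symptoms come from a scrape problem rather than from the dashboards or alert rules themselves, you can ask Prometheus for the up series of the scrape job. The sketch below is a minimal Python example. It assumes Prometheus is reachable at http://localhost:9090 (for example through kubectl port-forward) and that the scrape job is named seldon-model. The function names are hypothetical helpers, not part of Seldon Core or Prometheus. It calls the standard /api/v1/query HTTP API. An empty result means Prometheus has no target for the job at all. A value of "0" means the target exists but its last scrape failed.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical defaults: adjust to your Prometheus address and to the
# job_name used in your scrape configuration.
PROMETHEUS_URL = "http://localhost:9090"
JOB_NAME = "seldon-model"


def build_up_query_url(prometheus_url: str, job_name: str) -> str:
    """Build an instant-query URL for the `up` series of one scrape job."""
    query = f'up{{job="{job_name}"}}'  # renders as up{job="<job_name>"}
    return f"{prometheus_url}/api/v1/query?" + urllib.parse.urlencode({"query": query})


def interpret_up_result(payload: dict) -> str:
    """Classify a /api/v1/query response for up{job=...}."""
    if payload.get("status") != "success":
        return "query-failed"
    results = payload["data"]["result"]
    if not results:
        # No series at all: Prometheus is not scraping any target for this job.
        return "not-scraped"
    # Each sample is [timestamp, "value"]; "1" means the last scrape succeeded.
    if all(r["value"][1] == "1" for r in results):
        return "all-up"
    return "some-down"


def check_up_series(prometheus_url: str = PROMETHEUS_URL, job_name: str = JOB_NAME) -> str:
    """Query Prometheus over HTTP and classify the result."""
    url = build_up_query_url(prometheus_url, job_name)
    with urllib.request.urlopen(url, timeout=5) as resp:
        return interpret_up_result(json.load(resp))
```

Run check_up_series() from a machine that can reach Prometheus. "not-scraped" points at the configuration, while "some-down" points at connectivity or the endpoint itself.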

Exploring the Issue

Metrics usually go missing because Prometheus is not scraping the model's metrics endpoint. This happens when the endpoint is not correctly exposed or when Prometheus is not configured to scrape it.

Why This Happens

Prometheus relies on a configuration file to determine which endpoints to scrape for metrics. If the Seldon Core model's metrics endpoint is not included in this configuration, or if there are network issues preventing access, metrics will not be collected.
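Prometheus reports both of these failure modes on its /api/v1/targets endpoint. If the job is missing from the active targets, the configuration is the problem. If a target's health is down, its lastError field usually carries the network or HTTP error. The following minimal Python sketch summarises that response for one job. It assumes a Prometheus server reachable at the hypothetical address http://localhost:9090, and the helper names are illustrative only.

```python
import json
import urllib.request

# Hypothetical Prometheus address; adjust for your cluster
# (for example, after kubectl port-forward).
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(prometheus_url: str = PROMETHEUS_URL) -> dict:
    """Download the JSON document served by /api/v1/targets."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def diagnose_targets(payload: dict, job_name: str) -> list:
    """Summarise the active targets of one job as human-readable findings."""
    active = payload["data"]["activeTargets"]
    matching = [t for t in active if t["labels"].get("job") == job_name]
    if not matching:
        # Prometheus does not know about the job: the scrape config is missing or wrong.
        return [f"job {job_name!r} has no active targets: check the scrape configuration"]
    findings = []
    for target in matching:
        if target["health"] == "up":
            findings.append(f"{target['scrapeUrl']}: up")
        else:
            # lastError holds the reason the most recent scrape failed.
            findings.append(f"{target['scrapeUrl']}: {target['health']} ({target['lastError']})")
    return findings
```

For example, diagnose_targets(fetch_targets(), "seldon-model") returns one line per scraped instance, or a single line saying the job is not configured.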

Steps to Fix the Issue

To resolve this issue, follow these steps to ensure that Prometheus is correctly configured to scrape the Seldon Core model metrics endpoint.

Step 1: Verify Endpoint Exposure

Ensure that the model's metrics endpoint is exposed. You can check this by accessing the endpoint directly in your browser or using a tool like curl:

curl http://<model-service-url>/metrics

If the endpoint is not accessible, you may need to adjust your Kubernetes service or ingress configuration.
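If you want to run this check from a script instead of a browser or curl, the following Python sketch fetches the endpoint and applies a rough heuristic to decide whether the body looks like the Prometheus text exposition format. That format consists of "# HELP" / "# TYPE" comments or "name value" sample lines. The helper names are hypothetical, and the heuristic only inspects the first meaningful line, so treat it as a quick smoke test rather than a full parser. Pass it the full metrics URL of your model service. A hypothetical example would be http://my-model.example:8000/metrics.

```python
import re
import urllib.request

# Metric names in the Prometheus text format start with a letter, '_' or ':'.
_METRIC_NAME = re.compile(r"^[a-zA-Z_:][a-zA-Z0-9_:]*")


def looks_like_prometheus_metrics(body: str) -> bool:
    """Rough check that a response body is in the Prometheus text exposition format."""
    for raw in body.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("# HELP ") or line.startswith("# TYPE "):
            return True
        if line.startswith("#"):
            continue  # other comments say nothing either way
        if not _METRIC_NAME.match(line):
            return False  # e.g. an HTML error page or a JSON body
        try:
            # The last token is the sample value or an optional timestamp; both parse as floats.
            float(line.split()[-1])
        except ValueError:
            return False
        return True
    return False  # empty body or only comments


def check_metrics_endpoint(url: str, timeout: float = 5.0) -> bool:
    """Fetch url and report whether it serves Prometheus-format metrics.

    urlopen raises urllib.error.HTTPError for error status codes and
    urllib.error.URLError when the host cannot be reached.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return looks_like_prometheus_metrics(resp.read().decode("utf-8", errors="replace"))
```

A connection error here points at the Kubernetes service or ingress, while a False result means something answered but it is not serving metrics.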

Step 2: Update Prometheus Configuration

Edit the Prometheus configuration file to include the model's metrics endpoint. This file is typically named prometheus.yml and is located in the Prometheus server's configuration directory. Add a new scrape job for your Seldon Core model:

scrape_configs:
  - job_name: 'seldon-model'
    static_configs:
      - targets: ['<model-service-url>:<port>']

Replace <model-service-url> and <port> with the appropriate values for your deployment.
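When you manage several deployments, it can help to generate this job rather than edit it by hand. The sketch below is a hypothetical Python helper, not a Seldon Core or Prometheus tool. It renders a scrape job of the same shape and rejects targets that include a URL scheme such as http://. That is a common mistake, because Prometheus expects plain host:port entries in targets and takes the scheme from a separate scrape_config setting. The helper also writes metrics_path explicitly. /metrics is Prometheus's default, so change it only if your endpoint differs.

```python
import re

# Hypothetical validation rule: a DNS name or IPv4 address, a colon, and a numeric port.
_TARGET = re.compile(r"^[A-Za-z0-9.-]+:\d{1,5}$")


def render_scrape_job(job_name: str, host: str, port: int, metrics_path: str = "/metrics") -> str:
    """Render a prometheus.yml scrape_configs block for one static target."""
    target = f"{host}:{port}"
    # Reject schemes, paths and out-of-range ports before they reach prometheus.yml.
    if not _TARGET.match(target) or not 0 < port < 65536:
        raise ValueError(f"not a valid host:port target: {target!r}")
    return (
        "scrape_configs:\n"
        f"  - job_name: '{job_name}'\n"
        f"    metrics_path: '{metrics_path}'\n"
        "    static_configs:\n"
        f"      - targets: ['{target}']\n"
    )
```

For example, render_scrape_job("seldon-model", "my-model.seldon.svc", 8000) produces a scrape_configs block for a hypothetical service named my-model.seldon.svc on port 8000, while passing "http://my-model" as the host raises ValueError.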

Step 3: Reload Prometheus Configuration

After updating the configuration, reload Prometheus to apply the changes. You can do this by sending SIGHUP to the Prometheus process or, if Prometheus was started with the --web.enable-lifecycle flag, by sending an HTTP POST request to its /-/reload endpoint.
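The reload can also be scripted. The following Python sketch is a minimal illustration. It assumes you either know the PID of a local Prometheus process or can reach its HTTP port, and the helper names are hypothetical. The HTTP POST to /-/reload is only accepted when Prometheus is started with the --web.enable-lifecycle flag, and urlopen raises urllib.error.HTTPError if the server rejects the request.

```python
import os
import signal
import urllib.request


def build_reload_request(prometheus_url: str) -> urllib.request.Request:
    """Build POST /-/reload (honoured only with --web.enable-lifecycle)."""
    return urllib.request.Request(f"{prometheus_url.rstrip('/')}/-/reload", method="POST")


def reload_via_http(prometheus_url: str, timeout: float = 5.0) -> int:
    """Ask Prometheus to reload its configuration over HTTP; returns the status code."""
    with urllib.request.urlopen(build_reload_request(prometheus_url), timeout=timeout) as resp:
        return resp.status


def reload_via_sighup(pid: int) -> None:
    """Send SIGHUP to a Prometheus process on the same host (POSIX only)."""
    os.kill(pid, signal.SIGHUP)
```

The HTTP route is usually more convenient inside Kubernetes, where the Prometheus process runs in a pod rather than on the machine running the script.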

Additional Resources

By following these steps, you should be able to resolve the issue of metrics not being collected in Seldon Core. Ensure that your Prometheus configuration is up-to-date and that all endpoints are correctly exposed and accessible.


Doctor Droid