DrDroid

Seldon Core Metrics not being collected

Prometheus not scraping the model metrics endpoint.

What Is the "Seldon Core Metrics Not Being Collected" Issue?

Understanding Seldon Core

Seldon Core is an open-source platform designed to deploy machine learning models on Kubernetes. It provides a robust framework for scaling, monitoring, and managing machine learning models in production environments. One of its key features is the ability to integrate with monitoring tools like Prometheus to collect metrics from deployed models.

Identifying the Symptom

When using Seldon Core, you might encounter a situation where metrics are not being collected as expected. This can manifest as missing data in your monitoring dashboards or alerts not being triggered due to the absence of metrics.

Common Observations

  • Empty or incomplete metrics data in Prometheus.
  • Grafana dashboards showing no data or gaps in data.
  • Alerts based on metrics not firing.

Exploring the Issue

The most common root cause is that Prometheus is not scraping the model's metrics endpoint. This happens when the endpoint is not correctly exposed, or when Prometheus has no scrape configuration that covers it.

Why This Happens

Prometheus relies on a configuration file to determine which endpoints to scrape for metrics. If the Seldon Core model's metrics endpoint is not included in this configuration, or if there are network issues preventing access, metrics will not be collected.
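One way to tell these two causes apart is Prometheus's built-in `up` metric, which is recorded for every target Prometheus tries to scrape. The sketch below is only an illustration: it assumes the scrape job is named `seldon-model` (the name used in Step 2), and the helper names `query_up` and `classify_up_result` are made up for this example. No series for the job suggests a missing scrape configuration, while a value of 0 suggests the target is configured but cannot be reached.

```python
import json
import urllib.parse
import urllib.request


def classify_up_result(payload):
    """Interpret the JSON body of an instant query for up{job="..."}."""
    if payload.get("status") != "success":
        return "query-failed"
    samples = payload.get("data", {}).get("result", [])
    if not samples:
        # No series at all: Prometheus has no target for this job.
        return "not-configured"
    # Each sample's value is [timestamp, "<value as a string>"].
    if all(sample["value"][1] == "1" for sample in samples):
        return "healthy"
    # At least one target exists but its last scrape failed.
    return "scrape-failing"


def query_up(prometheus_url, job):
    """Run the instant query against the Prometheus HTTP API (hypothetical helper)."""
    query = urllib.parse.urlencode({"query": f'up{{job="{job}"}}'})
    url = f"{prometheus_url}/api/v1/query?{query}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

For example, `classify_up_result(query_up("http://localhost:9090", "seldon-model"))` would return one of the four labels, where the Prometheus address is an assumption about your setup.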

Steps to Fix the Issue

To resolve this issue, follow these steps to ensure that Prometheus is correctly configured to scrape the Seldon Core model metrics endpoint.

Step 1: Verify Endpoint Exposure

Ensure that the model's metrics endpoint is exposed. You can check this by accessing the endpoint directly in your browser or using a tool like curl:

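    curl http://<model-service-url>/metrics

Replace <model-service-url> with the host (and port, if any) of your model's service. If the endpoint is not accessible, you may need to adjust your Kubernetes service or ingress configuration. Some Seldon Core versions serve metrics at /prometheus rather than /metrics, so check the documentation for your version if the path returns a 404.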
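A successful HTTP response alone does not prove the endpoint serves metrics, since an ingress can return an error page with a 200 status. As a rough sketch, the check below parses a response body and keeps only lines that look like Prometheus text-format samples. The helper names `fetch_body` and `metric_names` are hypothetical, and the parser is deliberately simplified rather than a full implementation of the exposition format.

```python
import re
import urllib.request

# A sample line looks like: name{label="value"} 12.5 [timestamp]
SAMPLE_LINE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{.*\})?\s+(\S+)")


def fetch_body(url, timeout=5.0):
    """Download the endpoint body as text (raises on connection errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def metric_names(body):
    """Return the set of metric names found in a text-format body."""
    names = set()
    for line in body.splitlines():
        match = SAMPLE_LINE.match(line.strip())
        if match is None:
            continue  # comments (# HELP, # TYPE), blank lines, HTML, etc.
        try:
            float(match.group(2))  # the sample value must be numeric
        except ValueError:
            continue
        names.add(match.group(1))
    return names
```

An empty set from `metric_names(fetch_body(url))` suggests the URL answers but is not a metrics endpoint, which usually points to a wrong path or port.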

Step 2: Update Prometheus Configuration

Edit the Prometheus configuration file to include the model's metrics endpoint. This file is typically named prometheus.yml and is located in the Prometheus server's configuration directory. Add a new scrape job for your Seldon Core model:

    scrape_configs:
      - job_name: 'seldon-model'
        static_configs:
          - targets: ['<model-service-url>:<port>']

Replace <model-service-url> and <port> with the appropriate values for your deployment.
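A frequent mistake at this step is pasting a full URL into targets. Prometheus expects each static target as host:port, with the scheme and path set separately through the job's `scheme` and `metrics_path` fields. The following sketch flags the most common slips before you edit the file. The function name `target_problems` and the example host and port in the usage note are assumptions for illustration only.

```python
def target_problems(target):
    """Return a list of likely mistakes in a static_configs target string."""
    problems = []
    if "://" in target:
        problems.append("remove the scheme; set it with the job's 'scheme' field")
    # Look for a path only after any scheme has been stripped off.
    remainder = target.split("://", 1)[-1]
    if "/" in remainder:
        problems.append("remove the path; set it with the job's 'metrics_path' field")
    if "<" in target or ">" in target:
        problems.append("placeholder has not been replaced with a real value")
    return problems
```

Under these assumptions, `target_problems("seldon-model.default.svc:8000")` returns an empty list, while `target_problems("http://seldon-model:8000/metrics")` reports both the scheme and the path.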

Step 3: Reload Prometheus Configuration

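After updating the configuration, reload Prometheus to apply the changes. You can do this by sending a SIGHUP signal to the Prometheus process, or by sending an HTTP POST request to the /-/reload endpoint if Prometheus was started with the --web.enable-lifecycle flag. If neither option is available, restarting the Prometheus server also picks up the new configuration.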
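The reload can also be scripted and then confirmed. The sketch below assumes Prometheus was started with --web.enable-lifecycle and is reachable at an address you supply. `reload_prometheus` and `job_is_loaded` are hypothetical helper names. The second function checks the payload returned by the /api/v1/status/config endpoint, which reports the configuration Prometheus currently has loaded as a YAML string.

```python
import json
import re
import urllib.request


def reload_prometheus(prometheus_url):
    """POST to /-/reload; raises urllib.error.HTTPError if lifecycle is disabled."""
    request = urllib.request.Request(
        f"{prometheus_url}/-/reload", data=b"", method="POST"
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.status


def fetch_loaded_config(prometheus_url):
    """Fetch the currently loaded configuration as a JSON payload."""
    url = f"{prometheus_url}/api/v1/status/config"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def job_is_loaded(config_payload, job_name):
    """Check whether a scrape job with this exact name is in the loaded config."""
    yaml_text = config_payload.get("data", {}).get("yaml", "")
    pattern = rf"^\s*-?\s*job_name:\s*['\"]?{re.escape(job_name)}['\"]?\s*$"
    return re.search(pattern, yaml_text, flags=re.MULTILINE) is not None
```

If `job_is_loaded(fetch_loaded_config(url), "seldon-model")` is still false after a reload, the edited file is probably not the one Prometheus reads, which is common when the configuration is mounted from a Kubernetes ConfigMap.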

By following these steps, you should be able to resolve the issue of metrics not being collected in Seldon Core. Ensure that your Prometheus configuration is up-to-date and that all endpoints are correctly exposed and accessible.
