Seldon Core Model Server Scaling Issues
Incorrect scaling policies or lack of metrics.
What Are Seldon Core Model Server Scaling Issues?
Understanding Seldon Core
Seldon Core is an open-source platform designed to deploy, manage, and scale machine learning models on Kubernetes. It provides a robust framework for serving models with features like logging, monitoring, and scaling. Seldon Core is particularly useful for organizations looking to integrate machine learning models into their production environments efficiently.
Identifying the Symptom: Model Server Scaling Issues
One common issue users encounter with Seldon Core relates to model server scaling. The problem manifests as model servers either failing to scale up when demand increases or failing to scale down when demand decreases, leading to service disruptions or wasted resources.
Observed Behavior
Users may notice that their model servers are not responding to increased traffic as expected, resulting in slower response times or even timeouts. Conversely, during low traffic periods, the servers may not scale down, leading to unnecessary resource consumption.
Exploring the Root Cause
The primary root cause of scaling issues in Seldon Core is often incorrect scaling policies or a lack of metrics that inform scaling decisions. Kubernetes relies on metrics to determine when to scale pods up or down. If these metrics are not correctly configured or available, the scaling behavior will not function as intended.
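A quick way to confirm this on a live cluster is to inspect the Horizontal Pod Autoscaler itself: when Kubernetes cannot read the metrics it scales on, the TARGETS column shows `<unknown>`. The namespace and HPA name below are placeholders to adapt:

```sh
# List HPAs; "<unknown>" in the TARGETS column means metrics are not reaching the autoscaler
kubectl get hpa -n <your-namespace>

# Inspect scaling events and conditions for a specific HPA (name is illustrative)
kubectl describe hpa seldon-model-hpa -n <your-namespace>
```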
Scaling Policies
Scaling policies define the conditions under which the model servers should scale. These policies need to be accurately set to reflect the desired scaling behavior based on traffic patterns and resource utilization.
Steps to Resolve Scaling Issues
To address scaling issues in Seldon Core, follow these steps:
1. Review and Correct Scaling Policies
Access your Kubernetes cluster and review the Horizontal Pod Autoscaler (HPA) configurations for your Seldon deployments. Ensure that the target metrics (e.g., CPU utilization, custom metrics) are correctly defined. For more information on setting up HPA, refer to the Kubernetes HPA documentation. Adjust the minReplicas and maxReplicas settings to align with your expected traffic patterns.
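As a concrete sketch, the manifest below defines a CPU-based HPA (autoscaling/v2) for the Deployment that Seldon Core creates for a predictor. The Deployment name is illustrative, so check `kubectl get deployments` for yours, and note that a CPU-utilization target only works if the model container sets `resources.requests.cpu`:

```sh
# Apply a CPU-based HPA to the Deployment behind your Seldon predictor.
# The Deployment name below is illustrative; find yours with:
#   kubectl get deployments -n <your-namespace>
kubectl apply -n <your-namespace> -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: seldon-model-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: seldon-model-example-0-classifier   # illustrative name
  minReplicas: 1            # floor for low-traffic periods
  maxReplicas: 5            # ceiling for traffic spikes
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
EOF
```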
2. Ensure Metrics Availability
Verify that the metrics server is running in your Kubernetes cluster; if it is not, deploy it using the instructions from the Metrics Server GitHub repository (see the commands below). Also ensure that your Seldon deployments are configured to expose the metrics your scaling policies rely on, which may involve setting up Prometheus or another monitoring solution.
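The checks below use the standard install manifest from the Metrics Server releases page; verify the version is compatible with your cluster before applying:

```sh
# Check whether the metrics server is running
kubectl get pods -n kube-system | grep metrics-server

# If it is missing, install it from the official release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Confirm that resource metrics are flowing; this should print CPU/memory per pod
kubectl top pods -n <your-namespace>
```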
3. Monitor and Test Scaling Behavior
After making changes, monitor the scaling behavior of your model servers. Use tools like Grafana to visualize metrics and ensure that scaling is occurring as expected. Conduct load testing to simulate traffic and observe how the scaling policies respond. Adjust configurations as needed based on test results.
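A minimal load-test sketch, assuming Seldon Core v1's REST protocol and the `hey` load generator (https://github.com/rakyll/hey); the host, namespace, deployment name, and payload shape are all placeholders to adapt to your setup:

```sh
# Watch the HPA and replica count react while load is applied
kubectl get hpa -n <your-namespace> --watch

# In another terminal, send two minutes of sustained traffic with 50 concurrent workers
hey -z 2m -c 50 -m POST \
  -H "Content-Type: application/json" \
  -d '{"data": {"ndarray": [[1.0, 2.0, 3.0, 4.0]]}}' \
  http://<ingress-host>/seldon/<your-namespace>/<deployment-name>/api/v1.0/predictions
```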
Conclusion
By carefully reviewing and adjusting your scaling policies and ensuring that metrics are available, you can resolve scaling issues in Seldon Core. Proper scaling ensures that your model servers can handle varying traffic loads efficiently, maintaining optimal performance and resource usage.