SLA (Service Level Agreement)

What is an SLA?

An SLA, or Service Level Agreement, is the agreement a service provider makes with its users about measurable metrics such as uptime and responsiveness.

For tech companies, it usually defines the level of service the company commits to providing to its users.

For example, what error rate, latency, and downtime are permissible for an SMS API? Agreeing on these limits up front gives every party a clear, shared expectation.
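To make the SMS API example concrete, here is a minimal sketch of how agreed limits could be compared against measured values. The class names, field names, and every number below are hypothetical and chosen only for illustration; a real SLA would define its own metrics and measurement windows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlaLimits:
    """Hypothetical limits for an SMS API; the numbers are illustrative only."""
    max_error_rate: float      # fraction of failed requests, e.g. 0.01 = 1%
    max_p95_latency_ms: float  # 95th-percentile latency in milliseconds
    max_downtime_min: float    # minutes of downtime allowed per month


@dataclass(frozen=True)
class Measured:
    """Values observed over the same period the limits apply to."""
    error_rate: float
    p95_latency_ms: float
    downtime_min: float


def find_breaches(limits: SlaLimits, measured: Measured) -> list[str]:
    """Return the names of the metrics that exceed their agreed limit."""
    breaches = []
    if measured.error_rate > limits.max_error_rate:
        breaches.append("error_rate")
    if measured.p95_latency_ms > limits.max_p95_latency_ms:
        breaches.append("p95_latency_ms")
    if measured.downtime_min > limits.max_downtime_min:
        breaches.append("downtime_min")
    return breaches


# Assumed limits and one month of assumed measurements.
limits = SlaLimits(max_error_rate=0.01, max_p95_latency_ms=500.0, max_downtime_min=45.0)
last_month = Measured(error_rate=0.004, p95_latency_ms=620.0, downtime_min=12.0)
print(find_breaches(limits, last_month))  # ['p95_latency_ms']
```

Returning the list of breached metrics, rather than a single pass/fail flag, makes it easier to tell users exactly which commitment was missed.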

What happens when an SLA is breached?

Once an SLA is breached, the service provider must alert its team, communicate with the affected users, and find a fix that restores the service as soon as possible.

Are SLAs only external?

No. Service contracts are also commonly set up between teams within a company. This becomes critical as upstream and downstream services can significantly impact the user experience for an end API.

For example, if you’re the service owner of search on a platform, your results depend heavily on the recommendation model that is provided to your team. If the function call that returns the model’s response is built by another team, you’d ask them to keep its latency under (say) 200 ms so that the overall experience remains intact for the users.
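The 200 ms budget in the example above can be checked with a small timing wrapper. This is only a sketch: `call_with_budget` and `fake_recommendation_call` are hypothetical names, and the sleep stands in for the other team's real model call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

LATENCY_BUDGET_MS = 200.0  # the example budget from the text above


def call_with_budget(
    fn: Callable[[], T], budget_ms: float = LATENCY_BUDGET_MS
) -> tuple[T, float, bool]:
    """Run fn and return (result, elapsed_ms, within_budget)."""
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms <= budget_ms


def fake_recommendation_call() -> list[str]:
    """Stand-in for the other team's model call (hypothetical)."""
    time.sleep(0.05)  # simulate roughly 50 ms of work
    return ["item-1", "item-2"]


result, elapsed_ms, ok = call_with_budget(fake_recommendation_call)
print(f"{elapsed_ms:.0f} ms, within budget: {ok}")
```

A single call says little on its own. In practice, teams tend to agree on a percentile (for example, the share of calls that must finish within the budget) measured over a time window.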

You can also read about SLOs (Service Level Objectives) and SLIs here.

Also, here’s a good blog by Google’s Cloud team, explaining the difference between the three.
