Every tech company providing a service, whether free or paid, shares one objective: deliver the best possible experience in order to attract and retain users. After all, without users there is no reason (or money, for that matter) for the service to exist.
When using a service, you want to be able to trust it will perform as promised. If Google suddenly became notorious for outages and slowdowns we’d likely see a mass exodus of users looking for a new search engine. Yet because of Google’s ability to consistently meet user expectations and deliver (at least) 99.99% uptime month-after-month, the search engine giant continues to dominate with over 70,000 searches every second.
Maintaining these high uptime percentages isn’t just something Google “shoots for” every month because it looks good. Their Monthly Uptime Percentage is a key indicator that is measured in order to determine whether or not they’re delivering on the promises made to their users – in this case, a search engine that works as planned 99.99% of the time. Not bad, Google, not bad at all.
The promises or agreements that tech companies make with their customers are often defined within a Service Level Agreement (SLA). An SLA consists of Service Level Objectives (SLOs), which are tracked and monitored by measuring specific Service Level Indicators (SLIs).
Companies define, track, and monitor these SLAs, SLOs and SLIs with the goal of creating a more reliable service for their customers. But what exactly do these terms mean and how do they relate to one another?
Service level management is crucial for any organization that provides services to customers. Defining and tracking key performance metrics helps ensure you are meeting customer expectations and business objectives. The three main concepts used in service level management are SLAs, SLOs, and SLIs. While related, they serve different purposes, and understanding the differences between them is key to managing service levels effectively.
What is an SLA?
SLA stands for Service Level Agreement. An SLA represents a formal agreement between a service provider and the customer/user that defines the level of service expected. It outlines the service performance metrics, responsibilities of both parties, and remedies or penalties if agreed service levels are not achieved.
Some key things to know about SLAs:
- They are agreements between a service provider and external customers/users. Internal systems usually do not have SLAs.
- They focus on defining service levels from the customer’s point of view, such as system uptime, availability, responsiveness, etc.
- They specify measurable metrics and targets for service performance. Common metrics include uptime percentage, mean time to repair, response times, etc.
- They outline consequences for failing to meet service levels, such as service credits or penalties.
- They are legally binding documents that hold the service provider accountable for service quality.
SLAs ultimately aim to provide transparency on service expectations and ensure customers receive adequate service levels.
What is an SLO?
SLO stands for Service Level Objective. SLOs define the performance targets that a service aims to meet in order to deliver the desired customer experience.
Some key characteristics of SLOs:
- They are specific, measurable goals for service performance.
- They focus on customer-centric metrics like uptime, latency, error rates, etc.
- They help align customer expectations with internal service targets. The SLO target may differ from what is promised to the customer via SLA.
- They guide the teams responsible for service delivery on the performance levels needed.
- They provide a benchmark for evaluating service reliability and determining necessary improvements.
While SLAs represent a commitment to customers, SLOs represent internal objectives for service teams. Setting appropriate SLOs is important for maintaining reliability.
What is an SLI?
SLI stands for Service Level Indicator. SLIs are the metrics used to measure and report on service performance for SLOs.
Some tips for effective SLIs:
- Choose simple, quantifiable metrics that provide meaningful signal on service levels.
- Focus on indicators tied directly to key SLOs. Not all metrics need to be SLIs.
- Automate SLI collection and reporting as much as possible.
- Establish baselines and thresholds for identifying performance issues.
- Monitor SLIs on an ongoing basis and review trends periodically.
SLIs provide the data needed to evaluate SLO achievement, identify problems, and demonstrate service levels to customers.
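As a rough illustration of what simple, quantifiable SLIs can look like, the Python sketch below derives two common indicators, availability and a latency percentile, from a list of request records. The `Request` type, its field names, and the choice to treat any non-5xx response as "good" are assumptions made for this example, not a prescribed definition.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical request record; the field names are illustrative."""
    status: int        # HTTP status code returned to the caller
    latency_ms: float  # time taken to serve the request


def availability_sli(requests):
    """Fraction of requests that did not fail with a server error (5xx)."""
    if not requests:
        return None  # no traffic, so the indicator is undefined
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests)


def latency_percentile(requests, pct):
    """Nearest-rank percentile of request latency, in milliseconds."""
    if not requests:
        return None
    ordered = sorted(r.latency_ms for r in requests)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


sample = [
    Request(200, 120.0),
    Request(200, 80.0),
    Request(503, 900.0),
    Request(200, 100.0),
]
print(availability_sli(sample))        # share of non-5xx responses
print(latency_percentile(sample, 95))  # 95th percentile latency
```

In practice these numbers would usually come from a monitoring system rather than an in-memory list, but the calculation behind the indicator is the same.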
Best Practices for Setting SLAs, SLOs, and SLIs
Here are some best practices to keep in mind:
For SLAs:
- Involve technical teams in drafting to ensure achievable service levels.
- Use clear, simple language that sets unambiguous expectations.
- Focus on the most critical, user-facing service metrics.
- Build in room for some failure with error budgets.
- Establish a process for changing SLAs when needed.
For SLOs:
- Limit SLOs to the smallest number of critical metrics.
- Set objectives based on historic performance and capacity limitations.
- Review periodically and adjust targets when service changes.
- Support SLOs with clear SLI metrics and monitoring.
- Use error budgets to allow for flexibility and experimentation.
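The error budget mentioned above can be thought of as the amount of failure an SLO still permits. The sketch below is one possible way to express it in terms of requests; the function name, the returned field names, and the sample numbers are all hypothetical.

```python
def error_budget_report(slo_target, total_requests, failed_requests):
    """Summarize error budget usage for a request-based SLO.

    slo_target is a fraction (0.999 means 99.9% of requests should succeed).
    """
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    if allowed_failures > 0:
        consumed = failed_requests / allowed_failures
    else:
        # A 100% target leaves no budget, so any failure exhausts it.
        consumed = float("inf") if failed_requests else 0.0
    return {
        "allowed_failures": allowed_failures,
        "consumed_fraction": consumed,
        "budget_left": consumed < 1.0,
    }


report = error_budget_report(
    slo_target=0.999,
    total_requests=1_000_000,
    failed_requests=250,
)
print(report)
```

A team might use `budget_left` as one input when deciding whether there is room for a risky release or experiment, which is the flexibility the error budget is meant to provide.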
For SLIs:
- Identify indicators that directly track key SLOs.
- Automate SLI collection and reporting as much as possible.
- Document SLI calculations and collection processes for transparency.
- Set thresholds for early identification of performance issues.
- Review SLIs regularly to verify they are meaningful and accurate.
SLA, SLO, and SLI Relationships and Interactions
SLAs, SLOs, and SLIs are all integral components of service level management that work together:
- SLAs establish the service commitments to customers based on their needs.
- SLOs set internal targets needed to deliver the SLAs.
- SLIs measure performance against SLOs.
The SLO targets account for operational realities and may differ from the SLAs. For example, an application SLA could promise 99.95% uptime per month. The internal SLO target to actually meet this SLA commitment might be 99.99% uptime.
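To see how much headroom that gap creates, it helps to convert each percentage into allowed downtime. The sketch below assumes a 30-day month (43,200 minutes); the function and constant names are illustrative.

```python
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60  # 43,200 minutes (assumed window)


def allowed_downtime_minutes(uptime_percent,
                             window_minutes=MINUTES_PER_30_DAY_MONTH):
    """Downtime permitted by an uptime target over the given window."""
    return window_minutes * (100 - uptime_percent) / 100


sla_allowance = allowed_downtime_minutes(99.95)  # external commitment
slo_allowance = allowed_downtime_minutes(99.99)  # stricter internal target
buffer_minutes = sla_allowance - slo_allowance

print(f"SLA allows about {sla_allowance:.2f} minutes of downtime")
print(f"SLO allows about {slo_allowance:.2f} minutes of downtime")
print(f"Buffer before the SLA is at risk: about {buffer_minutes:.2f} minutes")
```

Under these assumptions the SLA tolerates roughly 21.6 minutes of downtime per month while the SLO tolerates roughly 4.3, so a missed SLO still leaves about 17 minutes before the customer-facing commitment is broken.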
Monitoring SLIs lets you determine SLO achievement. If SLIs show the SLO is not being met, action can be taken to improve service reliability or adjust targets as needed.
Ongoing SLI monitoring coupled with periodic SLA and SLO reviews help ensure alignment between customer needs, business priorities, and service operations.
Real-World Examples of SLAs, SLOs, and SLIs
To make SLAs, SLOs, and SLIs more concrete, here are some examples:
SLA example
Our cloud storage service will provide 99.9% monthly uptime, excluding scheduled maintenance windows. Failure to meet this SLA will result in a 10% credit on the monthly service fees.
Corresponding SLO
The cloud storage system will aim to maintain 99.95% uptime per month.
Associated SLI
Monthly uptime percentage calculated as:
(Total Minutes in Month – Total Minutes of Outages) / Total Minutes in Month x 100
This SLI would be tracked in monitoring and compared to the 99.95% uptime SLO. If uptime drops below the SLO, the service team can investigate and resolve the issue before the 99.9% SLA is breached.
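The uptime formula and the comparison against both targets can be sketched in a few lines of Python. The function names and status strings are hypothetical, and `outage_minutes` is assumed to already exclude scheduled maintenance windows, as the example SLA does.

```python
def monthly_uptime_percent(total_minutes, outage_minutes):
    """Uptime SLI: (total minutes - outage minutes) / total minutes x 100."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return (total_minutes - outage_minutes) / total_minutes * 100


def evaluate_uptime(uptime_percent, slo_percent=99.95, sla_percent=99.9):
    """Classify a measured uptime against the internal SLO and the SLA."""
    if uptime_percent < sla_percent:
        return "SLA breached"  # service credit would apply
    if uptime_percent < slo_percent:
        return "SLO missed"    # internal warning, SLA still intact
    return "OK"


total = 30 * 24 * 60  # minutes in a 30-day month
for outage in (10, 30, 60):
    uptime = monthly_uptime_percent(total, outage)
    print(f"{outage} min of outages: {uptime:.3f}% uptime, {evaluate_uptime(uptime)}")
```

With these sample values, 10 minutes of outages stays within both targets, 30 minutes misses the SLO while still meeting the SLA, and 60 minutes breaches the SLA.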
Key Takeaways
- SLAs represent commitments to customers, SLOs represent internal service level targets, and SLIs provide performance data.
- SLAs focus on customer needs, while SLOs consider operational capabilities.
- SLIs should tie directly to SLOs. Focus on a small set of meaningful SLIs.
- Monitor SLIs continually and review SLOs and SLAs periodically to ensure proper alignment.
With well-defined SLAs, SLOs, and SLIs in place, organizations can effectively manage service levels and meet customer expectations. Aligning these metrics and monitoring them provides the visibility, accountability, and impetus for delivering great service.
SLA vs SLO vs SLI: What's the Difference?
What is the difference between SLO and SLI?
An SLO (Service Level Objective) is a target that the service provider works toward in order to meet the SLA. SLOs are the individual targets stated in the SLA: a 99.99% uptime is one SLO, and a 24-hour support response time is another. An SLI (Service Level Indicator) is the measured number showing how well a given SLO is actually being fulfilled.
What is the difference between SLA vs SLI?
At times, distinguishing between SLA vs SLO, or SLO vs SLI, can be confusing. However, here’s a simple breakdown of the differences between SLA, SLO, and SLI:
SLA: An agreement between service provider and customer
SLO: Objectives set by the organization based on SLI
What is the difference between SLA and SLO?
An SLA is a formal agreement set by a service provider for the performance or quality of a service. On the other hand, SLOs are clear targets that you as the provider set internally to evaluate if the SLAs are being met. For example, the following is part of an SLO provided by AWS for its individual EC2 instance.