Uptime SLA for your service

3 min read · Jul 9, 2021

Service Level Agreement of your solution

Today, you may find that your system consists of 100 pieces of vendor software running on vendor cloud services (whereas in the past you ran your own software on-premises). Your contracts with those vendors contain many terms and figures, including SLAs (Service Level Agreements).

If you don’t know what an SLA is, see Wikipedia: https://en.wikipedia.org/wiki/Service-level_agreement

In this post, I will talk about how you can control the uptime SLA for your services (systems) and how to calculate it. The major cloud providers publish uptime SLAs for their services, including compute instances (VMs, EC2, etc.) and network components (routers, load balancers, etc.):

Amazon Compute Service Level Agreement (aws.amazon.com)

Compute Engine Service Level Agreement (SLA) (cloud.google.com)

Azure Virtual Machines SLA: https://www.azure.cn/en-us/support/sla/virtual-machines/

However, they don’t guarantee the uptime of your SERVICE (unless you are using their SaaS offering). Therefore, you need to manage your service’s SLA yourself.

Before going into the details, you need to know how to calculate the SLA of linked systems. For instance, if you have a router → load balancer → servers connected in series, with uptime SLAs of 99%, 99%, and 99%, every component must be up for the service to be up. The SLA for your service (excluding your application’s own uptime) is then 99% × 99% × 99% ≈ 97.03%.
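The series rule can be sketched in a few lines of Python. This is a minimal sketch that assumes component failures are independent; the function name `series_availability` is hypothetical (not from any library), and availabilities are written as fractions (0.99 for 99%).

```python
from math import prod


def series_availability(*availabilities: float) -> float:
    """Availability of components chained in series.

    Every component must be up for the chain to be up, so (assuming
    independent failures) the individual availabilities multiply.
    """
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be in [0, 1], got {a}")
    return prod(availabilities)


if __name__ == "__main__":
    # router -> load balancer -> servers, each with a 99% SLA
    chain = series_availability(0.99, 0.99, 0.99)
    print(f"{chain:.2%}")  # 97.03%
```

Note how quickly a chain erodes availability: each extra 99% component in series removes roughly another percentage point.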

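But if you have two routers (A and B) connected in parallel, the service is down only when both routers are down at the same time. Assuming the routers fail independently, the calculation becomes 1 - P(A and B are both down) = 1 - (1 - P(A is up)) × (1 - P(B is up)). If each router has a 99.9% uptime SLA, the combined SLA is 1 - (1 - 99.9%) × (1 - 99.9%) = 1 - 0.1% × 0.1% = 1 - 0.0001% = 99.9999%.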
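The parallel rule, sketched the same way. Again this assumes independent failures, and `parallel_availability` is a hypothetical helper name used only for illustration.

```python
from math import prod


def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant components running in parallel.

    The group is down only if every member is down at the same time.
    Assuming independent failures, P(all down) = prod(1 - a_i),
    so the group's availability is 1 - prod(1 - a_i).
    """
    for a in availabilities:
        if not 0.0 <= a <= 1.0:
            raise ValueError(f"availability must be in [0, 1], got {a}")
    return 1.0 - prod(1.0 - a for a in availabilities)


if __name__ == "__main__":
    # two routers A and B, each with a 99.9% SLA
    pair = parallel_availability(0.999, 0.999)
    print(f"{pair:.4%}")  # 99.9999%
```

The independence assumption matters: if both routers share a power feed or availability zone, their failures are correlated and the real figure will be lower than this formula suggests.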

Let’s look at a slightly more complex architecture.

Case: a router routes your input to two LBs (load balancers), and each LB has two EC2 instances (VMs) attached.

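Assuming every component (router, LBs, and EC2 instances) has a 99.9% uptime SLA:
One side (an LB in series with two parallel EC2 instances): 99.9% × (1 - (1 - 99.9%)(1 - 99.9%)) = 99.9% × 99.9999% ≈ 99.8999% ≈ 99.9%
Two sides in parallel: 1 - (1 - 99.8999%)(1 - 99.8999%) ≈ 99.9999%
Entire uptime SLA (router in series with the two sides): 99.9% × 99.9999% ≈ 99.8999% ≈ 99.9%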
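The whole case can be checked by composing the two rules. This sketch assumes, matching the figures used above, that every component has a 99.9% SLA and that failures are independent; the helper names `series` and `parallel` are hypothetical.

```python
from math import prod


def series(*a: float) -> float:
    """All components must be up: availabilities multiply."""
    return prod(a)


def parallel(*a: float) -> float:
    """Down only if every redundant member is down at once."""
    return 1.0 - prod(1.0 - x for x in a)


# Assumed per-component SLAs (all 99.9%, as in the example)
ROUTER = LB = EC2 = 0.999

# One side: an LB in series with its two parallel EC2 instances
side = series(LB, parallel(EC2, EC2))

# Two sides in parallel, behind a single router in series
service = series(ROUTER, parallel(side, side))

if __name__ == "__main__":
    print(f"one side:  {side:.4%}")                  # 99.8999%
    print(f"two sides: {parallel(side, side):.4%}")  # 99.9999%
    print(f"service:   {service:.4%}")               # 99.8999%
```

The result shows that the single router dominates: the redundant sides push their combined unavailability close to zero, so the service SLA ends up almost exactly the router’s 99.9%.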

Now you know how to calculate your service’s uptime from the SLAs given by the cloud providers.

However, be careful: a simple combination of SLAs may not be your service’s actual uptime SLA. For example, suppose you have a queue system with a 95% uptime SLA, while your database has a 99.9999% SLA. Is your service SLA simply the result of multiplying these numbers? The answer is “maybe not”. Suppose your application pushes traffic into the queue system, the data only needs to be processed at some point within 5 minutes, and other systems read the data from the database. In that case, a short queue outage delays processing rather than making your service unavailable, so you may not need to take the queue system’s SLA into account for your service SLA.
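A small arithmetic sketch of this point, using the 95% queue and 99.9999% database figures from the example. Which components sit on the user-facing request path is an assumption you have to make for your own architecture; here the queue is assumed to be off that path because its work is asynchronous. The `SLAS` table and `path_availability` helper are hypothetical.

```python
from math import prod

# SLA figures from the example above
SLAS = {"queue": 0.95, "database": 0.999999}


def path_availability(*components: str) -> float:
    """Series availability over only the components a request depends on."""
    return prod(SLAS[name] for name in components)


# Naive: treat every component as if it were in series
naive = path_availability("queue", "database")

# Assumed request path: reads go straight to the database; the queue is
# processed asynchronously, so a short queue outage only delays work.
request_path = path_availability("database")

if __name__ == "__main__":
    print(f"naive:        {naive:.4%}")         # 94.9999%
    print(f"request path: {request_path:.4%}")  # 99.9999%
```

The gap between the two numbers is the reason to map out which components are actually on the critical path before multiplying SLAs together.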

Vendors usually set their SLAs based on many factors (not only technical but also operational ones) and on historical data from the systems they currently run. The SLA calculations above help you establish expectations and assumptions, but they are estimates, not guarantees. (Don’t trust the SLA numbers from the cloud providers 100% either.)



Written by Park Sehun
