Availability is a term that anyone who works with technology hears often, and design, network, and system experts regularly stress its importance. Briefly, availability describes how consistently a system, application, or service stays accessible, that is, how little time it spends unreachable.
It is important to calculate availability for your systems, and for service providers to commit to those figures for their customers. Through Service Level Agreements (SLAs), telecommunications and internet service providers promise the maximum amount of interruption their services may experience.
What are the parameters for availability?
We will explain how availability is calculated shortly. First, let’s define some parameters that make availability easier to understand.
Uptime: Uptime refers to the total time a system, application, or dataset is accessible. It is usually expressed as a percentage of run time. Example: 99.7% uptime.
Downtime: The total time the system is inaccessible. It is measured in seconds, minutes, or hours.
Availability: The proportion of a specified period during which a system, application, or dataset is accessible.
MTTF (Mean time to failure): The average time a system operates before it fails. It is typically used for components that are replaced rather than repaired.
MTTR (Mean time to repair): The average time required to repair a failure and return the system to service.
MTBF (Mean time between failures): The average time an IT system operates between one failure and the next.
RTO (Recovery Time Objective): The maximum acceptable amount of time within which a system, application, or dataset must be made available again after an outage.
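One common way to relate these parameters is the steady-state formula Availability = MTBF ÷ (MTBF + MTTR). The sketch below assumes that MTBF counts only operating time between failures (repair time excluded); the function name and the sample figures are hypothetical and chosen purely for illustration.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Estimate availability as MTBF / (MTBF + MTTR).

    Assumption: MTBF is the mean operating time between failures,
    so it does not already include the repair time.
    """
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must not be negative")
    total = mtbf_hours + mttr_hours
    if total == 0:
        raise ValueError("MTBF and MTTR cannot both be zero")
    return mtbf_hours / total


# Hypothetical figures: a failure every 500 operating hours, 2 hours to repair.
estimate = steady_state_availability(mtbf_hours=500, mttr_hours=2)
print(f"Estimated availability: {estimate:.4%}")
```

Shortening the repair time raises the estimate just as lengthening the time between failures does, which is why both values matter when planning.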
Calculating system availability
Calculating the availability of a system, application, or dataset is quite simple: divide the uptime by the sum of the uptime and the downtime. As an example, let’s calculate the availability of a service that ran for 720 hours and was interrupted for 10 hours:
- Availability = Uptime ÷ (Uptime + downtime)
- Availability = 720 ÷ (720 + 10)
- Availability = 720 ÷ 730
- Availability ≈ 0.986
- Availability ≈ 98.6%
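The worked example above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `availability` is an illustrative choice, not part of any library.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Return availability as uptime / (uptime + downtime)."""
    if uptime_hours < 0 or downtime_hours < 0:
        raise ValueError("uptime and downtime must not be negative")
    total = uptime_hours + downtime_hours
    if total == 0:
        raise ValueError("the measured period must be longer than zero")
    return uptime_hours / total


# The example from the text: 720 hours of uptime, 10 hours of downtime.
ratio = availability(720, 10)
print(f"{ratio:.3f}")  # 0.986
print(f"{ratio:.1%}")  # 98.6%
```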
Even though calculating system availability is simple, you may still prefer to use an online availability calculator.
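The same arithmetic can be turned around to see how much downtime an SLA target leaves room for. The sketch below assumes a 365-day year and uses hypothetical targets; real SLAs define their own measurement periods and exclusions.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime budget, in hours, implied by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_hours * (1 - availability_percent / 100)


# Hypothetical SLA targets, for illustration only.
for target in (99.0, 99.9, 99.99):
    budget = allowed_downtime_hours(target)
    print(f"{target}% availability allows about {budget:.2f} hours of downtime per year")
```

Each additional "nine" cuts the downtime budget by a factor of ten, which is a large part of why higher targets are more expensive to meet.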
What is high availability?
High availability is a technology and design principle created to minimize the downtime of systems and to ensure business continuity. It aims to keep systems in working condition by supporting them with additional hardware or software, thus protecting employees and customers against unexpected interruptions.
High availability technologies
Technologies such as clustering, NIC teaming, and RAID let system components back each other up, providing continuous, uninterrupted operation. High availability can be applied in many areas, but it can be very costly for some businesses.
Some of the components that you can plan as high availability in your infrastructure are as follows;
The main goal of high availability is to eliminate single points of failure (SPOF). Let’s take a look at what exactly a SPOF is.
What is a single point of failure (SPOF)?
Briefly, a SPOF is a critical component whose failure interrupts the whole system. For example, if the shared storage device in a cluster fails, the servers that depend on it become unusable.
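The effect of a SPOF can be estimated with basic availability arithmetic: components that are all required multiply their availabilities, while redundant copies fail only when every copy fails. The sketch below assumes that failures are independent and uses a hypothetical 99% availability for every component; the function names are illustrative.

```python
from math import prod


def series_availability(components: list[float]) -> float:
    """Every component is required, so the availabilities multiply."""
    return prod(components)


def redundant_availability(component: float, copies: int) -> float:
    """The group is up as long as at least one independent copy is up."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return 1 - (1 - component) ** copies


# Hypothetical setup: two clustered servers that share storage.
servers = redundant_availability(0.99, copies=2)
single_storage = series_availability([servers, 0.99])
mirrored_storage = series_availability([servers, redundant_availability(0.99, copies=2)])

print(f"Clustered servers, single storage:   {single_storage:.4%}")
print(f"Clustered servers, mirrored storage: {mirrored_storage:.4%}")
```

With a single storage device, the overall figure cannot rise above that device's own availability, no matter how many servers are added in front of it.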
High availability can work in two modes, and the choice should be planned according to the needs of your organization:
- Active-Active: Both devices configured as HA are active and serve traffic at the same time.
- Active-Standby: One of the two devices configured as HA is active while the other waits in standby mode. If the active device has a problem, its services are switched to the standby device.
What is load balancing?
Load balancing is the distribution of traffic arriving at a system or network across multiple servers to prevent bottlenecks. This keeps any single server from being overloaded and avoids long response times.
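Round robin is one of the simplest ways to distribute traffic, and it is used here only as an illustration; real load balancers offer other strategies as well. In the sketch below, the class name and the server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out servers in a fixed rotation, one per incoming request."""

    def __init__(self, servers):
        servers = list(servers)
        if not servers:
            raise ValueError("at least one server is required")
        self._rotation = cycle(servers)

    def next_server(self) -> str:
        """Return the server that should handle the next request."""
        return next(self._rotation)


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
assignments = [balancer.next_server() for _ in range(5)]
print(assignments)  # ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```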
What is failover?
Failover is a design principle that seamlessly transfers services to backup hardware registered with the system when the active hardware or server becomes unresponsive, enabling uninterrupted service.
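The idea can be sketched at the level of a single request: try the active server first and move on to the backup only if the active one does not respond. The handler functions below are hypothetical stand-ins for real servers, and an unresponsive server is modeled as a `ConnectionError`.

```python
from typing import Callable, Sequence


def call_with_failover(handlers: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each server in order and return the first successful response."""
    last_error = None
    for handler in handlers:
        try:
            return handler(request)
        except ConnectionError as error:
            # This server is unresponsive, so fall through to the next one.
            last_error = error
    raise RuntimeError("all servers failed") from last_error


def active_server(request: str) -> str:
    # Hypothetical active server that is currently down.
    raise ConnectionError("active server is unresponsive")


def standby_server(request: str) -> str:
    # Hypothetical standby server that takes over.
    return f"standby handled {request}"


print(call_with_failover([active_server, standby_server], "GET /status"))
# standby handled GET /status
```

The caller receives a normal response and does not need to know which server produced it.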
What is a failover cluster?
A failover cluster is a group of devices that share responsibility for a service so that it survives an interruption. If the primary hardware or server stops responding to requests, the cluster assigns its responsibilities to the other machines in the cluster and ensures uninterrupted service.
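As a rough illustration, a cluster can be modeled as an ordered list of nodes in which the first healthy node owns the service. This is a simplified sketch: the class, its methods, and the node names are hypothetical, and real cluster software also has to deal with health checks, quorum, and shared state.

```python
class FailoverCluster:
    """Track node health and report which node currently owns the service."""

    def __init__(self, nodes):
        # Dictionaries keep insertion order, so earlier nodes have priority.
        self._health = {node: True for node in nodes}
        if not self._health:
            raise ValueError("a cluster needs at least one node")

    def mark_down(self, node: str) -> None:
        self._set_health(node, False)

    def mark_up(self, node: str) -> None:
        self._set_health(node, True)

    def _set_health(self, node: str, healthy: bool) -> None:
        if node not in self._health:
            raise KeyError(node)
        self._health[node] = healthy

    def owner(self) -> str:
        """Return the highest-priority node that is still healthy."""
        for node, healthy in self._health.items():
            if healthy:
                return node
        raise RuntimeError("no healthy node left in the cluster")


cluster = FailoverCluster(["node-a", "node-b", "node-c"])
print(cluster.owner())  # node-a
cluster.mark_down("node-a")
print(cluster.owner())  # node-b
```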