Cloud giant AWS has experienced a major outage in its Frankfurt data center. The outage, which hit a single Availability Zone (EUC_AZ-1) in Amazon Web Services' EU-Central region, was caused by a failure in the air circulation system and took approximately 3 hours to fix. According to AWS, staff could not enter the data hall because the fire suppression system had removed the oxygen from the air.
Caused by the air circulation system
AWS stated that it noticed unusually high error rates in the zone. The cloud giant also said that a failure caused the air handlers to stop working, and the air temperature rose as a result. According to the statement, staff could have fixed the problem earlier, but the fire suppression system activated automatically, even though it is only supposed to activate when it detects smoke.
The suppression system removed the oxygen from the air in the data hall, so staff could not enter the facility for an extended period. After the fire department declared the site safe and the hall had been re-oxygenated, staff restarted the cooling system and powered the servers back on. AWS stated:
“Servers and networking equipment in the affected Availability Zone began to power off when unsafe temperatures were reached. A larger number of EC2 instances in this single Availability Zone lost network connectivity. While our operators would normally have been able to restore cooling before impact, a fire suppression system activated inside a section of the affected Availability Zone.
In order to recover the impacted instances and network equipment, we needed to wait until the fire department was able to inspect the facility. After the fire department determined that there was no fire in the data center and it was safe to return, the building needed to be re-oxygenated before it was safe for engineers to enter the facility and restore the affected networking gear and servers.”