Over-Temperature Losses in Data Centres
The amount of data that people and businesses create and use every day has grown at an exponential pace since the invention of the computer. That data is outgrowing what many businesses can manage on their own property. Consequently, many businesses have moved their data to the “cloud.”
You may have wondered: what is the cloud? Data is not simply floating in the air (although we can use it as if it were); it must be stored somewhere. The solution to the growing problem of data storage is data centres. There are over 2,700 data centres in the US alone and thousands more across the globe, although they are not easy to find and are not talked about much.
Rising Temperatures in Data Centres
Data centres are dedicated facilities used to host IT equipment for many different industries and for various functions. Software-based servers, known as virtual servers or Virtual Machines (VM), are utilized by businesses to increase the productivity of server equipment but can add more complex interactions within the data centre.
A VM is a software-based server that can run its own Operating System (OS) and applications as if it were a physical computer. A VM behaves exactly like a physical server and uses a portion of the physical server’s resources. These resources are the Central Processing Unit (CPU), Random Access Memory (RAM), Hard Disk Drive (HDD) storage capacity, and Network Interface Controller (NIC). The OS running in a VM cannot tell the difference between a VM and a physical machine, nor can applications or other computers on the network. Several VMs can be installed on one physical server to consolidate operations, minimize support, and lower hardware costs. The following graphic presents the configuration of a VM environment from a host server.
As technology advances, rack density continues to increase, which often leads to temperature rises. Heat is the bane of data centres. As the ambient temperature rises, productivity, data, and even equipment can be lost.
Data centres combat the heat given off by the equipment by cooling the racks with Computer Room Air Conditioning (CRAC) units, Heating, Ventilation, and Air Conditioning (HVAC) systems, liquid cooling, and sometimes supplemental cooling such as In-Row Cooling (IRC) technology. These cooling systems help maintain a consistent ambient temperature within the data centre. Additionally, environmental conditions (i.e., temperature, humidity, and moisture) can be monitored by several methods, such as real-time Universal Serial Bus (USB) alert systems, room alerts, and sensors.
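As a rough illustration of how a room alert might turn sensor readings into alerts, the sketch below classifies each temperature reading against a warning and a critical threshold. The threshold values, sensor names, and function names are assumptions made for this example only; real limits should come from the equipment manufacturers and the facility's own procedures.

```python
from dataclasses import dataclass

# Illustrative thresholds in degrees Celsius. These are assumptions for
# the example, not manufacturer limits.
WARNING_C = 27.0
CRITICAL_C = 35.0


@dataclass
class Reading:
    sensor: str    # hypothetical sensor label, e.g. "rack-12-inlet"
    temp_c: float  # measured temperature in degrees Celsius


def classify(reading, warning_c=WARNING_C, critical_c=CRITICAL_C):
    """Return 'normal', 'warning', or 'critical' for one reading."""
    if reading.temp_c >= critical_c:
        return "critical"
    if reading.temp_c >= warning_c:
        return "warning"
    return "normal"


def alerts(readings):
    """Return (sensor, temperature, level) for every reading that is not normal."""
    result = []
    for r in readings:
        level = classify(r)
        if level != "normal":
            result.append((r.sensor, r.temp_c, level))
    return result


if __name__ == "__main__":
    sample = [
        Reading("rack-01-inlet", 22.5),
        Reading("rack-02-inlet", 29.0),
        Reading("rack-03-inlet", 36.2),
    ]
    for sensor, temp_c, level in alerts(sample):
        print(f"{level.upper()}: {sensor} at {temp_c:.1f} C")
```

Keeping two levels lets staff start cooling and plan an orderly response at the warning stage, before the critical stage is reached.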
Equipment Loss Consulting Evaluations
Heat can cause problems for the IT equipment hosted within the data centre and, if the ambient temperature exceeds certain thresholds, can trigger a fire suppression system to deploy, which may in turn damage nearby technology. With the IT equipment down after a loss, it is often difficult to discern what needs to occur to mitigate the over-temperature event.
Common issues following an over-temperature event include battery and power supply failures, as these items do not dissipate heat. Hard Disk Drives can also have problems following this type of event, depending on the temperature and duration. Many claims start with a request to replace all items because they are deemed untrustworthy. Equipment loss consultants can assess the event by completing an inventory analysis, providing details of the event, including the temperature reached and its duration, reviewing quotes and invoices, and working with multiple stakeholders and vendors to restore the equipment, if possible, to a pre-loss condition. This can include a combination of repairs, replacement, and restoring warranties with manufacturers, or even replacing the warranty with a third-party support vendor.
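To show how the temperature reached and the duration of an event could be derived from a monitoring log, here is a minimal sketch. It assumes the log is a list of timestamped readings and uses a simple step approximation: each reading is treated as holding until the next one. The function name, the sample data, and the 32 °C limit are hypothetical.

```python
from datetime import datetime, timedelta


def summarize_event(samples, limit_c):
    """Return (peak_temp_c, time_over_limit) for a temperature log.

    samples: list of (datetime, temp_c) tuples, in any order.
    limit_c: temperature limit in degrees Celsius (an assumed value).
    Each reading is assumed to hold until the next reading is taken.
    """
    if not samples:
        raise ValueError("no samples provided")
    ordered = sorted(samples)
    peak = max(temp for _, temp in ordered)
    over = timedelta(0)
    # Pair each reading with the next one and add the gap between them
    # whenever the earlier reading is above the limit.
    for (t0, temp0), (t1, _) in zip(ordered, ordered[1:]):
        if temp0 > limit_c:
            over += t1 - t0
    return peak, over


if __name__ == "__main__":
    log = [
        (datetime(2024, 7, 1, 10, 0), 24.0),
        (datetime(2024, 7, 1, 10, 15), 33.0),
        (datetime(2024, 7, 1, 10, 30), 38.0),
        (datetime(2024, 7, 1, 10, 45), 36.0),
        (datetime(2024, 7, 1, 11, 0), 26.0),
    ]
    peak, over = summarize_event(log, limit_c=32.0)
    print(f"Peak: {peak} C, time above limit: {over}")
```

A coarser sampling interval makes the duration estimate less precise, so the sampling rate of the log should be stated alongside any figure derived this way.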
At Envista, our Equipment Loss Consulting (ELC) team can identify how and why cooling systems have failed. Failures can have many causes, from power outages and poorly maintained equipment to heat waves that raise outside temperatures beyond what the cooling system was sized to handle.
What To Do During Over-Temperature Events
Some IT equipment, such as servers, is designed to protect itself from an over-temperature event by automatically powering off. Once powered off, the equipment is subject to storage temperature limits rather than operating temperature limits, and the two have different tolerance thresholds. Other equipment, such as networking devices and storage arrays, is not designed to protect itself and therefore does not power off, as it is considered core critical.
Following the discovery of an over-temperature event, it is common to perform an Emergency Power Off (EPO), an abrupt power-off of the IT equipment that can cause its own set of problems, including data, software, and configuration corruption, or even data synchronization issues between equipment. The best business practice is not to panic: begin cooling the data centre and perform a graceful shutdown, following the defined procedures for each device, so as not to cause further problems for the hosted equipment.
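As a sketch of what planning a graceful shutdown could look like, the example below sorts devices into a shutdown order by type. The order used here (virtual machines first, then servers, then storage arrays, then networking) is an assumption for illustration, as are the device names; the defined procedures for each device and site take precedence.

```python
# Assumed shutdown order for illustration only: workloads first, then the
# hosts that run them, then the shared storage and network they depend on.
SHUTDOWN_ORDER = ["virtual_machine", "server", "storage_array", "network"]


def plan_shutdown(devices):
    """Return device names in the order they should be shut down.

    devices: list of (name, kind) tuples, where kind is one of the
    entries in SHUTDOWN_ORDER. Devices of the same kind keep their
    original relative order.
    """
    rank = {kind: position for position, kind in enumerate(SHUTDOWN_ORDER)}
    unknown = [name for name, kind in devices if kind not in rank]
    if unknown:
        raise ValueError(f"no shutdown procedure defined for: {unknown}")
    ordered = sorted(devices, key=lambda device: rank[device[1]])
    return [name for name, _ in ordered]


if __name__ == "__main__":
    inventory = [
        ("core-switch-1", "network"),
        ("san-1", "storage_array"),
        ("host-1", "server"),
        ("vm-web-1", "virtual_machine"),
        ("host-2", "server"),
    ]
    for step, name in enumerate(plan_shutdown(inventory), start=1):
        print(f"{step}. shut down {name}")
```

Refusing to plan for an unknown device type is deliberate: a device with no defined procedure is better flagged for a person to review than shut down by guesswork.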
Data centre losses are complex, with multiple parties and vendors involved as well as many different pieces of equipment. After an event, it is common to receive a request to replace all of the hosted equipment. Envista can determine what occurred, recommend the method of repair, and provide certainty about what mitigation is necessary.
Case Study: Over-Temperature to Data Centre Equipment
Read our case study on a cooling failure in which a primary chiller stopped running, as scheduled, to allow the secondary chiller to take over, but the secondary chiller did not come on and cool the space as it was supposed to. Following this over-temperature event, in which temperatures exceeded the allowable range, the bank claimed over $4.5M for a full replacement of the affected data centre equipment.
Our experts are ready to help.