Operations Shut Down: Data and Software Recovery During COVID-19
April 17, 2020 - by Matt Scott
This is the second blog of a four-part series on the operational effects on systems and equipment during business shutdown periods, and the dramatic or catastrophic issues that can occur when facilities return to normal business. Read other published blogs in the series including part one, part three and part four.
As a result of the current COVID-19 pandemic, businesses around the globe have been forced to drastically adjust or halt their normal business operations. For businesses that rely on critical data systems and software processing devices, the ramifications of an extended downtime can be catastrophic. Whether it be a machine shop or manufacturing operation with complex controls and SCADA systems, or a scientific laboratory with specialized business-critical equipment, computer server environments and other integral computing assets must be operational for businesses to recover after any amount of downtime.
Right now, across the world, thousands of businesses are closed. As a result, the data processing equipment that supports business operations has been temporarily abandoned or simply turned off. In the coming weeks and months, when personnel are cleared to return to work and restart these idle devices, the operability of systems will depend largely on how well the equipment was prepared before businesses closed and on how carefully restart procedures are carried out. Poor preparation and improperly administered restarts will inevitably lead to an increase in systems operating incorrectly in the coming months.
In this article we will explore the four most common data processing equipment and system failures that insurance claim professionals and business owners are likely to experience following this unexpected and extended period of interrupted business operations.
Time to Prepare: 4 Common Equipment and System Failures
1. Improper Shutdown: Whether you are dealing with machine controls, network servers or laboratory equipment, a proper shutdown is necessary to support a successful startup. In most cases, properly shutting down a data processing device requires more than simply turning it off; Envista recommends that the end user follow the manufacturer's shutdown and startup guidelines. These procedures allow the device to write or save any data that has not yet been committed to persistent storage. This includes program instructions, machine programming parameters, database tables, production results and other vital variables the system needs to start up and continue exactly where it left off. Ignoring these procedures can delay the return of data processing systems to operation, assuming the systems can be restored to their prior state at all.
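To illustrate why a clean shutdown matters, here is a minimal Python sketch of what "saving before power-off" can look like inside an application. It is not taken from any manufacturer's procedure; the function names `save_state_durably` and `load_state` and the JSON state format are hypothetical. The idea is that data is written to a temporary file, pushed through to the storage device, and only then swapped into place, so an interruption leaves either the old file or the new one rather than a half-written mix.

```python
import json
import os
import tempfile


def save_state_durably(path, state):
    """Save `state` (a JSON-serializable dict) to `path`.

    Hypothetical helper: writes to a temporary file in the same
    directory, forces the data to the storage device, then replaces
    the target in a single step.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp)
            tmp.flush()              # empty Python's own buffer
            os.fsync(tmp.fileno())   # ask the OS to write to the device
        os.replace(tmp_path, path)   # swap the finished file into place
    except BaseException:
        # Clean up the partial temporary file, then report the failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path):
    """Read back a state file written by save_state_durably."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

A system that is simply switched off never reaches the flush and sync steps, which is one way unsaved parameters are lost. Real equipment will have its own mechanisms, so the manufacturer's documented procedure remains the authority.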
2. Powering Down Equipment or “Pulling the Plug”: Similar to the improper shutdown issues outlined above, Envista anticipates that many companies simply cut power to their systems as a way to shut down data processing devices, or left them operating unattended and experienced an abrupt shutdown during that period. This is not recommended, because it prevents critical data from being properly written and saved. “Pulling the plug” increases the chance of corrupting data within operating systems, such as Microsoft Windows, and can contribute to the loss of critical business databases, such as those used for financial reporting, inventory management or enterprise resource planning (ERP). With this data missing or corrupted, data processing devices may fail to boot and applications may not function. Depending on the impact of the improper shutdown or power loss, professional data recovery may be necessary to restore and rebuild the databases, if they can be recovered at all.
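One way to spot files that were damaged by an abrupt power loss is to compare them against checksums recorded while the system was known to be healthy. The Python sketch below assumes such a record exists; the helper names (`sha256_of`, `build_manifest`, `find_changed`) are hypothetical, and the approach only suits files that are expected to stay unchanged while the equipment is off, such as configuration files or a database file captured after a clean shutdown.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a checksum for each file while the system is healthy."""
    return {str(p): sha256_of(p) for p in paths}


def find_changed(manifest):
    """Return files that are missing or differ from the manifest."""
    problems = {}
    for name, expected in manifest.items():
        p = Path(name)
        if not p.exists():
            problems[name] = "missing"
        elif sha256_of(p) != expected:
            problems[name] = "changed"
    return problems
```

A mismatch does not prove corruption on its own, since a file may have changed legitimately, but it narrows down where a recovery effort should start looking.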
3. Wear and Tear: In some situations, businesses may not have had the time to properly shut down their equipment, and systems may have been left operating unattended for weeks or months. The lack of active monitoring may result in equipment or component wear and tear, which can inhibit the restoration of computer servers or manufacturing/automation operations. To understand this better, consider that both a computer server or workstation and a Programmable Logic Controller (PLC) for machine automation rely on a battery to keep memory powered. This battery is almost always overlooked, yet it is necessary to retain the startup parameters of a computer, server or workstation. Machine automation devices, such as the PLC mentioned, use a similar battery configuration to keep the memory that stores their customized programming active.
In each of these devices, the battery that keeps the memory active is essential if the device is to be restored when power is reapplied for the restart. If the battery cannot supply enough power to maintain the device's memory, the computer or PLC will not be operational, which can delay or prevent the return to intended operations. Using the manufacturer's guidelines to identify this battery and confirm that it is working is a simple way to verify that it can keep the memory powered and active for the length of time the equipment is powered off.
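The battery check described above comes down to simple arithmetic: the charge remaining divided by the current the memory draws gives a rough hold-up time. The Python sketch below is only an illustration. The function names are hypothetical, and the capacity, drain and safety-margin figures in the usage note are made-up values; real figures have to come from the manufacturer's documentation and a measurement of the installed battery.

```python
def estimated_holdup_days(capacity_mah, drain_microamps, remaining_fraction=1.0):
    """Rough number of days a backup battery can keep memory powered.

    capacity_mah:       rated capacity in milliamp-hours (assumed value)
    drain_microamps:    standby current drawn by the memory (assumed value)
    remaining_fraction: share of rated capacity still left, 0.0 to 1.0
    """
    if capacity_mah <= 0 or drain_microamps <= 0:
        raise ValueError("capacity and drain must be positive")
    if not 0.0 <= remaining_fraction <= 1.0:
        raise ValueError("remaining_fraction must be between 0 and 1")
    drain_ma = drain_microamps / 1000.0            # microamps -> milliamps
    hours = capacity_mah * remaining_fraction / drain_ma
    return hours / 24.0


def covers_shutdown(capacity_mah, drain_microamps, shutdown_days,
                    remaining_fraction=1.0, safety_margin=2.0):
    """True if the estimate exceeds the shutdown length times a margin."""
    days = estimated_holdup_days(capacity_mah, drain_microamps, remaining_fraction)
    return days >= shutdown_days * safety_margin
```

For example, with an assumed 200 mAh battery, a 5 microamp drain and a 90-day closure, `covers_shutdown(200, 5, 90)` returns `True`, while the same battery with only 5% of its capacity left (`remaining_fraction=0.05`) does not pass the check.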
It should be noted that when systems are left operating without active monitoring, the effects of wear and tear will accrue unless personnel address them. These effects may be benign individually but can accumulate and cause outages or catastrophic failures. A careful review of operational logs may be needed to determine the sequence of events.
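When several devices each keep their own log, placing the entries on a single timeline can make the sequence of events easier to follow. The Python sketch below assumes a hypothetical log format in which every line begins with an ISO 8601 timestamp followed by a space and a message; real equipment logs vary widely and would need their own parsing. The names `parse_line` and `build_timeline` are illustrative only.

```python
from datetime import datetime


def parse_line(source, line):
    """Split one '<ISO timestamp> <message>' line into its parts."""
    stamp, _, message = line.strip().partition(" ")
    return datetime.fromisoformat(stamp), source, message


def build_timeline(logs):
    """Merge log lines from several sources into one ordered list.

    logs: dict mapping a source name (e.g. "plc") to a list of lines.
    Returns (timestamp, source, message) tuples, oldest first.
    """
    events = []
    for source, lines in logs.items():
        for line in lines:
            if line.strip():                     # skip blank lines
                events.append(parse_line(source, line))
    return sorted(events, key=lambda event: event[0])
```

Device clocks are not always synchronized, so a merged timeline like this is a starting point for review rather than a definitive record.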
4. Continued Monitoring of System Updates: Systems that are left powered on may require critical system updates in order to remain operational. Without staff on-site to supervise the proper installation of these updates, we anticipate that data processing device failures will be prevalent. Typical system updates require a computer or server to be rebooted. This reboot may interrupt application services that are running and cause applications to fail. In other cases, updates may directly conflict with the continued operation of the application or the server itself, causing the system to stop operating. In these cases, recovering the application or service may be difficult. In the worst cases, there may be no practical way to restore system configurations to their prior state, which again delays the return to operation and may require data recovery if extensive effort is needed.
As businesses begin to restore their operations in the coming weeks and months, it is crucial that all stakeholders, from risk managers to insurers to service providers, understand the common causes of data processing equipment and system failures and make the necessary decisions to mitigate delays and return critical operations to service.
About the Author
Matt, Practice Leader, Digital Forensics, has nearly 20 years of experience, and provides consulting expertise to the insurance, legal, law enforcement, private, and public communities on computer/mobile forensics, cyber incidents and failure analysis. He investigates computer-related crimes, cyber incidents (breach investigations) and/or ransomware. He also has vast experience determining origin and cause of failures, and is highly proficient in multiple programming languages.