CrowdStrike's rushed update, intended to prevent cyber-attacks, accidentally caused problems for millions of Windows machines around the world.
"We estimate that CrowdStrike's update affected 8.5 million Windows devices, equivalent to less than one percent of the total number of Windows devices globally," Microsoft announced on its blog early on the morning of July 21. "While the percentage is small, the broad economic and social impact reflects the reliance on CrowdStrike by businesses that operate many critical services."
CrowdStrike is one of the industry's largest cybersecurity companies. On July 19, an update to the company's Falcon Sensor software, used to protect Microsoft Azure cloud infrastructure and Windows computers, caused thousands of flight delays, knocked television stations off the air, and left users unable to access services such as healthcare and banking.
Microsoft said CrowdStrike has developed a solution to help Azure infrastructure recover quickly, and that Microsoft itself has worked with Amazon Web Services and Google Cloud Platform to share information about the impact it is seeing.
The aviation industry is gradually recovering after the incident. Reuters quoted Delta Air Lines, one of the most heavily affected airlines, as saying that as of 9:00 p.m. on July 20 (Hanoi time), it had canceled more than 600 flights and expected to cancel more. In Vietnam, Vietjet also announced on the afternoon of July 19 that its flights would be affected by delays at other airports around the world.
The Falcon Sensor update caused a "blue screen of death" error, and the resulting outage is considered one of the largest-scale service disruptions in recent years.
Falcon Sensor is said to make systems more secure by continuously adding information about new threats. However, the new version contained faulty code that was not detected before widespread release. Reuters quoted Patrick Wardle, an expert in researching threats to operating systems, as saying that the faulty code "is in a file containing configuration information or signatures". Signatures typically contain information that helps security software detect particular types of malicious code or malware.
"With the desire to ensure customers are protected from the latest threats, security products typically update signatures as often as maybe once a day," Wardle said, suggesting that this frequency is why CrowdStrike did not test the update thoroughly before release.
Software updates are like untested drugs. Their code can contain errors and even conflict with other software, just as drugs can have unwanted side effects.
Typically, companies spend time testing software and deploying it to a small group of users first, before releasing it widely. In the security field, however, service providers have to race against threats and against competitors, which leads to a trade-off between security and stability.
"Antivirus products have to push through multiple updates every day, due to the need to respond to threats as quickly as possible. So having to check multiple times a day becomes burdensome," Paul Davis, chief information security officer (CISO) at JFrog, told Fortune.
According to Davis, security companies often test the basic functionality of their software but still have to rely on automatic updates, accepting "calculated risks".
This time, however, CrowdStrike did not anticipate those risks. The update triggered a blue screen of death, affecting a series of critical services. Even though the error has been identified and resolved, the fix may have to be applied manually, which takes a lot of time.
"You would have to go to each computer, reboot, and when the screen comes up, you have to press F3 to enter Safe Mode and then delete a file somewhere," explained the CISO of a company affected by the problem.
To prevent a similar disaster, John Hammond, a security researcher at Huntress Labs, recommends thoroughly testing each update or rolling it out to a limited group before applying it broadly.
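The limited-group approach described above is often called a staged or "canary" rollout. The Python sketch below is only an illustration of the idea, not CrowdStrike's or any vendor's actual process. The names `staged_rollout`, `apply_update`, `stage_fractions` and `max_failure_rate` are hypothetical, as are the 1%, 10%, 100% stages and the 2% failure threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    updated: List[str]  # hosts that took the update successfully
    halted: bool        # True if the rollout stopped before reaching everyone


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], bool],  # hypothetical: returns True if the host stays healthy
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),  # assumed stage sizes
    max_failure_rate: float = 0.02,  # assumed threshold for halting
) -> RolloutResult:
    """Push an update in growing waves and stop if a wave looks unhealthy."""
    updated: List[str] = []
    done = 0  # number of hosts already attempted
    for fraction in stage_fractions:
        # Extend coverage to `fraction` of the fleet, always adding at least one host.
        target = min(len(hosts), max(done + 1, int(len(hosts) * fraction)))
        wave = hosts[done:target]
        if not wave:
            break  # nothing left to update
        failures = 0
        for host in wave:
            if apply_update(host):
                updated.append(host)
            else:
                failures += 1
        done = target
        # Halt before the next, larger wave if too many hosts in this wave failed.
        if failures / len(wave) > max_failure_rate:
            return RolloutResult(updated, halted=True)
    return RolloutResult(updated, halted=False)
```

Under these assumed settings, a faulty update pushed to a fleet of 1,000 machines would be attempted on only the first 10 before the rollout halts, instead of reaching every machine at once. The trade-off, as the experts quoted here note, is that each extra stage delays protection against new threats.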
The CrowdStrike incident is not the first of its kind. In 2010, a buggy McAfee antivirus update similarly caused hundreds of thousands of computers to stop working.
