Cyberspace got a rude shock on 19 July 2024, when an erroneous update by the security software firm CrowdStrike caused what was possibly the largest IT outage in history, affecting over 8.5 million devices running Windows operating systems. Though this constitutes a minuscule percentage of the Windows installed base (about 1 percent), the affected systems were critical ones, spanning airlines and airports, public transit, healthcare, financial services, media and broadcasting, and even the delivery of uniforms for the Paris Olympic Games. While this might seem like a novel problem, it isn’t. Issues with automatic software updates have been festering for a while, and the CrowdStrike debacle just served as the tipping point. What has been brought to light is the pervasive presence of software supply chains in our day-to-day lives and the responsibility with which software suppliers have been saddled, given our growing dependence on them.
The reason behind the outage
A faulty automatic update to “Falcon,” the cybersecurity software developed by CrowdStrike, was responsible for the outage. Falcon is a widely used platform employed by organisations around the world, which is why so many sectors were critically affected. The outage was not caused by Microsoft directly; rather, it stemmed from CrowdStrike, the third-party software vendor responsible for Falcon and its updates. The update crashed millions of Windows systems, which ended up showing the so-called “Blue Screen of Death” (BSOD).
CrowdStrike released a sensor configuration update for the Falcon platform on 19 July. These configuration files are referred to as “Channel Files,” which are updated on a daily basis to counter novel tactics, techniques and procedures perceived by CrowdStrike to be cybersecurity threats. In this case, the impacted file in question was Channel File 291, whose update triggered a logic error eventually resulting in a system crash. Only Windows devices use this particular file, which is why Linux and Mac systems were not affected. The situation was further compounded by the fact that Microsoft’s own cloud platform Azure had experienced a widespread technical outage the previous night, though the two failures were seemingly unrelated.
Consequences of the outage and the aftermath
Several critical sectors and platforms were affected globally, largely because the systems had to be fixed manually, requiring several corrective steps, including rebooting. Airlines and airports faced the most severe disruption, with flights getting delayed or cancelled, leading to huge queues at airports around the world. Major US airlines like Delta, United and American Airlines were forced to ground flights and pause operations. Some of India’s busiest airports including Delhi, Mumbai, Chennai, and Bengaluru were significantly affected, with some airlines even being forced to issue hand-written boarding passes.
Healthcare systems and hospitals including those in the United Kingdom, Israel, and Germany faced disruptions in communicating with their patients. Television and news networks like Sky News were impacted in Australia as well as the UK. Online banking systems and financial systems like Visa were also affected. Several states in the US reported problems with emergency services using 911. Hackers were also quick to take advantage of the situation with phishing emails and fake phone calls posing as CrowdStrike support.
While CrowdStrike was able to identify the problem and deploy a fix within 79 minutes, manually repairing millions of systems one by one is a time-consuming process. Though the CEO of CrowdStrike announced that about 99% of Windows sensors were back online by 29 July 2024, in some cases the problem is likely to take months to fix. In the aftermath of the outage, CrowdStrike’s market value plummeted by over US$20 billion, with shareholders and companies like Delta Air Lines threatening to sue the company as well as Microsoft.
The problem with automatic updates in software supply chains
While worms and viruses like Stuxnet and NotPetya have led to outages in the past, this was the first instance in which a software update was responsible for an outage on such a massive scale. Updates by renowned companies like Kaspersky and to Windows’ own built-in antivirus, Windows Defender, have also led to BSOD crashes in the past, not to mention software updates by companies like Apple, which have had their own shortcomings. However, the unprecedented scale of the CrowdStrike outage goes to show how deeply integrated software supply chains have become in our day-to-day lives.
More critical updates, such as kernel driver updates, are vetted and tested by Microsoft. Configuration updates like the Channel File 291 update, however, are left to third-party vendors like CrowdStrike.
Software updates have been plagued by performance and compatibility issues for a while now. While these did not cause massive outages like the CrowdStrike incident, they have been lurking around for a long time. Windows 11, for instance, was marred by several compatibility and design issues right from the outset. Regardless, users were constantly harangued to update from previous versions until they finally gave in. With the world’s ever-increasing dependence on software supply chains for executing everyday tasks, software companies need to exercise more care and responsibility. This applies particularly to Big Tech corporations like Microsoft, Apple, and Google, which constitute a sizeable portion of these supply chains. They must ensure that software updates are properly tested before being deployed; otherwise, a faulty update can have a cascading effect, as it did with CrowdStrike, and cripple major platforms across the globe. Constant updates are often required to counter new and evolving cyber threats. However, these should not come at the cost of compromising the cybersecurity systems themselves.
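One common way of limiting the cascading effect of a faulty update is a staged (or “canary”) rollout, in which the update first reaches a small fraction of devices and proceeds only if those devices stay healthy. The sketch below is purely illustrative and does not describe how CrowdStrike or Microsoft actually deploy updates; the names `staged_rollout`, `apply_update`, `is_healthy`, the stage percentages, and the simulated fleet are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional, Sequence


@dataclass
class RolloutResult:
    completed: bool                               # True only if every stage passed
    updated: list = field(default_factory=list)   # devices that received the update
    halted_at_stage: Optional[int] = None         # index of the failing stage, if any


def staged_rollout(
    devices: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: pushes the update to one device
    is_healthy: Callable[[str], bool],     # hypothetical: post-update health check
    stage_percents: Sequence[int] = (1, 10, 100),  # cumulative share of the fleet per stage
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Roll an update out in growing batches, halting at the first unhealthy stage."""
    result = RolloutResult(completed=False)
    done = 0
    for stage, pct in enumerate(stage_percents):
        # Ceiling division: how many devices should be updated by the end of this stage.
        target = -(-len(devices) * pct // 100)
        batch = devices[done:target]
        if not batch:
            continue
        failures = 0
        for device in batch:
            apply_update(device)
            result.updated.append(device)
            if not is_healthy(device):
                failures += 1
        done = target
        if failures / len(batch) > max_failure_rate:
            # Stop here so the remaining devices never receive the faulty update.
            result.halted_at_stage = stage
            return result
    result.completed = True
    return result


# Simulated fleet of 1,000 devices and an update that crashes every device it touches.
fleet = [f"host-{i}" for i in range(1000)]
bad = staged_rollout(fleet, apply_update=lambda d: None, is_healthy=lambda d: False)
print(len(bad.updated), bad.halted_at_stage)  # prints "10 0"
```

In this simulation the faulty update reaches only the first 10 devices (1 percent of the fleet) before the rollout halts, rather than the entire fleet at once. The trade-off is speed: a staged rollout delays protection against a new threat for the devices in later stages.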
Prateek Tripathi is a Research Assistant at the Observer Research Foundation.
The views expressed above belong to the author(s).