
CrowdStrike software update causes global disruptions


A major IT outage that crashed Microsoft Windows computers disrupted flights at airlines and operations at businesses and government offices worldwide on Friday. The outage highlighted the fragility of a digitized world that depends on just a handful of providers. Air travelers bore the brunt of the disruption, with passengers posting pictures on social media of crowds stranded at airports in Europe and the United States.

In the U.S., American Airlines, Delta Air Lines, United Airlines, Spirit Airlines, and Allegiant Air grounded flights for varying lengths of time. Airlines reported that the outage affected multiple systems, including those used for passenger check-in and for the aircraft weight calculations that are critical for takeoff. United Airlines and some other carriers issued waivers allowing customers to change their travel plans.

Ongoing problems were visible at airports like Chicago’s O’Hare, where information screens were stuck on blue Windows recovery screens. The health care sector also faced significant challenges. In the U.S., Harris Health System in Houston suspended hospital visits, and elective procedures were canceled.

The New York-based Memorial Sloan Kettering Cancer Center paused procedures requiring anesthesia, while Massachusetts’ Mass General Brigham canceled all scheduled nonurgent surgeries and medical visits for Friday, though emergency departments remained open. Canada’s University Health Network and Britain’s National Health Service (NHS) also reported delays, with the NHS experiencing problems at most doctors’ offices across England.


In Britain, the critical 999 emergency number remained unaffected. People seeking entry to the U.S. at both its northern and southern borders experienced delays. At the San Ysidro Port of Entry, pedestrians waited three hours to cross, while vehicles in the Trusted Traveler program faced 90-minute waits.

The San Diego Metropolitan Transit System reported that some employees living in Tijuana, Mexico, were unable to get to work, potentially affecting local transit services. Long delays were also reported at the U.S.-Canada border at crossings such as the Ambassador Bridge and the Detroit-Windsor tunnel. The outage also extended to retail businesses like Starbucks, where customers found themselves unable to order ahead online or via mobile apps.

The coffee chain apologized for the inconvenience and continued serving customers in most of its stores and drive-thrus. The widespread outage was traced to a faulty CrowdStrike software update that crashed Microsoft Windows computers. A fix was deployed, but disruptions were felt across multiple sectors.

The outage underlines the interconnected nature of modern infrastructure: dependence on a few major technology providers can turn a single fault into widespread disruption of everyday activities around the globe.

On July 19, 2024, the cybersecurity firm CrowdStrike released a routine content configuration update for its Falcon sensor on Windows hosts. The update was intended to gather telemetry on novel threat techniques. Instead, a defect in it caused Windows systems running sensor version 7.11 and above to crash with what is known as the Blue Screen of Death (BSOD). CrowdStrike identified the problem and reverted the update in under 90 minutes, although machines that had already crashed still had to be recovered.

At 04:09 UTC on July 19, 2024, CrowdStrike pushed the update as part of its dynamic protection mechanisms. Devices that were online and received the update before it was reverted at 05:27 UTC crashed.
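The exposure conditions described here (a Windows host, sensor version 7.11 or later, online while the update was live between 04:09 and 05:27 UTC) amount to a simple interval-overlap check. The following is a minimal illustrative sketch: the `Host` record, its fields, and `potentially_affected` are hypothetical names rather than part of any CrowdStrike tool, and being online during the window is only a proxy for whether a host actually downloaded the content.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Window from the article: content pushed at 04:09 UTC, reverted at 05:27 UTC on July 19, 2024.
UPDATE_RELEASED = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
UPDATE_REVERTED = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)
MIN_AFFECTED_SENSOR = (7, 11)  # Windows sensor 7.11 and above


@dataclass
class Host:
    # Hypothetical inventory record, not a CrowdStrike data structure.
    os: str                          # "windows", "macos", or "linux"
    sensor_version: tuple[int, int]  # (major, minor)
    online_from: datetime
    online_until: datetime


def potentially_affected(host: Host) -> bool:
    """Rough screen: Windows, sensor >= 7.11, and online at some point while the update was live."""
    if host.os != "windows":
        return False  # Mac and Linux hosts were not affected
    if host.sensor_version < MIN_AFFECTED_SENSOR:
        return False
    # Two time intervals overlap when each one starts before the other ends.
    return host.online_from < UPDATE_REVERTED and host.online_until > UPDATE_RELEASED


window = UPDATE_REVERTED - UPDATE_RELEASED
print(f"update was live for {window.total_seconds() / 60:.0f} minutes")  # update was live for 78 minutes

laptop = Host("windows", (7, 11),
              datetime(2024, 7, 19, 3, 0, tzinfo=timezone.utc),
              datetime(2024, 7, 19, 6, 0, tzinfo=timezone.utc))
print(potentially_affected(laptop))  # True
```

Tuple comparison handles the version check, so `(7, 10)` is excluded and `(7, 16)` is included without parsing version strings.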



Devices that were offline during the window or came online after 05:27 UTC were not affected, and Mac and Linux systems were unaffected as well. CrowdStrike’s Falcon platform uses two types of content updates:

1. Sensor Content: shipped with the sensor itself, it includes AI and machine learning models and code for long-term capabilities.
2. Rapid Response Content: updated dynamically from the cloud so the sensor can respond quickly to new threats.

The problem originated in a Rapid Response Content update containing a defect that was not caught during validation. Specifically, a bug in the Content Validator allowed a problematic Template Instance to pass. When the sensor’s Content Interpreter processed it, the content triggered an out-of-bounds memory read that the sensor could not handle gracefully, and Windows crashed. CrowdStrike identified and reverted the problematic content quickly, halting further crashes, and outlined the following measures to prevent a recurrence:

1. Software Resiliency and Testing:
– Enhanced local developer testing and automatic rollback of problematic updates.
– Implemented stress testing, fuzzing, and fault injection to uncover similar issues in the future.
– Introduced additional validation checks and refined error handling mechanisms.

2. Rapid Response Content Deployment:
– Improved the Content Validator to prevent such errors moving forward.
– Strengthened the robustness of the Content Interpreter to better manage exceptions.
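The validation gap described above and the two remediation themes in this list (a stricter Content Validator and a Content Interpreter that handles bad input gracefully) can be illustrated with a small two-layer sketch. It assumes, purely for illustration, that a template instance is a tuple of matching criteria, one per input field, with `None` as a wildcard; `TemplateInstance`, `validate`, `matches`, and `evaluate_all` are hypothetical names, not CrowdStrike’s code or content format. The pre-release `validate` step rejects criteria that point at inputs the sensor does not supply, and the run-time layer bounds-checks every read and quarantines a malformed instance instead of letting one bad rule take down the host.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TemplateInstance:
    # Hypothetical format: one criterion per input field; None is a wildcard.
    name: str
    criteria: tuple[Optional[str], ...]


def validate(instance: TemplateInstance, inputs_supplied: int) -> list[str]:
    """Pre-release gate: flag non-wildcard criteria on fields the sensor never supplies."""
    problems = []
    for index, criterion in enumerate(instance.criteria):
        if criterion is not None and index >= inputs_supplied:
            problems.append(
                f"{instance.name}: field {index} has a criterion but only {inputs_supplied} inputs are supplied"
            )
    return problems


def matches(instance: TemplateInstance, inputs: Sequence[str]) -> bool:
    """Run-time evaluation with an explicit bounds check before every read."""
    for index, criterion in enumerate(instance.criteria):
        if criterion is None:
            continue  # wildcard: never reads the input, so a missing field is harmless
        if index >= len(inputs):
            raise ValueError(f"{instance.name}: field {index} is out of range")
        if inputs[index] != criterion:
            return False
    return True


def evaluate_all(instances: Sequence[TemplateInstance], inputs: Sequence[str]) -> tuple[list[str], list[str]]:
    """Evaluate every instance; quarantine malformed ones rather than crashing."""
    matched, quarantined = [], []
    for instance in instances:
        try:
            if matches(instance, inputs):
                matched.append(instance.name)
        except ValueError:
            quarantined.append(instance.name)  # disable this rule, keep the host running
    return matched, quarantined


good = TemplateInstance("good", ("cmd.exe", None))
bad = TemplateInstance("bad", (None, None, "x"))
print(validate(good, 2))  # []
print(validate(bad, 2))   # ['bad: field 2 has a criterion but only 2 inputs are supplied']
print(evaluate_all([good, bad], ["cmd.exe", "arg"]))  # (['good'], ['bad'])
```

The design is defense in depth: the validator should stop the bad instance before release, and if one slips through, the interpreter still refuses to read out of range. Python would raise `IndexError` on its own; the explicit check stands in for what native code, where an out-of-bounds read is undefined behavior rather than a catchable error, has to do deliberately.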

CrowdStrike’s swift response and remediation underscore the importance of meticulous update processes in cybersecurity infrastructure. The company has said it will continue to strengthen its testing protocols and will later release a comprehensive Root Cause Analysis detailing the incident and the steps taken to fortify its systems against similar failures.

Microsoft is advocating for significant changes in Windows security architecture following the outage. The incident, which Microsoft estimates affected 8.5 million Windows PCs, has prompted the company to rethink how security vendors access the Windows kernel. CrowdStrike’s software operates at the kernel level of the operating system, which gives it broad access to system memory and hardware.


While this allows for robust threat detection, it also means that any issues with CrowdStrike’s app can lead to severe consequences, such as the Blue Screen of Death experienced by users recently. John Cable, Microsoft’s Vice President of Program Management for Windows Servicing and Delivery, highlighted the necessity for changes. “This incident shows clearly that Windows must prioritize change and innovation in the area of end-to-end resilience,” Cable stated.

He called for closer cooperation between Microsoft and its partners to strengthen the security of the Windows ecosystem. Although he did not detail specific improvements in response to the CrowdStrike incident, Cable cited a new feature that provides tamper resistance without requiring kernel-mode drivers as an example of recent security innovation. “These examples use modern Zero Trust approaches and show what can be done to encourage development practices that do not rely on kernel access,” Cable added.

Microsoft’s stance might prompt a broader discussion on Windows kernel access, similar to Apple’s 2020 decision to restrict developers’ kernel access on macOS. However, any movement in this direction will need to consider the implications for security vendors deeply embedded in the Windows environment. The broader security community and Microsoft will continue to work collaboratively to improve the resiliency of the Windows platform, aiming to make it more secure and reliable for all users.

Cameron is a highly regarded contributor in the rapidly evolving fields of artificial intelligence (AI) and machine learning. His articles delve into the theoretical underpinnings of AI, the practical applications of machine learning across industries, ethical considerations of autonomous systems, and the societal impacts of these disruptive technologies.
