On Monday, the Google Compute Engine (GCE) cloud computing service suffered an 18-minute outage. While the downtime was short, the event was significant because it affected all GCE regions worldwide, meaning that customers were not able to fail over from one Google region to another.
Google has now published a detailed explanation for the incident, which it says it is taking particularly seriously. The problem arose from two bugs in its network management software, which knocked out applications running on GCE, as well as dependent VPNs and L3 network load balancers.
Google said it is working to make sure the problem doesn’t happen again, and it is offering affected customers credits equal to 10 percent of monthly GCE charges and 25 percent of VPN charges.