For the second time in three weeks, Google Compute Engine (GCE) suffered a service disruption on March 7. For about 45 minutes, some customers saw unusually slow response times or timeouts when trying to connect to their virtual machines.
Google said that both recent problems were related to egress traffic and blamed the latest incident on “packet loss on egress network traffic” that occurred after engineers made a configuration change on the cloud computing service. It said that engineers had successfully tested the change before rolling it out to production services but that, for unknown reasons, the production servers did not behave like the test environment. Google promised that “future changes will not be applied to production until the test suite has been improved to demonstrate parity with behavior observed in production during this incident” and that it would roll out future changes more gradually.