Fastly says single customer triggered bug behind mass internet outage
An internet blackout that knocked out some of the world’s biggest websites on Tuesday was ultimately caused by a single customer updating their settings, the infrastructure provider Fastly has revealed.
A bug in Fastly’s code introduced in mid-May had lain dormant until Tuesday morning, according to Nick Rockwell, the company’s head of engineering and infrastructure. When the unnamed customer updated their settings, it triggered the flaw, which ultimately took down 85% of the company’s network.
“On May 12, we began a software deployment that introduced a bug that could be triggered by a specific customer configuration under specific circumstances,” Rockwell said. “Early June 8, a customer pushed a valid configuration change that included the specific circumstances that triggered the bug, which caused 85% of our network to return errors.
“We detected the disruption within one minute, then identified and isolated the cause, and disabled the configuration. Within 49 minutes, 95% of our network was operating as normal.”
Rockwell added: “Even though there were specific conditions that triggered this outage, we should have anticipated it. We provide mission-critical services, and we treat any action that can cause service issues with the utmost sensitivity and priority. We apologize to our customers and those who rely on them for the outage and sincerely thank the community for its support.”
The content delivery network (CDN) operated by Fastly is one of the largest on the internet, along with similar networks operated by Akamai, Cloudflare and Amazon’s CloudFront. All operate on the same principle: that the internet is faster and more stable if users can connect to servers physically close to them, optimised for handling lots of traffic.
In normal times, using a CDN not only cuts loading times but also allows the CDN operators, with expertise in running internet infrastructure, to take on the burden of handling security threats, unexpected traffic spikes, and high bandwidth bills. But the outage highlighted the risks associated with a concentration of critical internet infrastructure in the hands of just a few companies.
Counterintuitively, the outage and recovery led to a rise in Fastly’s stock price, which was up 12% over the course of Tuesday. The increase may have been because the company had demonstrated an effective incident response plan, or simply because the outage had served to make investors more aware of the scale of Fastly’s business and the size of its customer base.
The effects will not have been quite so rosy for Fastly’s customers. At Amazon alone, for instance, the outage could have lost the company $32m in sales, according to a calculation by the SEO agency Reboot.
“Although it seems they weren’t down for long, the impact it would have had will be huge, especially on e-commerce sites,” said Naomi Aharony, the agency’s managing director. “With our research estimating Amazon could have potentially lost $6,803 every second it was down, it’s clear an investigation will want to be made to find out what happened.”
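As a rough check on how the two quoted figures relate, the sketch below divides the $32m total by the $6,803-per-second rate. It assumes that the total was reached by multiplying a flat per-second rate by a duration, which Reboot has not confirmed here. The variable names are illustrative only.

```python
# Figures quoted in the article (Reboot's estimates).
total_loss_usd = 32_000_000
loss_per_second_usd = 6_803

# Assumption: total = flat per-second rate x outage duration.
implied_seconds = total_loss_usd / loss_per_second_usd
implied_minutes = implied_seconds / 60

print(f"Implied duration: {implied_seconds:,.0f} s (~{implied_minutes:.0f} min)")
```

Under that assumption, the figures imply a duration of roughly 78 minutes. Because the $32m figure is rounded, this is only an approximation.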
Few Fastly customers were able to switch over to a backup system in time to recover from the outage, in part because doing so is typically considered riskier than simply waiting for the provider to fix the problem. For instance, according to public documents, gov.uk has a backup contract with Amazon to provide CDN services, but switching to it requires manual intervention.
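The trade-off described above, in which a backup exists but a person must decide to use it, can be sketched as a simple decision rule. This is a hypothetical illustration, not a description of how gov.uk or any Fastly customer actually manages failover. The names `CdnStatus`, `choose_cdn` and `operator_approved` are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class CdnStatus:
    name: str
    healthy: bool


def choose_cdn(primary: CdnStatus, backup: CdnStatus, operator_approved: bool) -> str:
    """Return the name of the CDN that should serve traffic.

    Hypothetical manual-failover policy: traffic moves to the backup only
    when the primary is unhealthy, the backup is healthy, and a human
    operator has explicitly approved the switch.
    """
    if not primary.healthy and backup.healthy and operator_approved:
        return backup.name
    return primary.name


# Primary is down, but nobody has approved the switch yet: traffic stays put.
waiting = choose_cdn(CdnStatus("primary-cdn", False), CdnStatus("backup-cdn", True), operator_approved=False)

# Once an operator signs off, traffic moves to the backup.
switched = choose_cdn(CdnStatus("primary-cdn", False), CdnStatus("backup-cdn", True), operator_approved=True)
```

Requiring approval guards against an accidental or unnecessary switch. The cost is that a short outage may end before anyone has made the decision.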