If you were having difficulty accessing your favorite website last Tuesday morning, you weren’t alone. A jaw-dropping number of major websites around the globe suddenly became unavailable with no immediately obvious explanation, then reappeared about an hour later.
It’s disconcerting when the sites we rely on suddenly become inaccessible, and even more so when it happens on such a vast scale. This outage saw seemingly unrelated sites go dark, including the BBC, Pinterest, the Financial Times, Reddit and even The Conversation.
How can so many sites, from so many different organizations, all be affected by the same incident? To understand the answer, you need to know what a CDN (content delivery network) is and how crucial CDNs are to the smooth running of the internet.
What happened and what’s a CDN?
While it’s too early to provide a comprehensive diagnosis of the incident, the internet (once it was accessible again) quickly pointed to the culprit: Fastly.
Fastly is a cloud computing company that provides CDN services to a range of websites including Amazon and Deliveroo. But how can a single company bring down a noticeable proportion of the internet?
When we access a website, we might assume our browser goes off to the internet, talks to the remote site, and then presents the page on our screen. While this is in essence what happens, it masks a much more complicated process, which can include CDN services.
A CDN is a service that allows popular websites to keep copies of their pages closer to their customers.
For example, if someone in Australia wants to browse the BBC website, their browser could talk directly to a server in the United Kingdom. While the internet is perfectly capable of transferring the web page from the U.K. to Australia, there is an inevitable delay (perhaps a few hundred milliseconds). And nobody likes delays.
The experience for the user can be up to 10 times quicker if a copy of the page (or elements of its content) can be held in Australia and delivered on demand. Of course, accessing a version of the page held in Australia would work great if you’re in Australia but not so much if you’re in, say, Los Angeles. So, to ensure fast content delivery for everyone around the world, CDNs usually work on a global scale.
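The caching idea described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Fastly or any real CDN is implemented: the `EdgeCache` class, the `fetch_from_origin` function and the `/news` path are hypothetical names invented for the example.

```python
class EdgeCache:
    """A toy regional cache that sits between users and the origin server."""

    def __init__(self, region, fetch_from_origin):
        self.region = region
        self._fetch_from_origin = fetch_from_origin  # slow, long-distance call
        self._store = {}  # local copies, keyed by path

    def get(self, path):
        # On a miss, make one slow trip to the origin and keep the copy.
        if path not in self._store:
            self._store[path] = self._fetch_from_origin(path)
        # On a hit, serve the local copy without contacting the origin.
        return self._store[path]


origin_requests = []  # records every trip to the (hypothetical) origin


def fetch_from_origin(path):
    origin_requests.append(path)
    return f"<html>content of {path}</html>"


sydney = EdgeCache("sydney", fetch_from_origin)
first = sydney.get("/news")   # miss: fetched from the origin
second = sydney.get("/news")  # hit: served from the local copy
```

In this sketch only the first request travels to the origin; every later request for the same page is answered locally, which is where the speed-up comes from.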
A CDN service provider will typically operate data centers around the world, holding copies of popular content in major population centers to deliver content in each region. The speed of delivery of a single image or page element may not be noticeably faster coming from a CDN — the difference between 200 milliseconds and 20 milliseconds isn’t discernible to most users.
However, modern websites often contain many elements, including images, videos and so on. When combined, the speed improvement through CDNs can be significant.
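A rough back-of-the-envelope calculation shows how small per-element savings add up. The 200 and 20 millisecond figures come from the text above; the page size (60 elements) and the limit of six simultaneous downloads are assumptions chosen for illustration, and the model ignores bandwidth, connection setup and rendering time.

```python
import math


def estimated_load_ms(num_elements, latency_ms, parallel_downloads=6):
    """Estimate load time if elements are fetched in batches of parallel downloads."""
    rounds = math.ceil(num_elements / parallel_downloads)
    return rounds * latency_ms


from_origin = estimated_load_ms(60, 200)  # 10 rounds x 200 ms = 2000 ms
from_cdn = estimated_load_ms(60, 20)      # 10 rounds x 20 ms = 200 ms
```

Under these assumptions the page goes from roughly two seconds to a fifth of a second, a difference most users would notice even though no single element feels faster.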
So, why did so many sites fail?
CDNs provide a valuable service that improves our web browsing experience, but at a cost.
When a major CDN provider such as Fastly experiences a failure, it doesn’t affect just one website; it’s likely to impact every website it supports. In Tuesday’s example, sites across the world suddenly went offline as requests for the CDN-hosted content were not serviced.
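The shared dependency can be illustrated with a small sketch. The site names and provider labels below are hypothetical placeholders, not a record of which real sites use which CDN.

```python
def unreachable_sites(site_to_cdn, failed_cdn):
    """Return, in alphabetical order, every site that depends on the failed provider."""
    return sorted(site for site, cdn in site_to_cdn.items() if cdn == failed_cdn)


# Hypothetical mapping of websites to the CDN provider each one relies on.
site_to_cdn = {
    "news.example": "cdn-a",
    "shop.example": "cdn-a",
    "forum.example": "cdn-a",
    "blog.example": "cdn-b",
}

affected = unreachable_sites(site_to_cdn, "cdn-a")
```

Here a single failure at `cdn-a` takes three otherwise unrelated sites offline at once, while the site served by `cdn-b` is unaffected.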
This incident demonstrates how reliant we are on technology — and on the specific implementations of technology in our modern lives. If each website we visit hosted its own content exclusively, we would not be facing these issues. However, our web browsing experience would be much slower, reminiscent of the days of dial-up modems (well, perhaps not quite that bad).
Despite its global scale, the outage was resolved within about an hour. That would seem to indicate it’s unlikely to have been a security- or hacking-related issue. It was more likely due to a short-term failure in Fastly’s infrastructure, or a misconfiguration that spread through its systems.
Could it happen again?
Fastly is not the only CDN provider. Other high-profile services include Akamai and Cloudflare. Outages are not uncommon, but they are usually short-lived.
Readers can rest assured (assuming you haven’t lost internet again) that service providers are closely watching this incident to ensure lessons are learned for next time.