Major internet outage ‘shows infrastructure needs urgent fixing’

Experts say outage shows internet services too centralised and lack resilience

One of the world’s biggest web outages should act as a “wake-up call” that internet infrastructure has become dangerously over-centralised and lacks resilience, security experts have warned.

An unexplained configuration error at a single infrastructure provider, Fastly, which handles 10% of the world’s internet traffic, was enough to render major websites and services inoperable for almost an hour on Tuesday morning.

Online businesses including Reddit, Amazon, Twitch, Spotify and Hulu were knocked offline, as were the websites of the Guardian, the BBC, the New York Times and CNN. National governments were also caught up: gov.uk was unavailable, making a host of government services inaccessible, including the Covid vaccine booking site, and the White House website also went down.

The affected sites all used Fastly as a content delivery network (CDN), a service intended to provide greater reliability and performance for heavily trafficked websites.

A CDN is a global network of servers, placed so that at least one server is close enough for a fast connection wherever a user lives. Customers like the Guardian send visitors to the CDN rather than their own servers, providing the content faster and protecting the website from being overloaded in the event of a spike in traffic.

But a CDN can also serve as a single point of failure: if the network collapses, it blocks all traffic going to the websites it protects. CDNs are more efficient the larger they are, which concentrates power in the market.
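The trade-off described above can be illustrated with a toy model. This is a minimal sketch, not a description of how Fastly or any real CDN is built: the `Origin` and `Cdn` classes, the `healthy` flag and the `reachable_sites` helper are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Origin:
    """A website's own server (hypothetical stand-in)."""
    name: str
    requests_served: int = 0

    def fetch(self, path: str) -> str:
        # Count every request that actually reaches the origin.
        self.requests_served += 1
        return f"{self.name}:{path}"


@dataclass
class Cdn:
    """A toy CDN that sits in front of many origins and caches their pages."""
    healthy: bool = True
    cache: dict = field(default_factory=dict)

    def get(self, origin: Origin, path: str) -> str:
        if not self.healthy:
            # Assumption for illustration: visitors only ever talk to the CDN,
            # so when it is down every site behind it is unreachable.
            raise ConnectionError("CDN unavailable")
        key = (origin.name, path)
        if key not in self.cache:
            # Only a cache miss is forwarded to the origin server.
            self.cache[key] = origin.fetch(path)
        return self.cache[key]


def reachable_sites(cdn: Cdn, origins: list) -> list:
    """Return the names of the sites a visitor can currently load."""
    reachable = []
    for origin in origins:
        try:
            cdn.get(origin, "/")
            reachable.append(origin.name)
        except ConnectionError:
            pass
    return reachable


if __name__ == "__main__":
    news, shop = Origin("news"), Origin("shop")
    cdn = Cdn()
    for _ in range(1000):
        cdn.get(news, "/")            # a spike in traffic
    print(news.requests_served)       # the origin handled a single request
    cdn.healthy = False               # one fault at the shared provider
    print(reachable_sites(cdn, [news, shop]))
```

In the sketch, the origin servers never fail. The sites go dark only because every visitor is routed through the same shared layer, which is the single point of failure the experts describe.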

The vast majority of internet traffic is routed through a handful of CDNs, such as Fastly, Cloudflare, Akamai or Amazon’s CloudFront. David Warburton, of the cybersecurity company F5 Labs, said centralisation is relatively new in the history of the internet and is likely to continue to cause problems.

“The web as a whole was intended to be decentralised,” he said. “By not relying on any one central system, it meant that many different components could fail and internet traffic could still find a way to get where it needed to go. What we’ve seen over the past decade, however, is the unintentional centralisation of many core services through large cloud solution providers like infrastructure vendors and CDNs.”

Paddy McGuinness, who was deputy national security adviser responsible for intelligence, security and resilience between 2014 and 2018, said the outage should be considered “a wake-up call” and that politicians needed to broaden the existing security-driven approach as technology brings new services to the British public.

“We need resilience as an explicit policy goal, especially on the new networks we are building to deliver services to the citizen,” said the former Whitehall insider, who worked under two prime ministers, David Cameron and Theresa May. “A ‘secure by design and default’ mantra is welcome but it isn’t enough in itself.”

The intelligence agency GCHQ and its cybersecurity arm, the NCSC (National Cyber Security Centre), “could not prevent disruption” working alone, McGuinness argued, partly because a key part of their remit was to detect and prevent attacks by hostile states and hackers, rather than to ensure the long-term stability of critical consumer services.

The cost of such an outage can be enormous. In 2015, when the scale of the internet economy was a fraction of today’s, the cost of cloud service outages was estimated at almost $300m (about £210m) a year, says Prof Rebecca Parry, of Nottingham Law School. “Liability for loss of service will probably be covered by the ‘service level agreement’ with customers of paid-for cloud services,” Parry said, “but the agreements will typically not cover all losses sustained.”

A typical Fastly customer is unlikely to receive more than $1,000 in refunded fees for the outage, those with knowledge of the company’s “service level agreements” say. But their true costs could be hundreds of times that, says Chris Huggett, of Sungard Availability Services. “With the average cost of downtime now $250,000 an hour, every minute counts.”
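As a rough check on the figures quoted above, the gap between refund and loss can be worked through in a few lines. This is an illustrative calculation only: it assumes the outage lasted a full 60 minutes (the article says “almost an hour”) and that the quoted average hourly cost applies to a typical customer, neither of which the sources state.

```python
HOURLY_DOWNTIME_COST_USD = 250_000  # average cost of downtime quoted by Huggett
TYPICAL_REFUND_USD = 1_000          # upper bound on a typical refund, as quoted
OUTAGE_MINUTES = 60                 # assumption: "almost an hour", rounded up


def downtime_cost(minutes: float, hourly_cost: float = HOURLY_DOWNTIME_COST_USD) -> float:
    """Scale the hourly downtime cost linearly to the length of the outage."""
    return hourly_cost * minutes / 60


cost = downtime_cost(OUTAGE_MINUTES)
ratio = cost / TYPICAL_REFUND_USD

print(f"Estimated loss: ${cost:,.0f}")
print(f"Loss is {ratio:.0f} times the refund")
```

Under those assumptions the loss comes to $250,000, or 250 times the refund, which is consistent with the “hundreds of times” estimate.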

In November 2020, AWS, Amazon’s cloud-hosting arm, suffered a multi-hour outage in the middle of the US west coast’s afternoon. The collapse in the service, which interacts with about 40% of the entire internet, took out sites and services including 1Password, Flickr, iRobot, and the Washington Post.

Months earlier, a failure at Cloudflare, another CDN like Fastly, had rendered much of the web inoperable. That was traced to a single error in a physical link between datacentres in Newark and Chicago, which spiralled into an outage that took almost two hours to fix fully.

Warburton said following the Fastly outage on Tuesday: “In a traditional internet app deployment model, an outage of a server or misconfigured application might take out a single website. As we saw today, similar problems with a cloud solution provider can end up taking out all of their customers, resulting in not one website being taken offline, but hundreds or thousands. The impact can affect organisations’ digital experiences, revenues and reputations.

“The ‘re-centralisation’ of the internet through these cloud solutions is now causing the very problems the original design of the internet was intended to avoid through redundancy. It’s important we consider an approach that moves us away from single points of failure or we will likely see more issues like we did today.”

• This article was amended on 9 and 10 June 2021 to clarify that Chris Huggett was speaking about the costs of the Fastly outage, not the likely refunds as suggested in an earlier version. Also, Akamai was added as one of the main CDNs, and an incorrect reference to the outage being an “attack” was changed.

Contributors

Alex Hern and Dan Sabbagh

The GuardianTramp
