How A Command Typo By An Amazon Employee Took Down The Internet This Week


Short Bytes: Starting around 9:37 AM PST on Tuesday, many websites and services stopped working after a widespread Amazon Web Services outage. Specifically, Amazon observed high error rates with S3 (Simple Storage Service) in the US-EAST-1 region. Many websites use the service to host their resources. According to Amazon’s official statement, the outage, which lasted more than four hours, was caused by a command that an employee typed incorrectly.

What happens when too many websites are hosted by the same service provider? If that provider runs into trouble, a large part of the internet goes down with it. Yes, I’m talking about Amazon’s S3 web-based storage service. Many sites use S3 to store their images, and some use it to host entire sites.

This happened on February 28, when Amazon S3 started experiencing “high error rates”, causing chaos among the many websites that depend on AWS to work. Affected sites and services included Medium, Slack, Quora, Giphy, and Nest, among others.

Ironically, Amazon’s own ability to report problems was broken for a while. Many connected devices also weren’t working properly.

Amazon didn’t initially point to any specific fault (its official statement appears in the update below), and it now says it has fully recovered from the disruption. Whatever the cause, the situation is a reminder of the DDoS attack that disrupted Dyn’s systems.

This outage also shows how much of the internet relies on a few companies to keep running. Unless we diversify our dependence on online services like Amazon S3, such downtime is bound to happen from time to time.
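To make the idea of diversifying concrete, here is a minimal sketch of a fallback pattern: try a primary source for a resource, and move on to a backup if it fails. The function name `fetch_with_fallback` and the idea of passing each source in as a plain callable are assumptions made for illustration; nothing here is taken from Amazon or from any of the affected sites.

```python
from typing import Callable, Iterable, List


def fetch_with_fallback(sources: Iterable[Callable[[], bytes]]) -> bytes:
    """Try each source in order and return the first successful result.

    Each source is a hypothetical zero-argument callable that either
    returns the resource as bytes or raises an exception on failure.
    """
    errors: List[Exception] = []
    for source in sources:
        try:
            return source()
        except Exception as exc:  # a real client would catch narrower errors
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} sources failed")
```

In practice each callable might wrap a request to a different storage provider or region. The pattern only helps if the backups are hosted independently of the primary and are kept in sync with it.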

Meanwhile, here’s the latest update from Amazon:

As of 1:49 PM PST, we are fully recovered for operations for adding new objects in S3, which was our last operation showing a high error rate. The Amazon S3 service is operating normally.

Update: Here’s the official statement from Amazon:

The Amazon Simple Storage Service (S3) team was debugging an issue causing the S3 billing system to progress more slowly than expected. At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended. The servers that were inadvertently removed supported two other S3 subsystems.
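The statement describes a single mistyped input removing far more servers than intended. As an illustration only, the sketch below shows one common kind of safeguard for such tools: reject a removal request if it names unknown servers, exceeds a batch limit, or would leave too few servers running. This is not Amazon’s tooling; `plan_removal`, `UnsafeRemoval`, and the `max_batch` and `min_remaining` limits are all hypothetical names chosen for the example.

```python
from typing import List, Sequence


class UnsafeRemoval(Exception):
    """Raised when a removal request breaks one of the configured limits."""


def plan_removal(fleet: Sequence[str], requested: Sequence[str],
                 max_batch: int, min_remaining: int) -> List[str]:
    """Validate a removal request and return the servers that may be removed."""
    fleet_set = set(fleet)
    targets = set(requested)

    # Refuse requests that name servers outside the known fleet.
    unknown = targets - fleet_set
    if unknown:
        raise UnsafeRemoval(f"unknown servers: {sorted(unknown)}")

    # Refuse requests that are larger than one allowed batch.
    if len(targets) > max_batch:
        raise UnsafeRemoval(
            f"{len(targets)} servers requested, limit is {max_batch}")

    # Refuse requests that would shrink the fleet below its floor.
    if len(fleet_set) - len(targets) < min_remaining:
        raise UnsafeRemoval(
            f"removal would leave fewer than {min_remaining} servers")

    return sorted(targets)
```

With a check like this in front of the actual removal step, a mistyped input that expands to a large set of servers fails loudly instead of being carried out.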

