A few evenings ago it was reported that Amazon Web Services (AWS) was experiencing a glitch that affected a significant number of companies relying on its cloud hosting servers. We wrote about it here.
And in a seriously detailed blog post, which you can find here, AWS has explained what the problem was. In short: human error and a dodgy command. It’s rather heartening that even a company like AWS, world class and hugely successful and profitable as it is, can still be at home to Mister Cock Up.
As they say: “We’d like to give you some additional information about the service disruption that occurred in the Northern Virginia (US-EAST-1) Region on the morning of February 28th. The Amazon Simple Storage Service (S3) team was debugging an issue causing the S3 billing system to progress more slowly than expected. At 9:37AM PST, an authorized S3 team member using an established playbook executed a command which was intended to remove a small number of servers for one of the S3 subsystems that is used by the S3 billing process. Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.”
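In other words, a single mistyped input took out far more capacity than intended. In its post-mortem AWS said it was adding safeguards so its tooling removes capacity more slowly and refuses to take a subsystem below its minimum required fleet. As a minimal sketch of that kind of guardrail (all names and thresholds here are illustrative, not AWS's actual tooling):

```python
# Hypothetical capacity-removal guardrail, in the spirit of the safeguards
# AWS described after the outage: reject any single command that would
# remove too large a slice of a fleet, or shrink it below a safe minimum.

def safe_to_remove(fleet_size, remove_count,
                   max_fraction=0.05, min_remaining=3):
    """Return True only if removing `remove_count` servers is within limits."""
    if remove_count <= 0:
        return False  # nonsense or no-op input
    remaining = fleet_size - remove_count
    if remaining < min_remaining:
        return False  # would drop the subsystem below its minimum capacity
    if remove_count > fleet_size * max_fraction:
        return False  # a single command may only remove a small slice
    return True
```

A check like this would have turned the fat-fingered command into a rejected request rather than an hours-long outage: `safe_to_remove(100, 5)` passes, while `safe_to_remove(100, 60)` is refused outright.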
Lots of companies were affected by the problem, and that’s hardly surprising, because AWS supports firms such as Netflix, Spotify, Pinterest and BuzzFeed. But it is to AWS’s credit that they have described the problem in such detail, although doubtless plenty of organisations will now be wondering whether they need a back-up plan for possible future outages.