A recent blog post by fellow Cloud curmudgeon Ben Kepes shone a light on a few of the unsightly ingredients in the Amazon Web Services (AWS) Cloud sausage. Apparently, many AWS services depend upon Elastic Block Store (EBS) in a single region: US-East-1, Amazon's first Cloud data center region. And while AWS generally touts a horizontally distributed, Cloud-friendly architecture, this unfortunate dependence on EBS in one region is a single point of failure. Oops.
Amazon is unlikely to provide a full explanation of this architectural faux pas, but Kepes surmises that the problem is that US-East-1 is their oldest data center, and thus Amazon hadn’t really worked out how best to architect their Cloud when they set it up. In other words, US-East-1 has a serious legacy problem.
If the problem simply centered on the technology, then this issue would be minimal. After all, tech refreshes are commonplace in today’s IT environments, and Amazon surely instituted a plan to replace and update aging technology long before AWS was a twinkle in the ecommerce bookseller’s eye. But the EBS single point of failure problem isn’t a legacy technology problem at all. It’s a legacy architecture problem.
Unfortunately for Amazon (and for its thousands of customers), legacy architecture challenges are extraordinarily difficult to resolve, even in the enterprise IT context. But Cloud Computing raises the stakes. The Cloud provider context adds complexity to this already intractable issue, because customers’ Cloud architectures build on Amazon’s internal architecture. Changing AWS’s inner workings might have a ripple effect across its customers’ architectures, which would be a catastrophe eclipsing the bad publicity from any downtime that might result from the EBS single point of failure.
Amazon has a serious problem on their hands. They must proceed with extreme caution. Will they be able to fix their architecture without bringing down the AWS house of cards? Perhaps. My prediction is that they will eventually fix this legacy architecture issue with minimal customer downtime – but not without any downtime. The question for you is: will you be one of the unlucky customers to get caught by Amazon’s legacy architecture?