Among the big news in the world of Big Data is the impending release of Hadoop 2, a major refactoring of the popular Big Data processing tool. This release is notable not because it offers plenty of new bells and whistles, but rather because the Hadoop team has cleaned up many of the limitations and inconsistencies in the original Hadoop code.
At the core of the new release is YARN, a new cluster resource manager that both supports and displaces MapReduce: MapReduce remains available as a data processing engine, but YARN takes over the cluster resource management duties MapReduce once handled. Hadoop 2 is also more scalable than its predecessor, and supports multitenancy – a feature that makes it better suited to running enterprise data warehouses.
And therein lies the irony. Hadoop 2 promises to become the engine that supports data warehouses in enterprises around the world, a better mousetrap for catching traditional, familiar mice. In other words, the better Hadoop gets, the less of a Big Data tool it becomes.
Remember that Big Data refers to data sets that traditional tools are unable to deal with adequately, necessitating cutting-edge technology that takes unconventional approaches. Hadoop version 1 clearly qualified. But now that Hadoop 2 is positioned to dominate the staid, traditional enterprise data warehouse market, it will pass the Big Data moniker to newer, less mature technologies – ones emerging to tackle challenges that traditional tools, now including Hadoop itself, are poorly suited to address.
Oh, the irony!