If industry insiders are correct, 2012 should see an increasing number of vendors and enterprises launching big data initiatives. Many of those projects will involve Apache Hadoop, an open source technology that makes it possible — and economical — for enterprises to store large amounts of diverse data on clusters of standard servers and to analyze that data very quickly.
Hadoop has been around since 2006, but it’s really been gaining attention in the past year or so. “2011 was kind of the year where a critical mass of enterprise customers and vendors kind of began to realize the opportunity and value behind the Hadoop phenomenon,” noted Shaun Connolly, VP of corporate strategy for Hortonworks, one of the key contributors to Hadoop. “I totally expect the trend to continue in 2012.” Connolly added that Hortonworks believes “that by 2015, more than half the world’s data will be processed by Apache Hadoop.”
That prediction has big implications for enterprises, for vendors and for developers working on big data projects.
Hadoop No Longer a ‘Science Project’
Many enterprises have already begun experimenting with Hadoop in small ways, but analysts say this could be the year they begin to get serious about the technology. Benjamin Woo, program vice president for worldwide storage systems at IDC, noted that until now most companies have been approaching Hadoop as a “science project.” However, Woo said, “[What] we believe will happen this year is that there will be enterprise acceptance of Hadoop.”
What’s driving this enterprise rush to Hadoop? The opportunity to make money.
“Google showed us that you can build a large, profitable, fast-growing business entirely out of data. Apache Hadoop represents the opportunity for businesses of all stripes to apply those same technologies and techniques to unlock new value from the under-utilized asset that is their data,” explained Charles Zedlewski, VP of product at Cloudera. “It turns out everyone has big data.”
Connolly pointed out that while enterprises store a lot of data, “75 percent of the data that flows through enterprises isn’t stored.” Because Hadoop makes it economically feasible to store much more of that data, “arguably there is now a whole long tail of data that can be stored and farmed for extreme value,” he added. “Technology aside, economics are a big factor in this.”
According to market research firm Gartner, “Worldwide information volume is growing at a minimum rate of 59 percent annually, and while volume is a significant challenge in managing big data, business and IT leaders must focus on information volume, variety and velocity.”
Those three Vs (volume, variety, and velocity) explain the appeal of Hadoop. It can deal with large volumes of data. It can handle a variety of data from widely different sources. And it can analyze that data quickly, enabling business leaders to respond to rapidly changing conditions.
They may not be sure exactly how they will use their data, but enterprises are betting that they’ll be able to make money by analyzing it.
Hadoop’s Place in a Crowded Big Data Market
Of course, enterprises aren’t the only ones hoping to make money from big data. Numerous vendors have launched Hadoop-related products and services. In fact, Woo said, “We’ve identified almost 200 companies in the big data space.”
With so many players in the market, it’s easy to see that not all of them will flourish. IDC has predicted that this year will see a lot of merger and acquisition activity as large technology companies rush to buy smaller companies with expertise in big data. By 2015, the analysts say it’s likely that none of the current “major players” in the Hadoop market will still exist.
Hadoop: A Solution in Search of a Problem
While enterprises and vendors alike seem to be sure that Hadoop is the answer, analysts wonder if they know what the question is. Woo likens it to the game show Jeopardy. “Often we find solutions and then we have to go back and find problems that they solve,” he said.
In a webinar, Woo and a colleague noted, “Many initial Hadoop projects will fail to gain broad adoption. A key challenge for many is to find initial use cases that would deliver measurable value for the enterprise.” They added, “Some of the development side are committed to the solution before they have found a suitable problem.”
Connolly acknowledged the same problem and said that he expects vendors to begin developing more use cases that they can show to potential customers. “That’s important because right now people are really trying to wrap their heads around exactly what kind of value can they get out of Hadoop,” he said. “They understand it has potential, but they want to know where they can start.”
Beware Hadoop Project Failure
Analysts say that enterprises should investigate Hadoop but should expect some bumps in the road. In a study of early Hadoop adopters, Forrester Research concluded, “Although these early adopters have realized significant benefits, they acknowledge that Hadoop is an immature technology with many moving parts that are neither robust nor well integrated. Deploying, ramping up, and optimizing Hadoop clusters takes more time and custom coding than business process and application development and delivery (AD&D) professionals might expect.”
IDC pointed out that most enterprises don’t have highly skilled data scientists on staff who can help them design projects that will generate real business value. Because of this staff shortage, the immaturity of the tools, and the lack of well-developed business cases, Woo said, “Many of these [Hadoop] projects will fail, unfortunately.”
However, that doesn’t mean that developers shouldn’t investigate the technology. Forrester recommended, “Application development and delivery (AD&D) professionals should consider Hadoop an immature but promising technology for addressing the most data-intensive analytics and application requirements.”