Hadoop, it seems, is everywhere these days. IBM, Oracle and Yahoo are among the big guns that have been supporting Hadoop for years. Recently, Microsoft joined the club by announcing it will integrate Hadoop into its upcoming SQL Server 2012 release and Azure platforms.
Apache Hadoop — to give Hadoop its proper name — is a software framework that supports data-intensive distributed applications (think Big Data). The framework, written in Java and supported by the Apache Software Foundation, enables applications to work with thousands of nodes and petabytes of data.
Microsoft’s embrace of Hadoop shows that the vendor has seen the writing on the wall about big data: it must give customers and developers the tools they need, whether proprietary or open source, to work with all kinds of big data.
“The next frontier is all about uniting the power of the cloud with the power of data to gain insights that simply weren’t possible even just a few years ago,” said Microsoft Corporate Vice President Ted Kummert in a statement. “Microsoft is committed to making this possible for every organization, and it begins with SQL Server 2012.”
As part of its commitment to help customers and developers process “any data, any size, anywhere,” Microsoft is working with the Hadoop ecosystem, including core contributors from Hortonworks, to deliver Hadoop-based distributions for Windows Server and Windows Azure that work with industry-leading business intelligence tools.
A Community Technology Preview (CTP) of the Hadoop-based service for Windows Azure will be available by the end of 2011, and a CTP of the Hadoop-based service for Windows Server will follow in 2012.
Microsoft said it will work closely with the Hadoop community and propose contributions back to the Apache Software Foundation and the Hadoop project.
Why is adding a fraction of the Microsoft Windows, Azure and SQL Server user bases to the Hadoop community a good thing for Apache Hadoop? Eric Baldeschwieler, the CEO of Hortonworks, posed that question in a recent blog post.
“Microsoft technology is used broadly across enterprises today. Ultimately, open source is all about community building. A growing user community feeds a virtuous circle. More users means more visibility for the project… More users mean more folks who will ultimately become contributors or committers. This makes the code evolve more quickly, which allows it to satisfy more use cases and hence attract more users, which further drives the project forward.”
Hadoop and Managing Big Data in the Enterprise
Managing big data is one of the key challenges of the new decade, said David Menninger, vice president and research director at Ventana Research.
“The solutions to this challenge vary, but interest in them seems to be universal,” he said. “The largest database vendors and others that wish to compete with them are developing or acquiring various technologies, among them database appliances, massively parallel databases, and columnar databases.”
Menninger noted that the rise of Hadoop has been dramatic. “It has been successfully deployed at some of the largest Internet-based organizations in the world, including eBay, Facebook, Google and Yahoo,” he said. “Seeing this, other organizations whose business depends on managing large amounts of data have begun to explore Hadoop as well.”
In the summer of 2011, Ventana Research released a report on Hadoop that found many organizations are using the platform to perform data mining and in-depth analytics. The report, based on a survey of enterprise adoption of Hadoop, was sponsored by Cloudera, Karmasphere and Pervasive Software.
Among the trends and statistics Ventana reported:
- More than one-half (54%) of organizations surveyed are using or considering Hadoop for large-scale data-processing needs.
- More than twice as many Hadoop users as users of other platforms report being able to create new products and services and enjoy cost savings; more than 82% benefit from faster analyses and better utilization of computing resources.
- 87% of Hadoop users are performing or planning new types of analyses with large-scale data.
- 94% of Hadoop users now perform large-volume analytics that were not possible before; 88% analyze data in greater detail, and 82% can retain more of their data.
- Organizations use Hadoop in particular to work with unstructured data such as logs and event data (63%).
- More than two-thirds of Hadoop users perform advanced analysis — data mining or algorithm development and testing.