Not a moment after the virtual ink dried on my recent blog post pointing out that the more mature Hadoop becomes, the less of a Big Data tool it is, I received news that cemented this counterintuitive statement. I had a conversation at re:Invent today with SyncSort, an old-guard mainframe ETL vendor that has leveraged its deep expertise in sorting algorithms to revamp the inner workings of MapReduce for the new Hadoop 2 release. During the conversation, however, they mentioned a research project they're working on: Hadoop on zLinux.
zLinux, of course, is Linux running on IBM zSeries mainframes. Linux on the mainframe has been around for a few years now, but the notion of running Hadoop on it is a glorious and unexpected mixing of old and new. The advantages, however, are very real.
Mainframes are still the workhorse of the enterprise data center. They are blisteringly fast and remarkably reliable. Those qualities, however, are not the whole story.
Perhaps the greatest bottleneck in any large-scale Hadoop deployment is the local network. The Hadoop Distributed File System (HDFS) spans dozens or even hundreds of nodes, and all your Big Data must get onto them somehow.
Put HDFS on zLinux, however, and all those nodes run on the same physical server, no larger than a refrigerator. The mainframe's internal backplane handles the traffic to the nodes and between the nodes, lightening the load on the network. Hadoop on steroids.
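To get a feel for why the interconnect matters, here is a back-of-envelope sketch. The dataset size and bandwidth figures are illustrative assumptions on my part, not benchmarks of any particular network or mainframe:

```python
# Back-of-envelope: how long does it take to move a dataset into HDFS?
# All figures are illustrative assumptions, not measured values.

def load_time_hours(dataset_tb: float, gbit_per_s: float) -> float:
    """Hours to move dataset_tb terabytes at gbit_per_s gigabits per second."""
    bits = dataset_tb * 8e12              # terabytes -> bits
    return bits / (gbit_per_s * 1e9) / 3600

# A hypothetical 10 TB load over a shared 1 Gb/s Ethernet uplink
# versus an assumed 100 Gb/s internal backplane:
slow = load_time_hours(10, 1)    # roughly 22 hours
fast = load_time_hours(10, 100)  # roughly 13 minutes
```

The point isn't the exact numbers; it's that the data-loading step scales with whatever pipe sits between your source systems and the HDFS nodes, and an internal backplane is a much fatter pipe than a rack-level network.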
Hadoop on zLinux isn’t ready for prime time yet, but once it is, expect to see Hadoop on mainframes, no matter how strange that sounds. Use the right tool for the job, after all.