Perhaps the best part of speaking at Dataversity’s NoSQL Now conference is the opportunity to channel my inner geek. This show brings out the most hardcore data geeks Silicon Valley has to offer, after all, and that’s triple-X hardcore when it comes to data geekdom. My inner geek, however, never goes anywhere without wearing his architect’s hat. Good thing, too, because the challenge with such heavily technical shows is understanding how all this great new gear fits into the big picture of helping enterprises achieve broad-based agility goals.
My architect’s hat perked up during a talk by Nathan Marz, creator of the real-time Big Data analytics platform Storm. In his talk he called upon the audience to embrace immutability: forgo all DELETEs and UPDATEs on your data. All you get are INSERTs and SELECTs. Furthermore, keep track of which queries generated which data. As a result, you’ve protected yourself from data corruption, because you can always go back to the permanent record and recompute any given result properly.
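To make that concrete, here is a minimal sketch in Python of an insert-only store along the lines Marz describes. Everything here (the class, the `source_query` provenance field) is my own illustrative invention, not Storm's API: writes only append to a permanent log, the "current" value is derived from the log, and the full history is always available for recomputation.

```python
import time

class AppendOnlyStore:
    """A toy insert-only data store: no UPDATE, no DELETE.

    Current state is always derived from the full event log, so any
    suspect result can be recomputed from the permanent record.
    """

    def __init__(self):
        self._log = []  # the permanent record: append-only

    def insert(self, key, value, source_query=None):
        # Every write is a new fact. source_query records provenance,
        # i.e. which computation generated this value (hypothetical field).
        self._log.append({"ts": time.time(), "key": key,
                          "value": value, "source": source_query})

    def select(self, key):
        # The current value is simply the most recent fact for this key.
        for event in reversed(self._log):
            if event["key"] == key:
                return event["value"]
        return None

    def history(self, key):
        # The full trail of everything that ever happened to this record.
        return [e for e in self._log if e["key"] == key]

store = AppendOnlyStore()
store.insert("balance:alice", 100)
store.insert("balance:alice", 150, source_query="apply_interest_q3")
```

A real system would compact or snapshot the log for performance, but the principle survives: the log is the truth, and every derived value can be rebuilt from it.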
Immutability is one of those topics that fires up my inner geek, as it crops up in different places. Functional programming requires immutable data, and functional programming is experiencing a resurgence due to its applicability in the Cloud. But that’s not the whole story by any means.
Immutable data have been with us for years: any application that must keep track of multiple versions of a given piece of information can make use of immutability. In fact, it’s no accident that Marz’s link is to his GitHub profile, as Git (the technology behind GitHub) is based on immutability. The trick with Git, of course, is efficiency: since changes to checked-in code are incremental, Git has a sophisticated system of creating deltas and snapshots in order to balance efficient use of storage with rapid queries and rollbacks.
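The core idea underneath Git's storage can be sketched in a few lines. This is a toy illustration of content addressing, not Git's actual object format (real Git adds a typed header and zlib compression, and layers deltas on top): every version of a piece of content is stored under the hash of that content, identical content is stored exactly once, and nothing is ever overwritten.

```python
import hashlib

# A toy content-addressed object store: hash -> content, insert-only.
objects = {}

def store_version(content: bytes) -> str:
    """Store one immutable version of some content; return its id."""
    digest = hashlib.sha1(content).hexdigest()
    # setdefault never overwrites: identical content maps to the
    # same id and is stored only once.
    objects.setdefault(digest, content)
    return digest

v1 = store_version(b"print('hello')\n")
v2 = store_version(b"print('hello, world')\n")
v1_again = store_version(b"print('hello')\n")
```

Because an object's id is a function of its content, "updating" a file just means storing a new object; every old version remains retrievable by its id, which is exactly what makes rollbacks cheap.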
The question at this point is: why wouldn’t all your enterprise data benefit from the same immutability Git offers? Shouldn’t you always maintain previous versions of any record, along with a complete trail of everything that happened to that record? After all, any number of human errors may lead to corrupted data. We should always be able to recompute a result based upon accurate historic information.
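Keeping every version of a record also answers questions a mutable table can't: "what did this record look like last July?" Here is a hypothetical example (the table, field names, and `as_of` helper are all illustrative assumptions, not any particular product's schema) where each change to a customer record is a new row, and past states can be queried directly.

```python
from datetime import datetime

# Hypothetical versioned customer table: every change is a new row,
# so no information is ever destroyed by an update.
customer_versions = [
    # (customer_id, valid_from, fields) -- sorted by valid_from
    (42, datetime(2013, 1, 1), {"email": "pat@example.com", "tier": "basic"}),
    (42, datetime(2013, 6, 1), {"email": "pat@example.com", "tier": "gold"}),
    (42, datetime(2013, 9, 1), {"email": "pat@newmail.com", "tier": "gold"}),
]

def as_of(versions, customer_id, when):
    """Return the version of the record in effect at `when`, or None."""
    current = None
    for cid, valid_from, fields in versions:
        if cid == customer_id and valid_from <= when:
            current = fields  # later qualifying rows supersede earlier ones
    return current

mid_july = as_of(customer_versions, 42, datetime(2013, 7, 15))
```

Any report run against last July's data can be recomputed exactly, because the inputs it saw are still there.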
While Marz focused on real-time Big Data analytics, taking advantage of the performance benefits from the proper use of immutable data, there’s a bigger story here. Treating everyday enterprise data (say, customer records, for example) as immutable can turn such information into Big Data. After all, the reason we weren’t collecting all the deltas and snapshots for all our customer data in the past was that we didn’t have the storage or processing power to deal with all that information. But now such capabilities are within our reach.
The conventional approach to Big Data analytics crunches massive data sets of mixed data types in order to produce a “small data” result we can make sense of. The principle of immutability runs in the other direction: it takes existing small data sets, turns them into Big Data sets, and then applies Big Data processing techniques to them.
In other words, immutability offers a whole new way of looking at enterprise data. What would your world look like if you never did an UPDATE or DELETE ever again?