This morning's breaking news on CNN: Under a legal PATRIOT Act court order, Verizon has been providing American citizens' phone call information to the National Security Agency. The information in question: "originating and terminating telephone numbers as well as the location, time and duration of the calls," but not the contents of conversations themselves.
In other words, the NSA wants the call metadata, not the data.
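For readers who want the distinction spelled out, here is a minimal sketch of how a call record might separate metadata from data. The class names, field names, field formats, and sample values are all invented for illustration; only the list of metadata fields comes from the quoted description of the order, and nothing here reflects how Verizon or the NSA actually store records.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallMetadata:
    """Data about a call: the fields named in the quoted order."""
    originating_number: str
    terminating_number: str
    location: str            # hypothetical format; the order does not specify one
    start_time: datetime
    duration_seconds: int


@dataclass(frozen=True)
class CallRecord:
    """A whole call: the metadata plus the conversation itself."""
    metadata: CallMetadata
    content: bytes           # the call content, i.e. the "data"


def metadata_only(record: CallRecord) -> CallMetadata:
    """Return just the metadata, leaving the call content behind."""
    return record.metadata


# Made-up sample values, for illustration only.
record = CallRecord(
    metadata=CallMetadata(
        originating_number="555-0100",
        terminating_number="555-0199",
        location="cell tower A",
        start_time=datetime(2000, 1, 1, 9, 30),
        duration_seconds=240,
    ),
    content=b"<audio of the conversation>",
)

shared = metadata_only(record)
```

In this sketch, `shared` still says who called whom, from where, when, and for how long, but it has no `content` field at all.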
I find two facts about this point remarkable. First, veteran CNN reporter Candy Crowley actually used the word metadata on air, and furthermore, used it correctly. It's rare for any national news service to get any technical term right; I always roll my eyes when CNN defines Cloud Computing as "accessing software over the Internet," for example.
But the second point is that in some cases, the metadata are even more important than the data. You might think the NSA settled for metadata because the data (the call content) would be too massive for the agency to handle. Sorry to burst your bubble, folks: if the NSA wanted all the call content from all of Verizon's calls, it has the Big Data chops to process such data sets. In fact, it processes call contents all the time. But in this case, it was after only the metadata, a Big Data problem in their own right.
Techies tend to think of metadata as the second-class citizens of the data world, taking a back seat to the almighty data. But in the world of Big Data, heads up! Here come Big Metadata.