OK, all you techies, time to go back to English class. The word “data” is the plural of “datum.” If you use either word improperly, I’ll hit your knuckles with a ruler, so pay attention.
Wrong: This data is important.
Right: These data are important.
Wrong: Big data is useful when we use it properly.
Right: Big data are useful when we use them properly.
Wrong: Which piece of data am I looking at?
Right: Which datum am I looking at?
And so on. Now, before you freak out and realize your entire existence has been meaningless up to this point because you never saw a datum in your life, relax. Most language experts admit that correctness follows common usage, and since people commonly use the word “data” as though it were singular, that means it’s OK to do so. So carry on, you wretched English destroyers, you.
Common usage or not, treating “data” as plural is still correct. It’s up to you whether you wish your language to be correct, and presumably, as long as people understand you, it doesn’t matter in many situations. But in other situations, it’s important to be correct – or at least to know what is correct, so that if you break the rules, you do so intentionally.
In my writing, I predictably use the word “data” quite frequently, and I endeavor to use it properly every time. And while correctness is important to me, I’m willing to break rules when I feel like it. After all, the previous sentence began with “and,” now didn’t it? In the case of “data,” however, I stick to the rule book for a particular reason.
Data, you see, are inherently plural. When we have a data set, we have a set of many things, not just one thing. In many cases those data are varied and diverse. Especially in today’s big data world, our data are likely to be quite heterogeneous. Referring to them in the plural, therefore, emphasizes both the diversity and the discreteness of our data.
“Information,” however, is a mass noun. We cannot count our information the way we can count our data. We never say “informations” – and for good reason. Information is fluid. It’s difficult to quantify, unless we break it down into data first. And most importantly, information depends upon the recipient: data only become information if there is at least potentially a person on the receiving end who can understand them. Otherwise they’re just noise.
Each datum can be thought of as made up of individual bits or bytes, concrete units that we can count, move, and calculate with. Information, in contrast, must inform – an essential abstraction of the data that brings humans into the loop. Emphasizing this distinction is why I always treat “data” as plural. Break this rule if you wish, but remember, I still have my ruler.