Lies, Damn Lies, and Big Data Analytics

The goal of Big Data analytics is to mine those valuable nuggets of truth from an immense heap of noise. If only we pile the miscellaneous data high enough and deep enough, the reasoning goes, and we have tools that can separate the gold from the rocks, then we’ll end up with pure, unvarnished truth at the end.

Good luck with that.

There are many reasons why even the best Big Data algorithm might lead to biased, incomplete, misleading, or downright incorrect conclusions. Here are some examples.

Your original data sets may not be properly representative. Does mining Tweets give you a clear idea of what your customers think? Probably not, because believe it or not, not everybody uses Twitter. And the population of Twitter users skews young and tech-savvy.
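To make the sampling problem concrete, here is a minimal sketch of how a skewed sample can distort a simple satisfaction rate, and how reweighting by known population shares changes the answer. Every number, the age groups, and the function names are invented for illustration; nothing here is real Twitter or customer data.

```python
# Hypothetical numbers, invented purely for illustration.
# Share of each age group among ALL customers (the population we care about).
population_share = {"18-29": 0.20, "30-49": 0.35, "50+": 0.45}

# A sample that skews young: age group -> (respondents, satisfied respondents).
sample = {"18-29": (600, 480), "30-49": (300, 180), "50+": (100, 40)}


def naive_rate(sample):
    """Satisfaction rate computed straight from the skewed sample."""
    total = sum(n for n, _ in sample.values())
    satisfied = sum(s for _, s in sample.values())
    return satisfied / total


def weighted_rate(sample, population_share):
    """Satisfaction rate with each group reweighted to its population share."""
    return sum(
        population_share[group] * (satisfied / n)
        for group, (n, satisfied) in sample.items()
    )


print(f"naive:    {naive_rate(sample):.2f}")
print(f"weighted: {weighted_rate(sample, population_share):.2f}")
```

With these made-up figures the naive rate is 0.70 and the reweighted rate is 0.55. Reweighting only helps when you know the population shares and every group shows up in the sample at all; it cannot rescue a group that is missing entirely.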

Your assumptions may be coloring your algorithms. In other words, you may be suffering from wishful thinking. For an example of this mistake, look at how the Republicans miscalled the 2012 presidential election.

You focus only on the easier analyses. Perhaps the greatest challenge facing Big Data analytics is dealing with a diverse range of data types and structures: relational, columnar, text-based, audio, video, etc. Some of these types are easier to deal with than others, so you’re probably better at the easier tasks and do more of them.

The unknown unknowns are still elusive. It’s hard to come up with an answer when you have no idea what the question is. Sure, sometimes your number crunching yields a thoroughly surprising result, but rarely does it uncover an answer that no one had a clue they were looking for.

Certain data have a way of hiding behind other data. Anybody with a common name or, worse yet, the same name as a celebrity, has run into this problem. If your name happens to be “Justin Bieber,” but you’re not the fast-driving, monkey-abandoning tween heartthrob, then good luck Googling yourself. The same thing can happen with your Big Data analytics as well.
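The name-collision problem can be sketched in a few lines. The records, the birth years, and the `group_by` helper below are all hypothetical, made up for this illustration; the point is only that grouping on a name alone silently merges different people.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration: (name, birth_year, note).
records = [
    ("Justin Bieber", 1994, "the pop star"),
    ("Justin Bieber", 1975, "someone else entirely"),
    ("Justin Bieber", 1994, "the pop star"),
]


def group_by(records, key):
    """Group records by whatever identity key the caller chooses."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


# Keying on the name alone collapses two different people into one entity.
by_name = group_by(records, key=lambda r: r[0])

# Adding a second attribute keeps them apart (in this toy data, at least).
by_name_and_year = group_by(records, key=lambda r: (r[0], r[1]))

print(len(by_name), "entity when keyed by name")
print(len(by_name_and_year), "entities when keyed by name and birth year")
```

In this toy data the first grouping yields one entity and the second yields two. Real data rarely offers such a clean disambiguating field, which is exactly why the less famous records tend to stay hidden.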

The bottom line: Even with Big Data, bigger doesn’t mean better. Apply liberal doses of common sense, and take any result with a mine full of salt.

