Lies, Damn Lies, and Big Data Analytics

The goal of Big Data analytics is to mine those valuable nuggets of truth from an immense heap of noise. If only we pile the miscellaneous data high enough and deep enough, the reasoning goes, and we have tools that can separate the gold from the rocks, then we’ll end up with pure, unvarnished truth at the end.

Good luck with that.

There are many reasons why even the best Big Data algorithm might lead to biased, incomplete, misleading, or downright incorrect conclusions. Here are some examples.

Your original data sets may not be properly representative. Does mining tweets give you a clear idea of what your customers think? Probably not, because, believe it or not, not everybody uses Twitter, and the population of Twitter users skews young and tech-savvy.
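To make the skew concrete, here is a minimal Python sketch with made-up numbers. The age groups, shares, and satisfaction scores are all hypothetical, invented purely for illustration. It shows how a sample that over-represents one group shifts the naive average, and how reweighting by known population shares (post-stratification, one common correction, not something this post prescribes) changes the estimate.

```python
# Hypothetical numbers for illustration only; none come from real data.

# Share of each age group among all customers (assumed known from another source).
population_share = {"under_30": 0.25, "30_to_50": 0.40, "over_50": 0.35}

# Share of each age group in the tweet sample (skews young).
sample_share = {"under_30": 0.60, "30_to_50": 0.30, "over_50": 0.10}

# Average satisfaction score (0-10) observed per group in the sample.
group_mean = {"under_30": 8.0, "30_to_50": 6.0, "over_50": 4.0}


def weighted_mean(means, shares):
    """Average the per-group means, weighting each group by its share."""
    return sum(means[group] * shares[group] for group in means)


# What you get if you simply average every tweet in the sample.
naive_estimate = weighted_mean(group_mean, sample_share)

# What you get if each group counts in proportion to its real size.
reweighted_estimate = weighted_mean(group_mean, population_share)

print(f"naive: {naive_estimate:.2f}")
print(f"reweighted: {reweighted_estimate:.2f}")
```

With these invented figures the naive average comes out at about 7.0 and the reweighted one at about 5.8. Reweighting only helps if the people who tweet in each group resemble the people in that group who do not, which is itself an assumption.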

Your assumptions may be coloring your algorithms. In other words, you may be suffering from wishful thinking. For an example of this mistake, look at how the Republicans miscalled the 2012 U.S. presidential election.

You may be focusing only on the easier analyses. Perhaps the greatest challenge facing Big Data analytics is dealing with a diverse range of data types and structures: relational, columnar, text-based, audio, video, etc. Some of these types are easier to handle than others, so you’re probably better at the easier tasks and do more of them.

The unknown unknowns are still elusive. It’s hard to come up with an answer when you have no idea what the question is. Sure, sometimes your number crunching yields a thoroughly surprising result, but rarely does it uncover an answer that no one had a clue they were looking for.

Certain data have a way of hiding behind other data. Anybody with a common name or, worse yet, the same name as a celebrity, has run into this problem. If your name happens to be “Justin Bieber,” but you’re not the fast-driving, monkey-abandoning tween heartthrob, then good luck Googling yourself. The same thing can happen with your Big Data analytics.
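A small Python sketch shows how this happens in practice. The records, field names, and the `total_purchases` helper are hypothetical, made up for illustration. When rows are grouped by name, two different people collapse into one entry; grouping by a unique identifier keeps them apart.

```python
from collections import defaultdict

# Hypothetical records: two different people who happen to share a name.
records = [
    {"customer_id": 101, "name": "Justin Bieber", "purchases": 3},
    {"customer_id": 202, "name": "Justin Bieber", "purchases": 250},
    {"customer_id": 303, "name": "Ada Lovelace", "purchases": 7},
]


def total_purchases(rows, key):
    """Sum the purchases field for each distinct value of the given key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row["purchases"]
    return dict(totals)


by_name = total_purchases(records, "name")        # same-name people are merged
by_id = total_purchases(records, "customer_id")   # each person stays separate

print(by_name)
print(by_id)
```

Grouped by name, the smaller customer vanishes into the larger one's total. Real data sets rarely come with a clean unique identifier, which is why this problem is hard to spot.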

The bottom line: Even with Big Data, bigger doesn’t mean better. Apply liberal doses of common sense, and take any result with a mine full of salt.
