The Big Data Long Tail

Time to check the Web for some definitions:

  • Big Data: a massive volume of both structured and unstructured data that is so large that it’s difficult to process using traditional database and software techniques.
  • The Long Tail: the large number of occurrences far from the ‘head’ or central part of a distribution of popularities, probabilities or such.

Take these two buzz phrases and put them together, and what do you get?

I ran into a discussion of the Big Data Long Tail at a recent conference, where scientists were discussing their Big Data challenges and successes. Researchers in data-centric sciences such as genomics or astronomy routinely deal with the ‘head’ of Big Data: large data sets that they intentionally collect, store, manage, and analyze. More difficult to analyze, yet every bit a Big Data problem, is the Long Tail of Big Data: data on individual researchers’ laptops or other systems scattered about in offices, under desks, and in laptop bags.

Taken individually, the data on these systems are important but not plentiful enough to be Big Data. But consider all such systems en masse, and now you have an especially knotty Big Data problem. Not only are the data formats a jumble, but so are the metadata, not to mention the challenge of constructing and running algorithms across all these various systems, potentially scattered around the world.

The business world faces its own Big Data Long Tail problems as well. Yes, you may have large data sets that you intentionally collect and analyze for business purposes, whether you are analyzing customer purchasing behavior, the movement of the stock market, or whatever Big Data are important to your business. But what about the data your business keeps in, say, Excel spreadsheets?

Virtually every computer in your enterprise has a passel of spreadsheets on it. People upload some to your portal, while other spreadsheets remain hidden away on individuals’ laptops. Is there value in those spreadsheets? Indubitably. So, what would it take to consider all the spreadsheets everywhere in your organization as a single Big Data set, so that you can gain intelligence from such a collection? Now we’re talking a true Big Data problem: a problem today’s tools are woefully inadequate to solve.
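
To make the difficulty concrete, here is a minimal Python sketch of just the first step: sweeping a folder tree for spreadsheets and treating them as one data set. It is an illustration under loud assumptions, not a solution. It assumes the spreadsheets have already been exported to CSV and gathered under a single directory, and the function names (normalize_header, collect_rows, column_coverage) and the _source_file field are hypothetical, invented for this example.

```python
import csv
from pathlib import Path


def normalize_header(name: str) -> str:
    """Lower-case a column name and collapse spaces/punctuation to underscores."""
    cleaned = "".join(ch if ch.isalnum() else "_" for ch in name.strip().lower())
    return "_".join(part for part in cleaned.split("_") if part)


def collect_rows(root: Path):
    """Yield one dict per row of every CSV file found under `root`.

    Each row is tagged with the file it came from (provenance metadata),
    so the combined set can still be traced back to its origin.
    """
    for path in sorted(root.rglob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            for row in csv.DictReader(handle):
                # Skip the None key that DictReader uses for surplus cells.
                record = {normalize_header(k): v for k, v in row.items() if k}
                record["_source_file"] = str(path.relative_to(root))
                yield record


def column_coverage(rows):
    """Count how many rows define each column, exposing schema mismatches."""
    counts = {}
    for row in rows:
        for key in row:
            if not key.startswith("_"):
                counts[key] = counts.get(key, 0) + 1
    return counts
```

Running column_coverage(collect_rows(Path("some_folder"))) on a real collection would likely show most columns appearing in only a handful of files. Header cleanup reconciles "Customer Name" with "customer_name", but it cannot tell that "Total" in one spreadsheet and "Amount" in another mean the same thing, which is exactly where the Long Tail gets knotty.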
