We’ve seen a lot of changes in how businesses approach social data over the past 7 years, but few have had as much of an impact as Big Data. With smart devices and social media, we now generate so much data, both intentionally and unwittingly, that conventional wisdom assures us every byte has value and must be captured to unlock the secrets of our next marketing success. While it’s certainly true that knowing more about your market can help you deliver better products and services, I see a lot of companies led astray by Big Data fantasies that will likely never pan out. Even smart companies.
The biggest pitfall of Big Social Data is framing it primarily as a technology problem, as if it’s all about muscling through data with terabytes of RAM, petabytes of storage, and dazzling analytics fueled by all the firepower that can be packed into a few large Amazon servers. Today, making sense of unstructured data is the celebrated domain of machine learning engineers and programmers, who work magic with computational linguistics and natural language processing (NLP). Don’t get me wrong: we rely on a few machine learning geniuses for our own technology, so I have nothing but praise for them. But when you frame the problem as a technological hurdle, you get unintended outcomes that can become distractions.
Recently I was on a conference call with a group of engineers discussing a roadblock on a project where they had reached the limit of their resources in the face of overwhelming data. They were stuck. They couldn’t store, much less process, the volume of data they were collecting. They threw around some suggestions on optimization, and finally hit on the obvious solution: Collect less data. No one discussed a strategy for discerning what to collect and what to leave behind. No one discussed sampling or sample sizing, and no one talked about raising the subject with anyone on the business side. Just collect less data.
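To make the missing conversation concrete, here is a minimal sketch, assuming the business question is estimating a simple proportion (say, the share of collected posts that mention a brand). It uses the textbook sample-size formula for a proportion with a finite population correction, plus reservoir sampling (Algorithm R) to keep a uniform random sample from a stream without storing all of it. This is an illustration, not what those engineers did; the function names, the 95% confidence level (z = 1.96), and the ±1% margin are assumptions.

```python
import math
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_size_for_proportion(population: int, margin: float = 0.01,
                               z: float = 1.96, p: float = 0.5) -> int:
    """Documents needed to estimate a proportion within +/- margin.

    n0 = z^2 * p * (1 - p) / margin^2, then the finite population
    correction n = n0 / (1 + (n0 - 1) / population). p = 0.5 is the most
    conservative (largest) choice when the true rate is unknown.
    """
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length
    (Algorithm R), using O(k) memory instead of keeping everything."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # inclusive on both ends
            if j < k:                       # keep item with probability k/(i+1)
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # Hypothetical scenario: 50 million collected posts.
    n = sample_size_for_proportion(50_000_000)
    print(n)
    sample = reservoir_sample(range(100_000), k=n)
    print(len(sample))
```

Under these assumptions, fewer than ten thousand randomly chosen posts estimate that share within about one percentage point, and at 50 million documents the finite population correction barely changes the answer. A different question needs a different sample, which is exactly why the business side belongs in the conversation.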
I’ve seen the opposite approach at other companies, which take as gospel the recently expressed strategy of the CIA: We must collect everything and keep it forever, because somewhere hidden in all that data is a goldmine of future value. Well, most companies don’t have the budget of the CIA. Even Google indexes only a portion of the Web. But somehow, it’s taken on faith that Big Data will let you capture anything and everything you want, and make sense of it automatically with the magic of technology. Oh, and naturally, that will make you a ton of money.
Missed in all of this enthusiasm for Big Data is the very real power of human processing, something investors more often shun than celebrate, because outside of a few brain-labor farms, humans just aren’t scalable when it comes to computing. Humans can’t see the semantic patterns that emerge in clusters across a collection of 50 million documents. Humans can’t process millions of documents to extract the most valuable key phrases in seconds. And of course, humans make mistakes. The problem with this mindset is that it frames the work as a zero-sum game that humans have already lost: computers process the data because humans can’t.
The reality is that a partnership between human and machine is an optimal antidote to today’s challenges with Big Data. Rather than approach every new project with an army of web crawlers and analytics, we start each project with humans. We use methodologies to assess the kinds of data available and what that data means for a defined objective, and we use a host of tools to carefully curate what we find and render it into useful instructions for our machines. Then we unleash the crawlers, the analytics, the algorithms and filters.
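As a hypothetical sketch of what rendering curated findings into instructions for machines might look like, the example below has analysts encode an objective as include and exclude terms, and a machine stage apply those rules to every incoming document. The class, field names, and simple keyword matching are illustrative assumptions, not a description of any specific product’s design.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Iterator, Set

TOKEN = re.compile(r"[a-z0-9']+")


@dataclass
class CuratedRules:
    """Hypothetical instructions that human analysts hand to the machines."""
    objective: str
    include_terms: Set[str] = field(default_factory=set)  # at least one must appear
    exclude_terms: Set[str] = field(default_factory=set)  # none may appear

    def __post_init__(self) -> None:
        # Normalize curated terms so matching is case-insensitive.
        self.include_terms = {t.lower() for t in self.include_terms}
        self.exclude_terms = {t.lower() for t in self.exclude_terms}

    def matches(self, text: str) -> bool:
        words = set(TOKEN.findall(text.lower()))
        return bool(words & self.include_terms) and not (words & self.exclude_terms)


def machine_filter(docs: Iterable[str], rules: CuratedRules) -> Iterator[str]:
    """Machine stage: apply the human-curated rules to every incoming document."""
    return (doc for doc in docs if rules.matches(doc))


if __name__ == "__main__":
    rules = CuratedRules(
        objective="complaints about battery life",
        include_terms={"battery", "charge"},
        exclude_terms={"giveaway"},
    )
    posts = [
        "My battery dies by noon.",
        "Enter our giveaway to win a battery pack!",
        "Loving the new camera.",
    ]
    print(list(machine_filter(posts, rules)))
```

The division of labor is the point: people decide which terms matter and which ones only produce noise, and the machine applies that judgment consistently across millions of documents. A keyword filter is just one simple form such instructions could take.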
At each stage of the collection and analytic process, it’s a handoff between human and machine, optimizing what each does best to find and make sense of the data that really matters. Our goal isn’t to eliminate human processing, but to eliminate the human tasks that machines do better. The result may not be “Big Data” the way many engineers and investors think about it, but it’s Smart Data the way our customers think about it—data to drive decision-making today, not some day in the theoretical future.