Missing Data Can Be Meaningful
No matter how big the Big Data gets, we will never know everything about everything. Well, according to the super-duper computer called "Deep Thought" in the movie "The Hitchhiker's Guide to the Galaxy" (don't bother to watch it if you don't care for the British sense of humour), the answer to "The Ultimate Question of Life, the Universe, and Everything" is "42." Coincidentally, that is also my favorite number to bet on (I have my reasons), but I highly doubt that even that huge fictitious computer with unlimited access to "everything" provided that numeric answer with conviction after 7½ million years of computing and checking. At best, that "42" is an estimated figure of a sort, based on some fancy algorithm. And in the movie, even Deep Thought pointed out that "the answer is meaningless, because the beings who instructed it never actually knew what the Question was." Ha! Isn't that what I have been saying all along? For any type of analytics to be meaningful, one must properly define the question first. And what to do with the answer that comes out of an algorithm is entirely up to us humans, or in the business world, the decision-makers. (Who are probably human.)
Analytics is about making the best of what we know. Good analysts do not wait for a perfect dataset (it will never come by, anyway). And businesspeople have no patience to wait for anything. Big Data is big because we digitize everything, and everything that is digitized is stored somewhere in forms of data. For example, even if we collect mobile device usage data from just pockets of the population with certain brands of mobile services in a particular area, the sheer size of the resultant dataset becomes really big, really fast. And most unstructured databases are designed to collect and store what is known. If you flip that around to see if you know every little behavior through mobile devices for "everyone," you will be shocked to see how small the size of the population associated with meaningful data really is. Let's imagine that we can describe human beings with 1,000 variables coming from all sorts of sources, out of 200 million people. How many would have even 10 percent of the 1,000 variables filled with some useful information? Not many, and definitely not 100 percent. Well, we have more data than ever in the history of mankind, but still not for every case for everyone.
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at email@example.com.