Freeform Data Are Not Exactly Free
And those numbers and figures are the easy part. Most of the data we've been dealing with so far (through computers, at least) has been in the form of numbers anyway. Other than binning or transforming them and dealing with inevitable missing values (more on that in future articles), numbers are generally in ready-for-analytics form. The real trouble is that most of the so-called Big Data are not in numeric form at all. It gets worse, as most of the data that we are collecting nowadays are unstructured, unorganized, unedited, uncategorized and unrefined. In other words, they are freeform raw data. That means someone, generally the last person who touches the data before reporting to some big shots, has to make sense out of them. And how are we going to do that when Mr. Data of the 24th Century has a hard time understanding sarcasm? (But then, one humanoid, Sheldon Cooper of this century, has a hard time with sarcasm too. So let's not be too harsh on machines for such shortcomings.) Most big shots are bottom line-oriented folks, and all they really care about are short bursts of answers like "percentage increase of unfavorable sentiments toward a new product that was just released with lots of development chagrins and marketing fanfare."
One thing is for sure: filtering and categorizing non-numeric data, whether through computer-based analysis, sheer manual labor, or both, is hard work, but that hard work certainly pays off. I have said numerous times that the Big Data movement must be about:
1. Cutting down the noise; and
2. Providing decision-makers with simple answers (refer to "Big Data Must Get Smaller").
Combing through the freeform data and throwing away the unnecessary bits is the beginning of such a data reduction process. Here again, the first part is defining the questions to be answered. Once the goals are set, we can start throwing things away with conviction. If we find patterns in such activities, we can then start automating the hygiene and data categorization processes.
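To make the idea a bit more concrete, here is a minimal Python sketch of what an automated first pass might look like once the question is defined. Everything in it is an assumption made for illustration: the product name "widget," the keyword lists, the category labels and the sample comments are all hypothetical, and simple keyword matching stands in for whatever text analysis a real project would use.

```python
import re
from collections import Counter

# Hypothetical rules, chosen only for illustration. Real rules would come
# from the question being asked and from patterns found during manual review.
PRODUCT_TERMS = {"widget"}
UNFAVORABLE_TERMS = {"broken", "terrible", "refund", "disappointed"}
FAVORABLE_TERMS = {"love", "great", "excellent"}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def categorize(comment):
    """Return a category label, or None if the comment is off-topic noise."""
    words = tokenize(comment)
    if not words & PRODUCT_TERMS:
        return None  # does not mention the product: throw it away
    if words & UNFAVORABLE_TERMS:
        return "unfavorable"
    if words & FAVORABLE_TERMS:
        return "favorable"
    return "uncategorized"  # on-topic, but left for manual review


def summarize(comments):
    """Count categories among on-topic comments and compute the unfavorable share."""
    labels = [label for label in map(categorize, comments) if label is not None]
    counts = Counter(labels)
    share = 100.0 * counts["unfavorable"] / len(labels) if labels else 0.0
    return counts, share


# Made-up sample comments, not real data.
comments = [
    "The new Widget arrived broken. I want a refund.",
    "I love my widget!",
    "Anyone watching the game tonight?",
    "Widget shipped on Tuesday.",
]
counts, unfavorable_share = summarize(comments)
print(dict(counts), f"{unfavorable_share:.1f}% unfavorable")
```

Note that rules this crude would misread sarcasm just as badly as Mr. Data does, which is why the "uncategorized" bucket is kept for human review rather than thrown away.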
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at email@example.com.