Freeform Data Are Not Exactly Free
That type of "buyer-centric" mindset has been the main theme of this whole series (refer to "It's All About Ranking"). And for the daunting task of categorizing millions of SKUs, that single-mindedness also became our savior. Simply put, why categorize any item that did not sell? In fact, we could explain the majority of the transactions by categorizing the most popular items first and ignoring the unpopular ones.
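To make the popularity-first idea concrete, here is a minimal sketch, not the process we actually ran. It assumes the transactions are available as a simple list of SKU identifiers, one entry per sale; the function name `skus_to_categorize` and the 80 percent coverage target are illustrative assumptions.

```python
from collections import Counter

def skus_to_categorize(transactions, coverage=0.8):
    """Pick the best-selling SKUs until they explain `coverage` of all sales.

    `transactions` is assumed to be a list of SKU identifiers, one per sale.
    Returns the selected SKUs (most popular first) and the share of
    transactions they account for.
    """
    counts = Counter(transactions)
    total = sum(counts.values())
    selected, covered = [], 0
    for sku, n in counts.most_common():
        # Stop once the SKUs picked so far explain enough of the sales.
        if covered >= coverage * total:
            break
        selected.append(sku)
        covered += n
    share = covered / total if total else 0.0
    return selected, share

# Hypothetical sales log: three SKUs with very uneven popularity.
sales = ["A"] * 6 + ["B"] * 3 + ["C"] * 1
top_skus, share = skus_to_categorize(sales, coverage=0.8)
print(top_skus, share)
```

Only the SKUs returned would go to the human categorizers; everything else waits until it starts selling.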
If we were doing it today, we would have put more emphasis on crowdsourcing, pattern recognition and machine learning. In fact, what we did even back then was a combination of small-scale crowdsourcing (with lots of part-time moms who were highly educated and informed consumers), plus pattern-recognition techniques. We tried machine learning in the beginning, but quickly realized that it would be humans who would have to teach the machine to begin with. And, in those days, such software was cost-prohibitive, with not-so-great results. It was all right with long strings of text from emails and messages, but with a short burst of product description like "Disney's Tarzan," it had no idea where to go and couldn't be taught, either.
That means we would still need some type of human intervention at the beginning stage, and I think it would be beneficial to share some of the major rules of categorization, regardless of the technologies or techniques employed:
1. Define the Categories First: The key is to set up categories that fit your goals, assuming that you've set the goals. Be specific, as analysts can combine categories in later analytical steps, but they cannot break apart the ones that were lumped together.
2. Categorize as the Data Are Being Collected: It is not always possible, but we should try to categorize data at the earliest stages of data collection. For example, inadequate surveys and data-entry forms on Web pages are the main source of unusable freeform data. And be consistent about it throughout the journey through data, from data collection forms to database design and analytics.
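The first rule, that specific categories can always be combined later but lumped ones cannot be split, can be illustrated with a short sketch. The category names and the `ROLLUP` mapping below are hypothetical examples, not the taxonomy we used.

```python
from collections import Counter

# Hypothetical mapping from specific categories to broader groups.
ROLLUP = {
    "children's DVD": "home entertainment",
    "children's book": "books",
    "cookbook": "books",
}

def roll_up(category_counts, rollup=ROLLUP):
    """Combine fine-grained category counts into broader groups."""
    broad = Counter()
    for category, n in category_counts.items():
        # Anything missing from the mapping is flagged, not silently dropped.
        broad[rollup.get(category, "uncategorized")] += n
    return dict(broad)

# Purchases recorded at the specific level...
specific = {"children's DVD": 2, "children's book": 3, "cookbook": 5}
# ...can be merged for analysis whenever a broader view is enough.
print(roll_up(specific))
```

The reverse does not work: once purchases are stored only as "books," no mapping can recover how many of them were cookbooks.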
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at email@example.com.