Freeform Data Are Not Exactly Free
When everything is digitized, even food labels can be used to profile consumers and predict future behaviors. Predicting "why" is the most difficult part of predictive analytics. But if a household is buying an unusual proportion of products labeled "Sugar Free," do we really need to know the "why" part? It could be that there is a diabetic in the household, or that someone is in a weight-loss program. But once such a correlation is found, we can personalize the offers to such households (without, of course, being too creepy about it).
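As a minimal sketch of the idea, the snippet below computes each household's share of "sugar free" purchases and flags the households whose share sits well above the average. The household IDs, transaction log, and z-score cutoff are all hypothetical; a real implementation would run against SKU-level transaction tables, not a hand-typed list.

```python
from statistics import mean, stdev

# Hypothetical transaction log: (household_id, product_label) pairs.
transactions = [
    ("hh1", "Sugar Free Cola"), ("hh1", "Sugar Free Gum"), ("hh1", "Bread"),
    ("hh2", "Milk"), ("hh2", "Eggs"), ("hh2", "Bread"),
    ("hh3", "Cereal"), ("hh3", "Sugar Free Syrup"), ("hh3", "Juice"),
]

def sugar_free_share(txns):
    """Fraction of each household's purchases labeled 'sugar free'."""
    totals, hits = {}, {}
    for hh, label in txns:
        totals[hh] = totals.get(hh, 0) + 1
        if "sugar free" in label.lower():
            hits[hh] = hits.get(hh, 0) + 1
    return {hh: hits.get(hh, 0) / n for hh, n in totals.items()}

def unusual_households(txns, z_cutoff=0.9):
    """Households whose sugar-free share is well above the average share."""
    shares = sugar_free_share(txns)
    mu, sigma = mean(shares.values()), stdev(shares.values())
    return [hh for hh, s in shares.items()
            if sigma > 0 and (s - mu) / sigma > z_cutoff]
```

With the toy data above, "hh1" (two sugar-free items out of three) is the household worth a closer look; the "why" can stay unknown while the offer is still personalized.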
Here again, the whole exercise starts with creating new variables and categories. In the spirit of going nuts about it, let's start by putting down all the things that we can find from simple food labels. The goal here is to describe the buyers, not the product itself. So, let's imagine the buyers of products labeled as:
- Organic (Though I wonder what that means at times. The opposite of Synthetic?)
- Diet (as in "Diet Coke," or "Coke Light" in Europe)
- Low calorie/no calorie
- Low sugar/no sugar
- Low fat/fat free
- Low sodium/sodium free
- Gluten free
- Lactose free
- Peanut free
- Energy (not for bunnies, but for drinks)
- Family size/value pack
- Fun size/small packages (Though I personally believe the "Fun Size" should mean something crazy-big, like an 8x11 size chocolate bar that's ¾-inches thick.)
- Etc., etc.
Once you break down the labels this way, it is entirely possible to build models targeting "Cooking from scratch for a family," "Health-conscious organic," "Weight watchers," "Busy parents with young kids," "Energetic on-the-gos," "Buyers with dietary restrictions," etc. This is how monotonous product labels are converted into descriptors of buyers: first through categories and tags, then with statistical models.
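The label-to-descriptor step above can be sketched in a few lines. The mapping table and segment names below are hypothetical stand-ins (a production taxonomy would be far richer, and the resulting shares would feed a statistical model rather than be used raw), but the shape of the exercise is the same: tag each product by its label, then roll the tags up to household-level variables.

```python
# Hypothetical label-to-descriptor rules; a real taxonomy would be far richer.
LABEL_TAGS = {
    "organic": "health_conscious_organic",
    "diet": "weight_watcher",
    "sugar free": "weight_watcher",
    "gluten free": "dietary_restriction",
    "lactose free": "dietary_restriction",
    "family size": "family_buyer",
    "energy": "on_the_go",
}

def tag_product(label):
    """Map one product label to the buyer-descriptor tags it suggests."""
    text = label.lower()
    return {tag for key, tag in LABEL_TAGS.items() if key in text}

def household_profile(purchases):
    """Share of a household's purchases carrying each descriptor tag:
    the kind of derived variables a targeting model would consume."""
    counts = {}
    for label in purchases:
        for tag in tag_product(label):
            counts[tag] = counts.get(tag, 0) + 1
    n = len(purchases)
    return {tag: c / n for tag, c in counts.items()}

profile = household_profile([
    "Organic Baby Spinach", "Diet Cola",
    "Family Size Organic Oats", "Gluten Free Pasta",
])
```

Here the household comes out half "health-conscious organic" and a quarter each "weight watcher," "family buyer," and "dietary restriction": descriptors of the buyer, not the products.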
Now that you get the idea, let's continue with ridiculously large data. I have some personal experience with it, as I led a team that created the first large-scale consumer co-op database in the U.S. to fully incorporate SKU-level item data into an individual-level targeting engine, at the turn of this century. That may sound easy nowadays (though very few people are doing it right, even now), but as the first attempt of its kind in the industry, it was a borderline-crazy idea at the time. And like anything in the age of Big Data, the data collection was the easiest part. In fact, after we created the SKU-level co-op database, item-level data became the price of entry in the co-op and list industry, and every follower started collecting data at that level. But at the risk of sounding too much like Jerry Seinfeld: hey, anybody can just collect the data. The important part is holding on to them, all the way through modeling, targeting and selection.
Because we were collecting data from more than 1,200 sources then (now, that company has over 2,000 sources, I hear), and participating co-op members each carried anywhere from 50,000 to 500,000 SKUs, by the time we had more than 150 million buyers in our database, we had literally billions of item-level transaction details. That is pretty big, even by today's standards. So how did we make sense out of it?
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at firstname.lastname@example.org.