Chicken or the Egg? Data or Analytics?
Sure, it is not impossible to include all the instructions for variable conversion, reformatting, editing and summarization in the model-scoring program. But such a practice is the No. 1 cause of errors, inconsistencies and serious delays. Yes, it is also not impossible to steer a car with your knees while texting with your hands, but I wouldn't call that a best practice.
That is why marketing databases must be model-ready, so that sampling and scoring become routine tasks requiring minimal data transformation. When I design a marketing database, I always put the analysts at the top of the user list. Sure, non-statistical types will still be able to run queries and reports out of it, but those activities should be secondary, as they are lower-level functions (i.e., simpler and easier) compared to being "model-ready."
Here is a list of prerequisites for being model-ready (which will be explained in detail in my future columns):
- All tables linked or merged properly and consistently
- Data summarized to consistent levels such as individuals, households, email entries or products (depending on the users' ranking of priorities)
- All numeric fields standardized, where missing data and zero values are separated
- All categorical data edited and categorized according to preset business rules
- Missing data imputed by a standardized set of rules
- All external data variables appended properly
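To make a few of these prerequisites concrete, here is a minimal Python sketch, assuming a toy set of transaction rows. It summarizes transactions to the individual level, keeps missing amounts separate from genuine zero amounts, maps raw channel codes to categories through a preset rule table, and imputes missing spend with one standardized rule (the median of observed values). The data, the `CHANNEL_RULES` mapping, the median rule and the function names `categorize`, `summarize` and `impute_spend` are all hypothetical illustrations, not a prescribed method; real business rules and imputation schemes would come from your own data governance.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw transaction rows: (individual_id, amount, channel).
# amount is None when the value was never captured (missing), which is
# different from a real $0 transaction.
TRANSACTIONS = [
    ("A", 120.0, "web"),
    ("A", 0.0, "Web "),
    ("B", None, "catalog"),
    ("C", 45.5, "retail store"),
    ("C", 30.0, "web"),
    ("D", 0.0, "catalog"),
]

# Hypothetical preset business rule: raw channel code -> standard category.
CHANNEL_RULES = {"web": "ONLINE", "catalog": "OFFLINE", "retail store": "OFFLINE"}


def categorize(raw):
    """Clean the raw code, apply the preset rule, send unmapped codes to OTHER."""
    return CHANNEL_RULES.get(raw.strip().lower(), "OTHER")


def summarize(transactions):
    """Roll transactions up to one record per individual.

    total_spend stays None until at least one real amount is seen, so an
    individual with only missing amounts is not confused with one who spent $0.
    """
    people = defaultdict(lambda: {"total_spend": None, "missing_count": 0,
                                  "txn_count": 0, "channels": set()})
    for pid, amount, channel in transactions:
        rec = people[pid]
        rec["txn_count"] += 1
        rec["channels"].add(categorize(channel))
        if amount is None:
            rec["missing_count"] += 1
        else:
            rec["total_spend"] = (rec["total_spend"] or 0.0) + amount
    return dict(people)


def impute_spend(records):
    """Fill missing total_spend with the median of observed values.

    A flag is kept so a model can still tell imputed values from observed ones.
    """
    observed = [r["total_spend"] for r in records.values()
                if r["total_spend"] is not None]
    fill = median(observed) if observed else 0.0
    for r in records.values():
        r["spend_imputed"] = r["total_spend"] is None
        if r["spend_imputed"]:
            r["total_spend"] = fill
    return records


if __name__ == "__main__":
    base = impute_spend(summarize(TRANSACTIONS))
    for pid, rec in base.items():
        print(pid, rec)
```

In this toy run, individual D keeps an observed spend of 0.0, while individual B (whose only amount was missing) receives the imputed median and is flagged as such. Doing this once in the database, rather than inside every scoring program, is the point of the list above.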
Basically, the whole database should be as pristine as the sample datasets that analysts play with. That way, sampling should take only a few seconds, and applying the resulting model algorithms to the whole base would simply be the computer's job, not some nerve-racking, nail-biting, all-night baby-sitting suspense for every update cycle.
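On a base that is already model-ready, sampling and scoring reduce to a few lines of code. The Python sketch below assumes a hypothetical in-memory base of clean records and a hypothetical logistic-regression model whose coefficients (`COEFFS`) stand in for whatever an analyst would actually build; `draw_sample`, `score` and `score_all` are illustrative names, not part of any real system. Note that `score` reads the stored fields directly, with no conversion or editing steps.

```python
import math
import random

# Hypothetical model-ready base: every record already carries the same clean,
# standardized fields, so nothing needs fixing at sampling or scoring time.
_rng = random.Random(7)
BASE = [
    {"id": i,
     "total_spend": round(_rng.uniform(0, 500), 2),
     "txn_count": _rng.randint(0, 20),
     "is_online": _rng.random() < 0.5}
    for i in range(10_000)
]

# Hypothetical coefficients standing in for an analyst's fitted model.
COEFFS = {"intercept": -3.0, "total_spend": 0.004,
          "txn_count": 0.08, "is_online": 0.5}


def draw_sample(base, n, seed=42):
    """Simple random sample without replacement, reproducible via the seed."""
    return random.Random(seed).sample(base, n)


def score(record, coeffs=COEFFS):
    """Apply the logistic formula 1 / (1 + e^-z) directly to stored fields."""
    z = (coeffs["intercept"]
         + coeffs["total_spend"] * record["total_spend"]
         + coeffs["txn_count"] * record["txn_count"]
         + coeffs["is_online"] * float(record["is_online"]))
    return 1.0 / (1.0 + math.exp(-z))


def score_all(base):
    """Score every record in the base: the routine batch job."""
    return {rec["id"]: score(rec) for rec in base}


if __name__ == "__main__":
    sample = draw_sample(BASE, 500)
    scores = score_all(BASE)
    print(len(sample), "sampled;", len(scores), "scored")
```

The design choice being illustrated is that all transformation work lives upstream in the database, so the scoring step is a pure function of fields that already exist.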
In my co-op database days, we designed and implemented the core database with this model-ready philosophy, where all samples were presented to the analysts on silver platters, with absolutely no need for fixing the data any further. Analysts devoted their time to pondering target definitions and statistical methodologies. This way, each analyst was able to build about eight to 10 "custom" models (not cookie-cutter models) per "day," and all models were applied to the entire database of more than 200 million individuals at the end of each day (I hear that they are even more efficient these days). Now, for folks who are accustomed to 30-day model implementation cycles (I've seen cycles as long as six months), this may sound like total science fiction. And I am not even saying that all companies need to build and implement that many models every day, as that would hardly be a core business for them, anyway.
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology worlds with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at email@example.com.