Don’t Ruin Good Models by Abusing Them, Marketers
Modern-day 1:1 marketing is all about precision targeting, using all available data. And precision targeting is not possible with a few popular variables selected based on human intuition.
If human intuition takes over the process, every targeting logic would start with income and age. But let me ask you this: Do you really think that the differences between Mercedes and Lexus buyers are just income and age? If that is too tricky, how about the differences between travelers of luxury cruise lines and buyers of luxury cars? Would that be explained by income and age?
I’m sorry to break it to you bluntly, but all of those targets are rich. To come up with more effective targeting logic, you must dig deeper through data for other clues. And that’s where algorithmic solutions come into play.
I’ve worked with many smart people over the years, but I’ve never met a human who is capable of seeing through interactions among complex data variables without a computer. Some may understand two- or even three-dimensional interactions when presented in a graphic format, but never more than that. By contrast, a simple regression model routinely incorporates 10 to 20 variables, and provides us with rank orders in the form of simple scores. Forget the next-generation AI algorithms; humans have been solidly beaten by computers for decades when it comes to precision targeting.
So, when you have a dire need for more accurate targeting (i.e., you want to be mostly right, not mostly wrong) and an ample amount of data (i.e., more data variables than you can easily handle), don’t even hesitate to go with statistical models. Resistance is simply futile. In the age of abundant data, we need models more than ever, as they convert mounds of data into digestible answers to questions. (For an extended list of benefits, refer to one of my early articles, “Why Model?”)
But today, I am not writing this article to convince non-believers to become believers in statistical models. Quite frankly, I just don’t care if someone still is a non-believer in this day and age. It’s his loss, not mine. This not-so-short article is for existing users of models, who may have ruined them by abusing them from time to time.
As a data and analytics consultant, I get called in when campaign results are less than satisfactory, even when statistical models were actively employed in the target selection process. The most common expression I hear in such cases is, “The model didn’t work.” But when I dig through the whole process, I often find that the model algorithm is the only error-free item. How ironic.
I’ve talked about “analytics-readiness” many times already. And, yes, inadequate sets of input data can definitely ruin models. So here, allow me to summarize the ways users wreck perfectly adequate models “after” they were developed and validated. Unfortunately, there are many ways to do that; these are a few major ones.
Using the Model in a Wrong Universe
Without a doubt, setting a wrong target will lead to an unusable model. Just as important as the “target definition” is the “comparison universe.” If you are building a response model, for example, responders (i.e., targets) will be compared to non-responders (i.e., non-targets). If you are off in either of those, the whole model will be wrong, because a model is nothing but a mathematical expression of the differences between two dichotomous groups. This is why setting a proper comparison universe (generally, a sample out of the pool of names that you are using for the campaign) is as important as setting the right target.
Further, let’s say that you want to use models within preset universes, based on region, age, gender, income, past spending level, number of clicks, or any other segmentation rules. Such universe definitions, which are mostly about excluding obvious non-targets, should be determined “before” the model development phase. When such divisions are made, applying the model built for one universe (e.g., a regional model for the Mid-Atlantic) to another universe (e.g., the Pacific Northwest region) will not provide good results, except by dumb luck.
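One way to keep a model inside its intended universe is to store the pre-selection rule alongside the model and refuse to score anything that falls outside it. The following is only a minimal sketch of that idea: the `ScoringModel` class, the field names, and the toy scoring formula are all hypothetical stand-ins, not a real model.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class ScoringModel:
    """A model bundled with the universe it was developed for (illustrative)."""
    name: str
    in_universe: Callable[[Record], bool]  # pre-selection rule fixed before development
    score: Callable[[Record], float]       # stand-in for the fitted algorithm


def score_within_universe(
    model: ScoringModel, records: List[Record]
) -> Tuple[List[Tuple[Record, float]], List[Record]]:
    """Score only records inside the model's universe; set the rest aside."""
    scored, out_of_universe = [], []
    for rec in records:
        if model.in_universe(rec):
            scored.append((rec, model.score(rec)))
        else:
            out_of_universe.append(rec)  # never scored by the wrong model
    return scored, out_of_universe


# Hypothetical regional model; the formula is a toy, not a fitted result.
mid_atlantic_model = ScoringModel(
    name="mid_atlantic_response",
    in_universe=lambda r: r["region"] == "Mid-Atlantic",
    score=lambda r: 0.1 + 0.02 * r["past_orders"],
)

records = [
    {"id": 1, "region": "Mid-Atlantic", "past_orders": 3},
    {"id": 2, "region": "Pacific Northwest", "past_orders": 5},
]
scored, skipped = score_within_universe(mid_atlantic_model, records)
```

Here the Pacific Northwest record ends up in `skipped` instead of receiving a Mid-Atlantic score, which makes the universe mismatch visible before the campaign drops.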
Ignoring the Design Principle of the Model
Like buildings or cars, models are built for specific purposes. If I may list a few examples:
- “Future customer value estimation, in dollars”
- “Propensity to purchase in response to discount offers via email”
- “Product affinity for a certain product category”
- “Loyalty vs. churn prediction”
- “Likelihood to be a bargain-seeker”
This list could be as long as you want it to be as a marketer.
However, things start to go wrong when the user starts ignoring (or forgetting) the original purpose of the model. Years back, my team built a model for a luxury cruise line for a very specific purpose. The brand was very reputable, so it had no trouble filling staterooms with balconies at a higher price point. But it did have some challenges filling inside staterooms at a relatively high price of entry, which was equivalent to a window room on a less fancy ship. So, the goal was to find cruisers who would take up inside staterooms, for the brand value, on Europe-bound ships that depart U.S. ports between Thanksgiving and Christmas. A very specific target? You bet.
Troubles arose because it worked all too well for the cruise line. So, without any further consultation with any analysts, they started using that model for other purposes. We got phone calls only after the attempt failed miserably. Now, is that really the fault of the model? Sure, you can heat up your house with a kitchen oven, but don’t blame the manufacturer when it breaks down from that kind of abuse. I really don’t think the warranty applies there.
Playing With Selection Rules
Some marketers are compelled to add more rules after the fact, probably out of sheer enthusiasm for success. For instance, a person in charge of a campaign may come up with an idea at the last minute, and add a few rules on top of the model selection, as in “Let’s send mail only to male prospects in the high-score group.” What this means is that he just added the strongest variable on top of a good model, which may include 15 to 20 variables, all carefully weighted by a seasoned statistician. This type of practice may not lead to a total disaster, but the effectiveness of the model in question is definitely diluted by the post-selection rules.
When the bad results start to come in, again, don’t blame the modeler for it, because “you” essentially redesigned the model by adding new variables on top of existing predictors. Unfortunately, this type of last-minute meddling is quite common. If you have a good reason to do any “post-selection,” please talk to the analyst before the model is built, so that she can incorporate the rule as a “pre-selection” logic. She may give you multiple models fitted for multiple universes, too.
Realigning the Model Groups in Arbitrary Ways
Model scores are just long numbers — with eight or nine decimal places, in general. It is hard to use sheer numeric values like that, so kind modelers generally break the scored universe into 10 or 20 equal groups. (We call them decile or demi-decile groups.)
For instance, each decile group would represent 10% of the development and validation samples. When applied to the campaign universe, resultant score groups should not deviate too much from that 10% mark.
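To make the idea of model groups concrete, here is a minimal sketch, assuming that the group boundaries are computed once on the development sample and then reused unchanged for every later scoring run. The function names `decile_cutoffs` and `assign_group` are hypothetical, and group 1 is taken to be the highest-scoring group.

```python
from collections import Counter
from typing import List


def decile_cutoffs(dev_scores: List[float], n_groups: int = 10) -> List[float]:
    """Return the lowest score of each cumulative top group, highest first.

    Cutoffs come from the development sample only; they are not
    recomputed on the campaign universe.
    """
    n = len(dev_scores)
    if n < n_groups:
        raise ValueError("need at least one development record per group")
    ranked = sorted(dev_scores, reverse=True)
    return [ranked[(k * n) // n_groups - 1] for k in range(1, n_groups)]


def assign_group(score: float, cutoffs: List[float]) -> int:
    """Group 1 is the best; a score drops one group for each cutoff it falls below."""
    return 1 + sum(1 for c in cutoffs if score < c)


# Toy development sample: 100 evenly spread scores (illustrative data only).
dev_scores = [i / 100 for i in range(1, 101)]
cutoffs = decile_cutoffs(dev_scores)

# On the development sample itself, each group holds 10 of the 100 records.
dev_groups = Counter(assign_group(s, cutoffs) for s in dev_scores)
```

Because `cutoffs` is frozen at development time, applying `assign_group` to campaign scores shows how far the live universe has moved from the 10% mark instead of hiding it.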
If you see big bumps in model group sizes, it is a clear sign that something went wrong in scoring, that there were significant changes in the input variables, or that the model is losing its effectiveness over time.
I’ve seen cases where users just realigned the model score groups after the fact, simply because groups were not showing an equal 10% break anymore. That is like covering serious wounds with makeup. Did the model work after that? Please take a wild guess.
Using Expired Models
Models do have limited shelf-lives. Models lose their predictive power over time, as market conditions, business models, data sources, data procurement methods, and target profiles all inevitably go through changes.
If you detect signs of lagging results or wide fluctuations in model group distribution (e.g., only 3% of records landing in the top decile, which is supposed to hold around 10%), it is time to review the model. In mild cases, modelers may be able to refit the model. But in this day and age of fast computers and automation, I recommend full redevelopment of the model in question at the first sign of trouble.
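A routine check on group sizes can surface this kind of trouble before a campaign goes out. The sketch below assumes each campaign record already carries a model group label from 1 to 10. The 3-percentage-point tolerance is purely an illustrative assumption, since the right threshold depends on the model and the universe, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def group_shares(groups: List[int], n_groups: int = 10) -> Dict[int, float]:
    """Share of the scored universe that falls in each model group."""
    if not groups:
        raise ValueError("no scored records to check")
    counts = Counter(groups)
    total = len(groups)
    return {g: counts.get(g, 0) / total for g in range(1, n_groups + 1)}


def drifted_groups(
    groups: List[int], n_groups: int = 10, tolerance: float = 0.03
) -> Dict[int, float]:
    """Return groups whose share deviates from 1/n_groups by more than tolerance."""
    expected = 1 / n_groups
    shares = group_shares(groups, n_groups)
    return {g: s for g, s in shares.items() if abs(s - expected) > tolerance}


# Toy campaign: only 3% in the top decile, with the overflow landing in group 2.
campaign_groups = [1] * 3 + [2] * 17
for g in range(3, 11):
    campaign_groups += [g] * 10

flagged = drifted_groups(campaign_groups)  # groups 1 and 2 are flagged
```

A non-empty result is a prompt to review, refit, or redevelop the model, not a cue to redraw the group boundaries.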
Ignoring the ‘Level’ of Prediction
A model for target marketing exists to rank records from high to low scores, according to its design principle. If you built an affinity model for “Likely to be an early adopter,” a high score means the target is more likely to be an early adopter, and a low score means she’s less likely to be one. Now, the level of the record matters here. What are you really ranking, anyway?
The most common ones are individual and household levels. It is possible to build a model on an email level, as one individual may have multiple email addresses. If you are in a telecom business, you may not even care for the household-level identity, as the “house” may be the target, regardless of who lives there.
In the application stage, matching the “level” of prediction is important. For household models, it is safe to assume that almost all predictors in the model are on a household level. Applying such models on a different level may negatively affect the model performance. A definite “no” is using a household-level score for an address, not knowing who lives there. One may think, “How different will the new mover be from the old resident?” But considering the wide variety of demographic variables commonly used in models, it is something that no modeler would recommend. If the model employed any transaction or behavioral data, don’t even think about switching levels like that. You’d be better off building a regional model (such as a ZIP model) using only geo-demographic data.
Applying Average Scores to Non-Matches or Non-Scorable Records
Sometimes, scores are missing because of non-matches in the data append process, or strict universe definition using pre-selection rules. It can be tempting to apply some “average” score to cover the missing ones, but that is a big no-no, as well. Statisticians may perform such imputation on a variable level to fill missing values, but not with model scores.
If you really have to have a score for every record, build separate models for the non-match or non-select universes, using whatever data are available (if there are any to be used). In CRM models, no one should drop records just because they did not match demographic files, as the main drivers of such models would be transaction and behavioral data. Let missing values play out in the model (refer to “Missing Data Can Be Meaningful”).
For prospecting, once you set up a pre-selection universe (hopefully, after some profile analysis), don’t look back and just go with a “scored” universe. Records with missing scores are generally not salvageable, in practice.
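In practice, this means keeping non-scorable records visibly separate instead of slipping an average score into the gap. A minimal sketch, assuming scores arrive as a mapping from record ID to score with `None` marking a non-match (the IDs, values, and function name are hypothetical):

```python
from typing import Dict, List, Optional, Tuple


def split_by_score(
    scores: Dict[str, Optional[float]]
) -> Tuple[Dict[str, float], List[str]]:
    """Separate scored records from non-scorable ones; nothing is imputed."""
    scored = {rid: s for rid, s in scores.items() if s is not None}
    unscored = [rid for rid, s in scores.items() if s is None]
    return scored, unscored


# Toy input: two records failed the data append and have no score.
scores = {"A1": 0.82, "A2": None, "A3": 0.15, "A4": None}
scored, unscored = split_by_score(scores)

# Rank only the records the model could actually score.
ranked_ids = sorted(scored, key=scored.get, reverse=True)
```

The `unscored` list can then be handed to a separate model built for that universe, or simply left out of the selection.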
Go Forth and Do Good Business With Models, Marketers
As you can see, there are many ways to mess up a good model. A model is not an extension of rudimentary selection rules, so please do NOT treat it that way. Basically, do not put diesel fuel in a gasoline car, and hope to God that the engine will run smoothly. And when — not if — the engine stalls, don’t blame the engineer.
Models may be built, but the work is not nearly done until they are properly applied and deployed in live campaigns. When in doubt, always consult with the analyst in charge; hopefully, before the drop date.
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at firstname.lastname@example.org.