What is a statistical model, and how is it built? In short, a model is a mathematical expression of "differences" between dichotomous groups. Too much of a mouthful? Just imagine two groups of people who do not overlap. They may be buyers vs. non-buyers; responders vs. non-responders; credit-worthy vs. not-credit-worthy; loyal customers vs. attrition-bound, etc. The first step in modeling is to define the target, and that is the most important step of all. If the target is hanging in the wrong place, you will be shooting at the wrong place, no matter how good your rifle is.
And the target should be expressed in mathematical terms, as computers can't read our minds, at least not yet. Defining the target is a job in itself:
- If you're going after frequent flyers, how frequent is frequent enough for you? Five times a year or 10 times a year? Or somewhere in between? Or should it remain continuous?
- What if the target is too small or too large? What then?
- If you are looking for more valuable prospects, how would you express that? In terms of average spending, lifetime spending or sheer number of transactions?
- What if there is an inverse relationship between frequency and dollar spending (i.e., high spenders shopping infrequently)?
- And what would be the borderline number to be "valuable" in all this?
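To make "expressed in mathematical terms" concrete, here is a minimal sketch of the frequent-flyer question, assuming a made-up set of customers and two candidate cutoffs (5 and 10 flights a year). The data and the names `flights_per_year`, `define_target` and `target_rate` are hypothetical and exist only for illustration. The point is how much the size of the target group moves when the definition moves.

```python
# Hypothetical flight counts per customer over one year (made-up data).
flights_per_year = {
    "A": 2, "B": 5, "C": 7, "D": 10,
    "E": 12, "F": 1, "G": 6, "H": 0,
}


def define_target(counts, threshold):
    """Flag each customer 1 (target) or 0 (non-target) by a frequency cutoff."""
    return {customer: int(n >= threshold) for customer, n in counts.items()}


def target_rate(flags):
    """Share of all customers who fall into the target group."""
    return sum(flags.values()) / len(flags)


# Compare two candidate definitions of "frequent".
for threshold in (5, 10):
    flags = define_target(flights_per_year, threshold)
    size = sum(flags.values())
    print(f"cutoff {threshold}+ flights: {size} targets, rate {target_rate(flags):.3f}")
```

With this toy data, the looser cutoff puts more than half of the customers in the target group, while the stricter one leaves only a quarter. Neither is right or wrong by itself; the choice depends on what the model is supposed to accomplish.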
Once the target is set, after much pondering, the job is to select the variables that describe the "differences" between the two groups. For example, I know how much marketers love to use income variables in various situations. But if that popular variable does not explain the differences between the two groups (target and non-target), the mathematics will mercilessly throw it out. This rigorous exercise of examining hundreds or even thousands of variables is one of the most critical steps, during which many variables go through various types of transformations. Statisticians have different preferences about the ideal number of variables in a model, but non-statisticians like us don't need to be too concerned, as long as the resulting model works. Who cares if a cat is white or black, as long as it catches mice?
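As a rough illustration of how a variable gets "thrown out," the sketch below scores each candidate by how far apart the target and non-target averages are, relative to the variable's overall spread. This is a simple screening heuristic chosen for the example, not the procedure any particular statistician or modeling package uses. The records, the variable names and the `CUTOFF` value are all made up.

```python
from statistics import mean, pstdev

# Made-up records: (is_target, income_in_thousands, purchases_last_year)
records = [
    (1, 60, 9), (1, 85, 11), (1, 70, 8), (1, 75, 12),
    (0, 65, 2), (0, 80, 3), (0, 72, 1), (0, 74, 4),
]

# Candidate variable name -> column position in each record.
variables = {"income": 1, "purchases": 2}


def standardized_difference(rows, col):
    """Gap between group means, divided by the variable's overall std. dev."""
    target = [r[col] for r in rows if r[0] == 1]
    other = [r[col] for r in rows if r[0] == 0]
    spread = pstdev([r[col] for r in rows])
    if spread == 0:
        return 0.0  # a constant variable cannot separate the groups
    return abs(mean(target) - mean(other)) / spread


CUTOFF = 0.5  # arbitrary threshold, assumed for this illustration only

scores = {name: standardized_difference(records, col)
          for name, col in variables.items()}
kept = [name for name, score in scores.items() if score >= CUTOFF]

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("kept:", kept)
```

In this toy data, income looks almost identical in both groups, so it falls below the cutoff, while the purchase count separates the groups clearly and survives. Real model building examines far more variables and uses more rigorous tests, but the principle is the same: a variable stays only if it explains the difference.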
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at email@example.com.