A New Approach to Predictive Analytics Model Evaluation
Predictive analytics has become routine in a wide variety of disciplines. Although building models is now standard practice, I am not convinced that many analysts are evaluating the results of their efforts appropriately.
Many analysts, novices included, believe that the availability of gains or decile analysis makes the evaluation standards obvious. Users overvalue the apparent reasonableness of the gains table: are more responders identified in the top deciles, fewer in the middle, and the fewest at the bottom? While this check is important, it does not always lead to selecting the "best" model for a given situation.
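For readers who have not built a gains table by hand, here is a minimal sketch of the calculation: rank records by model score, cut them into ten equal-sized groups, and report the share of all responders that falls in each group. The function name `decile_gains` and the sample data are hypothetical, chosen only for illustration.

```python
def decile_gains(scores, responded, n_bins=10):
    """Return the percent of all responders that falls in each score-ranked bin."""
    if len(scores) != len(responded):
        raise ValueError("scores and responded must have the same length")
    # Rank records from highest to lowest model score.
    pairs = sorted(zip(scores, responded), key=lambda p: p[0], reverse=True)
    ranked = [flag for _, flag in pairs]
    total = sum(ranked)
    if total == 0:
        raise ValueError("no responders in the data")
    n = len(ranked)
    gains = []
    for b in range(n_bins):
        # Integer arithmetic keeps bins near-equal when n is not divisible by n_bins.
        start, end = b * n // n_bins, (b + 1) * n // n_bins
        gains.append(100.0 * sum(ranked[start:end]) / total)
    return gains


# Hypothetical data: 20 scored records, 5 of which responded (1 = responder).
example_scores = [i / 20 for i in range(20, 0, -1)]
example_flags = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
example_gains = decile_gains(example_scores, example_flags)
print(example_gains)
```

With real data the bins would hold thousands of records each, and the per-decile percentages are what the gains table reports.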
Consider the table below that reports on percent responders identified by decile. Looks good, doesn't it?
This may be referred to as a monotonically decreasing relationship: at each succeeding decile, fewer responders are found. This is precisely what a marketer wants. But there are additional elements that influence model evaluation that are frequently ignored. These include:
- Lift Variations
- Choppiness
Lift Variations: The percentage of all responders identified at a specific depth plays a significant role in selecting which model to deploy. Let's look at the following chart:
The blue and red bars above represent the results of two different models. On closer examination, we discovered a few things, visible in this table:
The original model identified over 27 percent of all responders at the 10 percent depth (the first decile). That appeared to satisfy the marketer, and it was better than the alternative model at that depth. However, if we proceed to the fifth decile, the alternative becomes the winner: cumulatively, it finds slightly over 80 percent of responders. Which model should be deployed?
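One way to frame that question is to compare cumulative gains at each depth, so that the winner can be read off for whatever depth the campaign will actually reach. The sketch below assumes per-decile percentages are already available. The two lists are hypothetical numbers constructed to show a crossover, not the figures from the table above, and the names `cumulative_gains` and `winner_by_depth` are illustrative.

```python
from itertools import accumulate


def cumulative_gains(per_decile):
    """Running total of the percent of responders captured through each decile."""
    return list(accumulate(per_decile))


def winner_by_depth(gains_a, gains_b, name_a="original", name_b="alternative"):
    """For each decile depth, name the model with the higher cumulative gain."""
    winners = []
    for cum_a, cum_b in zip(cumulative_gains(gains_a), cumulative_gains(gains_b)):
        if cum_a > cum_b:
            winners.append(name_a)
        elif cum_b > cum_a:
            winners.append(name_b)
        else:
            winners.append("tie")
    return winners


# Hypothetical per-decile percentages (each list sums to 100).
original = [27.5, 17.0, 13.5, 11.0, 9.0, 7.0, 5.5, 4.5, 3.0, 2.0]
alternative = [25.0, 18.0, 15.0, 12.5, 10.0, 7.0, 5.0, 3.5, 2.5, 1.5]

winners = winner_by_depth(original, alternative)
for depth, name in enumerate(winners, start=1):
    print(f"depth {depth * 10:>3}%: {name}")
```

In this made-up example the original model leads at the 10 and 20 percent depths, the two tie at 30 percent, and the alternative leads from 40 percent onward, so the preferred model changes with how deep the campaign goes.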
Choppiness: Let's look at the following results:
Chart 2 displays the gains results for two different models. Note that the blue bars exhibit the monotonically decreasing relationship (no choppiness), while the red bars show erratic, choppy behavior. Observe, for example, the change from decile 4 to decile 5 in the choppy scenario. Yet although choppiness is evident in this scenario, the choppy model nevertheless identifies more responders at the 50 percent depth (79.58 vs. 77.31 percent). Which model should our marketers adopt?
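Choppiness can also be flagged programmatically rather than by eye. The sketch below lists the deciles where the share of responders rises instead of falling, then compares cumulative capture at the fifth decile. The data are hypothetical and do not reproduce Chart 2, and the helper name `reversals` is an assumption made for illustration.

```python
def reversals(per_decile):
    """Return the 1-based deciles whose share exceeds that of the decile before.

    An empty list means the gains are monotonically decreasing
    (ties are not counted as reversals).
    """
    return [
        i + 1
        for i in range(1, len(per_decile))
        if per_decile[i] > per_decile[i - 1]
    ]


# Hypothetical per-decile percentages (each list sums to 100).
smooth = [26.0, 18.0, 14.0, 11.0, 8.5, 7.0, 6.0, 4.5, 3.0, 2.0]
choppy = [28.0, 19.0, 12.0, 8.0, 12.5, 6.0, 7.0, 3.5, 2.5, 1.5]

print("smooth reversals:", reversals(smooth))
print("choppy reversals:", reversals(choppy))

# Cumulative percent of responders captured through the fifth decile.
smooth_at_50 = sum(smooth[:5])
choppy_at_50 = sum(choppy[:5])
print("captured at 50% depth:", smooth_at_50, "vs.", choppy_at_50)
```

Here the choppy model reverses at deciles 5 and 7 yet still captures more responders through the fifth decile, which is the same tension the chart illustrates.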