Database: An Ensemble of Models
Data miners continually search for ways to improve the predictive accuracy of their marketing models. During the last few years, the available technology tools have improved in ways that do just that. Additionally, new data sources are introduced periodically that further enhance modeling results. Processes for employing data and tools have progressed as well. One approach frequently overlooked by analysts involves developing a series of models and then combining their outcomes to improve the overall prediction. Creating a set of models can be accomplished in a variety of ways, which I’ll explain later. But first, a little background.
When individuals make decisions, they may consider the views of several experts, evaluate what is being said, and then arrive at a conclusion. This is an accepted means for reaching a decision. Can you apply this seemingly natural process to improve your data mining results?
Yes, you can. However, not all such approaches provide enhanced outcomes, a concept we’ll explore in this article. Let’s start by noting that a set of models, referred to as an ensemble, is designed so that the predictions of its member models can be averaged. The theory is that several “experts” can arrive at a better solution than one specialist alone. Typically, the ensemble members (the models, in this case) are united in a supportive manner: each member executes the same mission, and the predictions are joined to secure a superior result.
The process looks similar to Diagram 1. Here, three separate models are developed, each model produces scores, and these outputs are then combined. In theory, the combined score may be more accurate than the individual scores.
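To make the combining step concrete, here is a minimal sketch in Python. It assumes the simplest possible rule, an unweighted average of each record's scores across the models. The function name combine_scores and the score values are hypothetical and exist only for illustration; they are not taken from any actual campaign or from the approaches assessed later in this article.

```python
from statistics import mean


def combine_scores(score_sets):
    """Average per-record scores across models.

    score_sets: a list of equal-length lists, one list of scores per model.
    Returns a list with one combined score per record.
    """
    if not score_sets:
        raise ValueError("need scores from at least one model")
    n_records = len(score_sets[0])
    if any(len(scores) != n_records for scores in score_sets):
        raise ValueError("every model must score the same records")
    # zip(*score_sets) groups the scores that belong to the same record.
    return [mean(record_scores) for record_scores in zip(*score_sets)]


# Hypothetical response scores from three models for the same four customers.
model_a = [0.2, 0.8, 0.5, 0.1]
model_b = [0.4, 0.6, 0.5, 0.3]
model_c = [0.3, 0.7, 0.2, 0.2]

combined = combine_scores([model_a, model_b, model_c])
for customer, score in enumerate(combined, start=1):
    print(f"customer {customer}: combined score {score:.2f}")
```

A simple average treats every model as equally trustworthy. Other combining rules, such as weighting each model by its accuracy on a holdout sample, are possible, and whether any of them actually beats the best single model has to be checked against real results.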
Selecting Data to Model
While several routines have been designed to use ensembles, I want to share with you two approaches that I have been assessing—with surprising results that affect accuracy.