Database: An Ensemble of Models
Combine the results of several models to improve the overall prediction
January 2007 By Sam Koslowsky
When individuals make decisions, they may consider the views of several experts, evaluate what is being said, and then arrive at a conclusion. This is an accepted means for reaching a decision. Can you apply this seemingly natural process to improve your data mining results?
Yes, you can. However, not all such approaches provide enhanced outcomes, a concept we’ll explore in this article. Let’s start by noting that a set of models, referred to as an ensemble, is built so that the predictions of its members can be combined, typically by averaging. The theory is that several “experts” can arrive at a better solution than one specialist alone. Typically, the ensemble members (the models in this case) are united in a supportive manner: each member executes the same mission, and the predictions are joined to secure a superior result.
The process looks similar to Diagram 1. Here, three separate models are developed, scores are produced and these outputs then are combined. In theory, the combined score may be more accurate than the individual scores.
Selecting Data to Model
While several routines have been designed to use ensembles, I want to share with you two approaches that I have been assessing—with surprising results that affect accuracy.
The first method, very straightforward, involves dividing a data set into a series of smaller files, developing a model on each of these reduced files and then combining the results. A data file of 100,000 observations, for example, is divided into four smaller files of 25,000 each, and four individual models are created. The results of the four models are then averaged, with the aim of arriving at a more accurate result.
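The split-and-average idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's actual procedure: the data are synthetic, the file holds 1,000 rows rather than 100,000 to keep it quick, and the "model" is a one-variable least-squares line. The helper names `fit_line`, `split_ensemble` and `predict` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def split_ensemble(xs, ys, n_parts=4, seed=0):
    """Shuffle the rows, cut them into n_parts disjoint files, fit one model per file."""
    rng = random.Random(seed)
    order = list(range(len(xs)))
    rng.shuffle(order)
    models = []
    for k in range(n_parts):
        part = order[k::n_parts]  # every n_parts-th row: disjoint, near-equal sizes
        models.append(fit_line([xs[i] for i in part], [ys[i] for i in part]))
    return models

def predict(models, x):
    """Average the scores produced by all ensemble members."""
    return sum(slope * x + intercept for slope, intercept in models) / len(models)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(1000)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

models = split_ensemble(xs, ys)   # four models, one per 250-row file
score = predict(models, 5.0)      # combined score for a new observation
```

Whether the averaged score actually beats a single model fit on all the rows depends on the data and the modeling tool, so it is worth comparing both on a holdout file.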
The second technique, which is a bit more complicated, creates what is referred to as “bagged” samples. Bagging, short for “bootstrap aggregation,” was first presented in 1996 by famed statistician Leo Breiman as a means for reducing the prediction error of modeling tools. The term refers to creating additional samples from the original data by drawing observations at random with replacement, building a model on each sample and then aggregating the models’ predictions.
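A bagged ensemble can be sketched the same way. The example below is a simplified illustration, not Breiman's full procedure or the author's setup: it assumes synthetic data, a one-variable least-squares line as the modeling tool, and ten bootstrap samples, each the same size as the original file. The names `fit_line`, `bagged_models` and `predict` are hypothetical.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bagged_models(xs, ys, n_bags=10, seed=0):
    """Fit one model per bootstrap sample drawn from the original data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_bags):
        # Draw n row indices with replacement: some rows repeat, others are left out.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_line([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict(models, x):
    """Aggregate by averaging the scores of all bagged models."""
    return sum(slope * x + intercept for slope, intercept in models) / len(models)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus noise.
rng = random.Random(7)
xs = [rng.uniform(0, 10) for _ in range(500)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

models = bagged_models(xs, ys)
score = predict(models, 5.0)
```

Unlike the four disjoint files of the first method, the bagged samples overlap, and each one is as large as the original file.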



