Five Low-tech Data Mining Tips
In the data mining world, many would have you believe that refining the technical aspects of the analytic process is the key to improving the performance of mining exercises and the insight gained from them. While there is some truth to this, the critical area for progress lies in the non-technical procedures that are so vital to a powerful outcome.
Here are five low-tech ways to improve your data mining exercises, with a view to preventing all-too-common errors that can keep marketers from optimizing customer relationships.
1. Take time to prepare your data.
We’ve all heard the “garbage in, garbage out” slogan, which holds that the quality of the final data output depends on the quality of the data input. Indeed, experience shows that securing the appropriate input data and preparing it for analysis consume about two-thirds of the total time necessary to complete a data mining exercise. It is time well invested.
Consider the anecdote of the analyst who discovered that 39 percent of his client’s customers owned nine vehicles. When he went looking for a logical explanation, a somewhat embarrassed systems analyst volunteered that the number “9” was used as a default to represent missing data. The same convention was used on much of the demographic data housed by the marketer and, of course, the analyst had used much of that data as is.
A little more data preparation and a few questions in advance would have prevented this unfortunate scenario.
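As an illustration of the kind of check that would have caught the “9” default, here is a minimal Python sketch. The function names, the 25 percent threshold, and the vehicle counts are all hypothetical, chosen only to mirror the 39 percent figure in the anecdote. The heuristic assumes that default codes tend to sit at the low or high end of a column's range and cover a large share of records; anything it flags still needs a question to the people who maintain the file.

```python
from collections import Counter


def suspicious_extremes(values, share_threshold=0.25):
    """Flag the smallest or largest value in a column when it covers a
    suspiciously large share of the records (a heuristic, not a proof)."""
    counts = Counter(values)
    total = len(values)
    extremes = {min(counts), max(counts)}
    return {
        v: counts[v] / total
        for v in extremes
        if counts[v] / total >= share_threshold
    }


def replace_sentinel(values, sentinel):
    """Replace a confirmed default code with None so it is treated as missing."""
    return [None if v == sentinel else v for v in values]


# Hypothetical "vehicles owned" column for 100 customers.
vehicles = [9] * 39 + [1] * 30 + [2] * 20 + [3] * 8 + [0] * 3

flagged = suspicious_extremes(vehicles)
print(flagged)  # the value 9 is flagged with a share of 0.39

# Only after someone confirms that 9 means "missing":
cleaned = replace_sentinel(vehicles, 9)
```

The replacement step is deliberately separate from the detection step, because a frequent extreme value can also be real data.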
2. Select a valid sample.
Sampling is a key component of analysis. Using a valid sample provides reliable results. What constitutes a valid sample? Consider the following scenario: A communications firm with a customer base of 2.5 million selected a 10,000-name customer sample by choosing every hundredth record until it secured the desired quantity. In using this procedure, the firm had to count until it reached the 1 millionth record. Sounds pretty straightforward, doesn’t it? But what about the remaining 1.5 million customers? Is there a problem that none of these “bottom” 1.5 million names were selected? As it turns out, this firm sorts its files not alphabetically, but by how long ago a customer made his or her first purchase. Consequently, the result was a biased sample that included more tenured customers, and excluded “newer” customers. As you can see, selecting a valid sample makes a big difference.
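The arithmetic behind this scenario can be sketched in a few lines of Python. The function name `every_kth` and the random seed are assumptions made for the illustration. The sketch shows that an interval of 100 exhausts the 10,000-name quota at record 1,000,000, while an interval of 250 (the file size divided by the sample size) reaches the end of the file. A simple random sample is shown as another option that does not depend on how the file is sorted.

```python
import random

POPULATION = 2_500_000  # customers on file
SAMPLE_SIZE = 10_000    # names wanted


def every_kth(population, sample_size, k):
    """1-based record positions picked by taking every k-th record
    until the sample size is reached or the file runs out."""
    return [k * i for i in range(1, sample_size + 1) if k * i <= population]


# The firm's procedure: every 100th record.
flawed = every_kth(POPULATION, SAMPLE_SIZE, 100)

# An interval that spans the whole file: 2,500,000 / 10,000 = 250.
spanning = every_kth(POPULATION, SAMPLE_SIZE, POPULATION // SAMPLE_SIZE)

# A simple random sample of record positions (seed fixed for repeatability).
rng = random.Random(42)
simple_random = rng.sample(range(1, POPULATION + 1), SAMPLE_SIZE)

print(max(flawed))    # 1000000
print(max(spanning))  # 2500000
```

In practice a systematic sample usually begins at a random starting record rather than at a fixed one, a detail this sketch leaves out.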
3. Define the right objective.
A proper sample, however, is of little use if the manager establishes incorrect marketing objectives. For example, using data mining, a cataloger found that a third of all its responders were captured in the top-performing segment, the top 10 percent of the file. Ostensibly, these were superior results. However, additional investigation demonstrated that more than 30 percent of the merchandise purchased by this high-responding segment was returned to the retailer! This program attained its objective, but it had the wrong goal. The retailer should have defined the objective as maximizing net response, not gross.
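To see how the choice of objective changes the ranking, consider this small Python sketch. The segment names and all the figures are hypothetical, and it assumes returns are counted as returned orders, which the example above does not specify. Net response is taken here as orders minus returned orders, divided by pieces mailed.

```python
# Hypothetical segments; A has a 32 percent return rate (160 of 500 orders).
segments = [
    {"name": "A", "mailed": 10_000, "orders": 500, "returned": 160},
    {"name": "B", "mailed": 10_000, "orders": 400, "returned": 20},
    {"name": "C", "mailed": 10_000, "orders": 300, "returned": 15},
]


def gross_rate(segment):
    """Orders per piece mailed, ignoring returns."""
    return segment["orders"] / segment["mailed"]


def net_rate(segment):
    """Orders kept (orders minus returns) per piece mailed."""
    return (segment["orders"] - segment["returned"]) / segment["mailed"]


by_gross = [s["name"] for s in sorted(segments, key=gross_rate, reverse=True)]
by_net = [s["name"] for s in sorted(segments, key=net_rate, reverse=True)]

print(by_gross)  # ['A', 'B', 'C']
print(by_net)    # ['B', 'A', 'C']
```

Segment A wins on gross response but drops behind B once returns are subtracted, which is the trap the cataloger fell into.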
4. Incorporate data that add insight.
It is the data that make or break a data mining effort. More data are beneficial, but only if the information contributes additional insight into the problem being tackled. When a file incorporates several metrics that point to the same underlying dimension (for example, six different measures of an individual’s age), some data mining algorithms may be “fooled” into producing less than optimal results. More data aren’t always better. This does not mean that considering additional data for analysis is a mistake: always look for new sources of incremental data. But adding data that simply mimic what is already available is not the best use of scarce resources.
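One simple way to spot measures that mimic each other is to check how strongly each pair of columns is correlated. The Python sketch below is an assumption-laden illustration, not a prescribed method: the column names, the sample values, and the 0.95 cutoff are hypothetical, and Pearson correlation only detects linear redundancy.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def redundant_pairs(columns, threshold=0.95):
    """Pairs of columns whose absolute correlation meets the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) >= threshold
    ]


# Hypothetical customer file: age and birth year carry the same information.
columns = {
    "age": [25, 34, 41, 52, 63],
    "birth_year": [1999, 1990, 1983, 1972, 1961],
    "purchases": [3, 1, 4, 1, 5],
}

print(redundant_pairs(columns))  # [('age', 'birth_year')]
```

A flagged pair is a prompt to keep one measure and drop or combine the other, a decision that still depends on knowledge of the business.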
5. Build a data mining team.
Finally, there is the issue of wholly relying on data mining to arrive at marketing solutions. While these analytic approaches provide numeric results, marketers need to be able to draw insight and formulate conclusions. Don’t entirely depend on so-called fully automatic data mining solutions. No modeling algorithm or technology tool can substitute for human understanding and domain knowledge. Data mining must be a collaborative effort between industry professionals and experienced analysts.
No serious marketer can discount the value of data mining. Data are being captured. Methods are readily available to analyze them. Benefits are clear. By applying some additional effort to the non-technical aspects of analysis, marketers can quickly turn mediocre results into superior ones.
Sam Koslowsky is vice president, modeling solutions for Harte-Hanks, a direct and targeted marketing company based in San Antonio. He can be reached at (212) 520-3259 or via e-mail at firstname.lastname@example.org.