What's in Your Database?
By Hallie Mummert
Data mining initiatives are all the rage these days, especially modeling and clustering. These analytical processes can provide tremendous insight into which customers are most/least profitable and how to identify others like them; which products sell best; and which channels deliver a strong return on investment.
That is, they can if your database is clean enough, complete enough, and updated enough to give you reliable information. According to a white paper titled "Data Quality: A Problem and an Approach" by Javed Beg and Shadab Hussain of data warehousing firm Wipro Technologies, on average, 15 percent of the data in a U.S. customer database is incorrect, costing companies $600 billion a year in lost efficiency and missed revenue opportunities.
Before you can start slicing and dicing your data, you need to conduct a data audit. According to John Miglautsch, founder and chairman of Miglautsch Marketing, a Waukesha, Wis.-based database marketing consultancy, there are two levels to a data audit:
1. Data counts, in which you review fields and input.
2. Data pattern analysis, where you perform some basic mining to, as Miglautsch puts it, "sluice" for data anomalies.
First Step: Data Counts
To prepare a customer file for analysis, a database marketing services provider first will run a report to identify data problems such as:
- empty fields,
- data in the wrong fields,
- extraneous information, and
- inconsistent data forms.
In addition, explains Nancy Kasmarski, vice president, client relations management at Donnelley Marketing, a provider of database marketing solutions and products in Woodcliff Lake, N.J., marketers want any of the proprietary data elements they capture reviewed for standardization and completeness.
These counts then are shared with the marketer to determine if what the service provider found in the database is what the client expected.
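As a rough illustration of what such a count report might tally, here is a minimal Python sketch. The sample records, field names and the `field_counts` helper are assumptions made up for this example, not any provider's actual report. It counts empty values per field and reduces each value to a "shape" (digits become 9, letters become A) so that data in the wrong field or in inconsistent forms stands out.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative assumptions.
records = [
    {"name": "Ann Lee", "zip": "53186", "phone": "2625550101"},
    {"name": "Bob Ray", "zip": "", "phone": "9999999999"},
    {"name": "Cy Dunn", "zip": "WI", "phone": ""},
]

def shape_of(value):
    """Reduce a value to a pattern: digits -> 9, letters -> A, others kept."""
    return "".join(
        "9" if ch.isdigit() else "A" if ch.isalpha() else ch for ch in value
    )

def field_counts(rows):
    """Return (empty-value counts per field, value-shape counts per field)."""
    empty = Counter()
    shapes = {}
    for row in rows:
        for field, value in row.items():
            value = (value or "").strip()
            if not value:
                empty[field] += 1
                continue
            shapes.setdefault(field, Counter())[shape_of(value)] += 1
    return empty, shapes

empty, shapes = field_counts(records)
for field, counter in shapes.items():
    print(field, "empty:", empty[field], "shapes:", dict(counter))
```

In this sample, the ZIP field shows one empty value, one five-digit shape and one two-letter shape, the last of which suggests a state abbreviation keyed into the wrong field.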
At this point, you also want to clean and update your postal addresses, says Eric Niebergall, director of data processing products, Donnelley Marketing. With this process you're not just making sure your addresses are deliverable, but keeping valuable customers from dropping off your radar due to recent moves.
Once you've performed these standard data reviews, you can determine which gaps in your data are important to fill so you can prioritize any enhancement investments.
For example, says Bernice Grossman, president of DMRS Group Inc., a New York City database development consulting firm, it's not uncommon for marketers to have telephone number fields that are incomplete or filled with nonsensical data such as a string of 9s. A telephone append can easily fill this gap.
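A check for this kind of placeholder data could look something like the sketch below. The `is_suspect_phone` function is a hypothetical helper, and the 10-digit rule is an assumption that fits U.S. numbers only.

```python
import re

def is_suspect_phone(raw):
    """Flag a phone value that is empty, the wrong length, or one repeated digit."""
    digits = re.sub(r"\D", "", raw or "")  # keep digits only
    if len(digits) != 10:                  # assumption: 10-digit U.S. numbers
        return True
    return len(set(digits)) == 1           # e.g. 9999999999 or 0000000000

for value in ["999-999-9999", "(262) 555-0101", ""]:
    print(repr(value), is_suspect_phone(value))
```

Records flagged this way would be the natural candidates to send out for a telephone append.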
Other types of data that might be appended include age, gender, income and presence of children on consumer housefiles. For B-to-B databases, a marketer typically would append SIC codes, employee size, sales volume, etc.
Finally, one of the key processes in prepping your database for mining is eliminating duplicates. Multichannel marketing has made tracking customer activity much harder these days, explains Kasmarski. If multiple departments capture customer information, they need to agree on a standard coding system so customers can be identified across channels.
Miglautsch cautions marketers to be careful in merging customer data from disparate sources. "Once you merge the wrong records together … you just got rid of the lifetime value for one good customer and gave it to [what could be] a first-time customer."
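One cautious way to approach de-duping is to flag candidate duplicates for review instead of merging them automatically. The sketch below assumes a deliberately crude match key (last name, five-digit ZIP and street number). The key, the field names and the sample customers are all illustrative assumptions, and real merge/purge software uses far more sophisticated matching.

```python
import re
from collections import defaultdict

def match_key(record):
    """Build a crude match key: last name, 5-digit ZIP, street number."""
    parts = record["name"].split()
    last = parts[-1].lower() if parts else ""
    street_no = re.match(r"\d+", record["address"].strip())
    return (last, record["zip"][:5], street_no.group() if street_no else "")

def candidate_duplicates(records):
    """Group record ids that share a match key; nothing is merged here."""
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]

# Hypothetical customers captured by two different departments.
customers = [
    {"id": 1, "name": "Ann Lee", "address": "12 Oak St", "zip": "53186"},
    {"id": 2, "name": "A. Lee", "address": "12 Oak Street", "zip": "53186-1234"},
    {"id": 3, "name": "Tom Lee", "address": "40 Elm Ave", "zip": "53186"},
]
print(candidate_duplicates(customers))
```

Records 1 and 2 come back as a candidate pair, while record 3 stays separate even though it shares a last name and ZIP code. Leaving the final merge decision to a reviewer is one way to avoid handing a good customer's history to the wrong record.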
Grossman points out that not all data problems can be fixed—at least not without considerable expense. She advises marketers to flag records that aren't complete or correct, and to be careful when using them in future data mining projects.
The time frame for all this cleansing, de-duping and general prepping? Niebergall estimates that the first effort takes between five and 10 days for basic processing on files that run a few million names. Future maintenance that is conducted, say, once a quarter, can be automated and knocked out in anywhere from less than 24 hours to three days.
Second Step: Data Pattern Analysis
To find bigger data anomalies that don't show up in what Miglautsch refers to as the "spell-check" phase of a data audit, you actually have to run some simple data mining programs.
Miglautsch asks marketers for some RFM reporting that helps him score the file and start looking for data patterns that don't make sense. For example, he remembers a marketer whose sales in the South were attributed to its main, southern retail location, rather than individual customers. Initial analysis suggested that sales in the South came from just one ZIP code. Further investigation turned up the data input problem.
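A simple pattern check in this spirit (not Miglautsch's actual RFM reporting) is to compute each ZIP code's share of sales and flag any that look implausibly dominant. The order data and the 50 percent threshold below are assumptions chosen for illustration.

```python
from collections import defaultdict

def zip_sales_share(orders):
    """Return each ZIP code's share of total sales from (zip, amount) pairs."""
    totals = defaultdict(float)
    for zip_code, amount in orders:
        totals[zip_code] += amount
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {z: t / grand_total for z, t in totals.items()}

def concentrated_zips(orders, threshold=0.5):
    """List ZIP codes whose share of sales meets or exceeds the threshold."""
    shares = zip_sales_share(orders)
    return sorted(z for z, share in shares.items() if share >= threshold)

# Hypothetical regional orders: (ZIP code, order amount).
orders = [("30301", 900.0), ("30301", 600.0), ("33101", 250.0), ("70112", 250.0)]
print(concentrated_zips(orders))
```

A flagged ZIP code is only a prompt for further investigation. It could point to a data input problem like the one described above, or to a genuinely strong market.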
What's interesting and critical about this type of analysis, Miglautsch states, is that you not only find the mistakes that need to be caught and corrected, but also identify golden opportunities.
A catalog marketer whose primary customer base was in the Northeast, Miglautsch explains, found that response to its mailings dropped significantly in the South. Instead of scaling back efforts in the South, the marketer examined the reasons behind this sales challenge. The solution was in its creative, which featured snowy landscapes that didn't connect with southern customers, explains Miglautsch.
Product classification also should be carefully analyzed. Sometimes, marketers have so many SKUs that they're hard pressed to draw any definitive conclusions from their order history, says Peter Vlahakos, who heads up Donnelley Marketing's analytical team. Marketers can roll up their SKUs into product categories, he explains.
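Rolling up SKUs can be as simple as a lookup table applied to order lines, as in this sketch. The SKU codes, the category names and the `rollup` helper are hypothetical. Unmapped SKUs fall into an "Uncategorized" bucket so they remain visible.

```python
from collections import defaultdict

# Hypothetical SKU-to-category lookup maintained by the marketer.
SKU_CATEGORY = {
    "SW-1001": "Sweaters",
    "SW-1002": "Sweaters",
    "BT-2001": "Boots",
}

def rollup(order_lines, sku_map, default="Uncategorized"):
    """Sum order amounts by product category instead of by SKU."""
    totals = defaultdict(float)
    for sku, amount in order_lines:
        totals[sku_map.get(sku, default)] += amount
    return dict(totals)

# Hypothetical order lines: (SKU, amount).
order_lines = [("SW-1001", 40.0), ("SW-1002", 60.0), ("BT-2001", 120.0), ("ZZ-9", 5.0)]
print(rollup(order_lines, SKU_CATEGORY))
```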
For non-address type information, such as product codes, order history and self-reported information, analysis can take several weeks to several months, says Kasmarski. The time frame depends on the availability and knowledge of the marketer's IT staff, number of fields, quality of the data, and the documentation on data sources.
Third Step: Assess Your Data Readiness
Depending on the activity you're trying to predict or the trends you want to identify, you can do modeling or segmentation with less than complete data in all fields, says Vlahakos.
Ideally, he states, you would like all fields populated, but more often than not, you don't have all the information. Enhancement data can fill in some of those gaps. By focusing on the more populated fields, which tend to be the more predictive ones anyway, you will get a stronger result.
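To see which fields are populated well enough to use, you might compute a fill rate per field and keep those above a cutoff. The 75 percent cutoff, the field names and the sample rows in this sketch are assumptions for illustration, not a figure Vlahakos recommends.

```python
def fill_rates(rows, fields):
    """Return the fraction of rows with a non-blank string value per field."""
    if not rows:
        return {}
    return {
        f: sum(1 for r in rows if (r.get(f) or "").strip()) / len(rows)
        for f in fields
    }

def usable_fields(rows, fields, min_rate=0.75):
    """Keep fields whose fill rate meets the (assumed) minimum rate."""
    return [f for f, rate in fill_rates(rows, fields).items() if rate >= min_rate]

# Hypothetical housefile rows with string values.
rows = [
    {"zip": "53186", "age": "34", "gender": "F"},
    {"zip": "07677", "age": "", "gender": "M"},
    {"zip": "10001", "age": "", "gender": "F"},
    {"zip": "30301", "age": "51", "gender": ""},
    {"zip": "70112", "age": "", "gender": "M"},
]
print(fill_rates(rows, ["zip", "age", "gender"]))
print(usable_fields(rows, ["zip", "age", "gender"]))
```

Here the age field is only 40 percent populated, so it would be a candidate either for enhancement data or for exclusion from the model.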
Vlahakos adds that it's important to ensure you've got a representative customer sample. "Typically, you want to get customers that are new to [your firm] in the last year or so," he states, explaining that you want to look at recent behavior and be sure you can append current data to current records.
Competition is driving marketers to make their promotions more relevant to customers. Relevancy depends not only on what data you can capture, but also on your ability to understand what that data means and how your entire organization will leverage this knowledge in customer interactions.