4 Best Practices in Data Append
Investments in third-party data appends return real dividends to many database marketers, who rely on them to ensure that the right people get the right message at the right time.
We see the value when we’re trying to pick high-potential customers out of a one-time-buyer population, or when we want to deliver a message tied to a specific life-event trigger. If you’re getting periodic appends, there are several best practices you should consider to ensure that the append process is correct and consistent, and that the data is as accurate as possible.
• Make sure the append process is performed in a consistent manner.
Data appends can use tight or loose matching logic and offer a range of match levels (individual, household, address, geocode). You should understand and document how the data vendor is performing your append and track the results over time. Most match reports include a frequency distribution of a field called “match level,” or its equivalent. The values in this field tell you how many records matched at the individual, household and address levels during the matching process. In a geo append, the values might indicate whether census data was available at the block, block group, tract or ZIP code level.

The distribution of this field should remain fairly consistent from overlay to overlay. If there are no changes on your end in the quality of address hygiene, then the share of each match level should stay consistent. If there is a dramatic change, it’s time to talk to your data append vendor, since the match may have to be rerun.
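The tracking described above is straightforward to automate. Below is a minimal sketch in Python using only the standard library; the field name `match_level`, the level labels and the shift threshold are illustrative assumptions, not a vendor standard.

```python
from collections import Counter

def match_level_distribution(records):
    """Percentage share of each match level in an append result.

    `records` is a list of dicts; `match_level` is an assumed field
    name — substitute whatever your vendor's match report uses.
    """
    counts = Counter(r["match_level"] for r in records)
    total = sum(counts.values())
    return {level: 100.0 * n / total for level, n in counts.items()}

def flag_shifts(previous, current, threshold_pts=5.0):
    """Return match levels whose share moved more than `threshold_pts`
    percentage points between two appends — the 'dramatic change'
    that warrants a call to the vendor."""
    levels = set(previous) | set(current)
    return {
        level: (previous.get(level, 0.0), current.get(level, 0.0))
        for level in levels
        if abs(current.get(level, 0.0) - previous.get(level, 0.0)) > threshold_pts
    }

# Example: last quarter's distribution vs. this quarter's (made-up numbers).
prev = {"individual": 62.0, "household": 25.0, "address": 13.0}
curr = {"individual": 48.0, "household": 30.0, "address": 22.0}
print(flag_shifts(prev, curr))
# individual and address each moved more than 5 points — worth investigating
```

A fixed threshold in percentage points is the simplest trigger; with enough append history you could instead compare each overlay against the mean and spread of prior distributions.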
• Track the coverage and distribution of key attributes on the overlay file.
Regulatory and data source modifications can change how fields are created at your vendor. While vendors try to keep everything stable, they periodically change the way fields are compiled, and these differences will affect you — whether or not the vendor notifies you of the change. Companies following best practices track the data elements regularly used for scoring, analysis, reporting and selection. They look at the percentage distribution of each tracked field as well as the amount of missing/unknown information in that field. Periodically, a tracking report will reveal large-scale differences. For example, true date of birth is available for only a certain percentage of individuals across the country. Some data vendors have made a business decision to keep DOB where it is known, but to “fill in” the remaining records with a model they built to infer age. Before the change, the percentage of missing DOB was 40 percent; after the model was implemented, it fell below 1 percent. This means that any selection, model or report that breaks down results by age, or depends on age for scoring, will be very different, and the difference must be accounted for.
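A coverage report like the one described can be sketched in a few lines of Python. This is a hedged illustration, not a vendor tool: the field names (`dob`, `income_band`), the sentinel values for "unknown" and the 10-point alert threshold are all assumptions you would replace with your own conventions.

```python
def missing_rate(records, field, unknown_values=(None, "", "UNKNOWN")):
    """Percentage of records with missing/unknown data in `field`."""
    missing = sum(1 for r in records if r.get(field) in unknown_values)
    return 100.0 * missing / len(records)

def coverage_report(records, tracked_fields):
    """Missing/unknown rate for every field you regularly use in
    scoring, analysis, reporting and selection."""
    return {f: missing_rate(records, f) for f in tracked_fields}

def large_changes(prev_report, curr_report, threshold_pts=10.0):
    """Fields whose missing rate moved more than `threshold_pts`
    percentage points between overlays — e.g. a DOB field dropping
    from 40% missing to under 1% after the vendor starts filling
    gaps with modeled age."""
    return {
        f: (prev_report[f], curr_report[f])
        for f in prev_report
        if f in curr_report and abs(curr_report[f] - prev_report[f]) > threshold_pts
    }

# Example with made-up coverage figures mirroring the DOB scenario above.
prev = {"dob": 40.0, "income_band": 12.0}
curr = {"dob": 0.8, "income_band": 13.5}
print(large_changes(prev, curr))  # only "dob" crosses the threshold
```

The point is not the code itself but the habit: run the same report after every overlay and diff it against the last one, so a silent vendor-side change in field compilation surfaces as a number rather than as a mystery in your next age-based selection.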