Database: Close the Loop Properly
Three major analytical gaps that lead to faulty data decisions
January 2008 By Stephen Yu
We’ve all seen it: that circular diagram with all the steps direct marketers live through, starting with campaign planning and list acquisition, passing through inevitable data processing steps, and concluding with selection and source coding steps. Then there is a little arrow that connects the last step back to the beginning, capturing the responses we all hope to get out of campaigns. Anyone exposed to one-to-one marketing must have seen at least a few versions of “closed-loop marketing” diagrams. The diagram could take a different form depending on the presenter’s agenda, but the core of the idea is represented in a box called “response analysis.”
That should not be news to any marketer; after all, the foundation of direct marketing lies in the concept of measurement. Lester Wunderman, known as the “father of direct marketing,” did not invent the one-to-one delivery medium, but he first attempted to systemize the result analyses and key measurement metrics. Unfortunately, not everyone in the direct marketing industry today measures their business via scientific and methodical approaches all the time.
Matchback: Do It Carefully
It would be ideal if all marketers had the luxury of properly designed marketing databases with a built-in “closed-loop” mechanism; however, the reality is many still rely on anecdotal information on a campaign level. The trouble with that approach is marketers cannot examine the results based on detailed data points such as name source, selection logic and creative versions tagged in the individual/household-level source codes. The match process becomes essential since the collection rate of source codes at the time of transaction can be unacceptably low when dealing with newer response channels such as the Internet.
A simple matchback process can provide all the necessary result details, granted that the master mail file was maintained properly and the response data contained usable name and address information. It’s hard to believe this step is overlooked by many mailers due to extra costs and resource requirements. Nonetheless, once committed to the response matchback, it is important to pay attention to the details of the process:
• Campaign and time window. Allow a consistent time window for responses if multiple campaigns are analyzed.
• Match logic and rules. Soft-match logic using names and addresses must be tweaked in advance to prevent over- and under-matching situations that can affect the validity of subsequent result studies.
• Data reconciliation. Collected source code information often does not agree with the master mail file. Information on the master file typically is trusted over manually collected figures.
• Date window. If there are multiple campaigns going on in close proximity in time, rules must be in place to credit proper campaigns for responses. Generally, the latest campaign gets the credit, as long as the response date is not too close to the mail-drop date.
• Allocation rules for unmatched responses. Even with the most sophisticated match programs, there always will be unmatched records. Provided this is a small part of the response universe, business rules must be set to handle the unmatched response records.
• Key report variables. To avoid any redundant data processing, key analytical variables must be defined and maintained throughout the process.
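The date-window rule above can be sketched in a few lines of Python. This is only an illustration: the campaign codes, the 3-day minimum lag and the 90-day response window are hypothetical values chosen for the example, not figures from any actual mailing.

```python
from datetime import date, timedelta

# Hypothetical rule parameters (assumptions for illustration only).
MIN_LAG = timedelta(days=3)    # a response this soon after a drop is "too close" to credit it
WINDOW = timedelta(days=90)    # the same response window is applied to every campaign

def credit_campaign(response_date, campaigns):
    """Return the code of the latest campaign whose window covers the response.

    campaigns maps a campaign code to its mail-drop date.
    Returns None when no campaign qualifies (an unmatched response).
    """
    eligible = [
        (drop_date, code)
        for code, drop_date in campaigns.items()
        if MIN_LAG <= response_date - drop_date <= WINDOW
    ]
    if not eligible:
        return None
    # The latest qualifying drop date gets the credit.
    return max(eligible)[1]

campaigns = {"FALL07": date(2007, 9, 10), "HOLIDAY07": date(2007, 11, 5)}

# One day after the holiday drop is too soon, so the earlier campaign is credited.
early = credit_campaign(date(2007, 11, 6), campaigns)
# Two weeks after the holiday drop, the latest campaign is credited.
later = credit_campaign(date(2007, 11, 20), campaigns)
```

Whatever values are chosen, the point is that the window and the lag are fixed before the analysis starts and applied identically to every campaign.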
The main issue with skipping the matchback process is that analysts rely solely on manually collected data on the response side, which may not agree with the master file or may be missing completely. In addition, it is important to recognize that analysts cannot measure anecdotal responses to a fraction of a percent. Without the matchback, it may be necessary to surrender certain levels of detail, such as segment or name source, due to lack of coverage, even though those details often are the focal points of direct mail response analyses.
The Flaw in Random Merge/Purge
Most of the matchback process is done on the master file, which is suitable for testing creative packages, offers and delivery channels. However, because merge/purge output keeps only one record per household/individual regardless of its origins, the master file has a serious flaw when it comes to list source evaluation. Yet list-level measurement is one of the key metrics in ROI studies, as list cost is the cost that varies the most. Imagine a situation where three list providers sent a responsive name, and only one lucky winner who survived the so-called “random” merge/purge gets all the credit in every subsequent study.
One may argue that the random allocation mechanism built into merge/purge is totally fair since the rule applies to all list providers. That is simply not the case unless the order sizes are about the same among all list providers. Assume a marketer orders names from only two sources. The marketer orders 1 million names from Vendor A, thanks to a long and successful relationship. The same marketer orders only 10,000 names from Vendor B, as it is relatively new to the industry. There were 5,000 duplicates between the two files. After the “random” allocation, each list will have lost 2,500 names. That loss for Vendor A is only 0.25 percent of the input, but it is a 25 percent loss for Vendor B, enough to influence the validity of the response study. Now, this is when we assume random allocation works properly (often it doesn’t), and that all lists receive the same merge/purge priority (well, that never happens). In reality, where many more lists bump into each other with a far greater number of interfile duplicates, small test files practically are destroyed in terms of statistical validity before the study even begins.
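The arithmetic in the two-vendor example can be checked with a short Python sketch. The function name and the even-split assumption are illustrative; real merge/purge priority rules are rarely this tidy.

```python
def loss_share(order_size, duplicates, n_lists=2):
    """Fraction of a list's input lost when interfile duplicates
    are split evenly among the lists that share them (an idealized assumption)."""
    names_lost = duplicates / n_lists
    return names_lost / order_size

# Figures from the example above: 5,000 duplicates between the two files.
vendor_a = loss_share(1_000_000, 5_000)   # 2,500 of 1,000,000 names
vendor_b = loss_share(10_000, 5_000)      # 2,500 of 10,000 names

print(f"Vendor A loses {vendor_a:.2%} of its input")
print(f"Vendor B loses {vendor_b:.2%} of its input")
```

The same 2,500 lost names are a rounding error for the large list and a quarter of the file for the small one.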
This dilemma easily can be solved by using the “input” file to the merge/purge in the matchback process, crediting all name providers for known responders. Instead of matching the responder file to the master file (typical scenario), the merge/purge “input” file, with all interfile duplicates intact, should be matched to the responder file, which acts as the base file. Through this “reverse” match process, marketers can study response rates list by list, as if each were the only one in the mailing.
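A minimal sketch of the “reverse” match, assuming the merge/purge input file is available as rows of list name, name and address. The `match_key` function is a crude stand-in for real soft-match logic, and all names and records here are hypothetical.

```python
from collections import defaultdict

def match_key(name, address):
    """Crude stand-in for soft-match logic: ignore case and extra whitespace."""
    return (" ".join(name.lower().split()), " ".join(address.lower().split()))

def response_rate_by_list(input_records, responders):
    """Match every merge/purge INPUT record against the responder file.

    input_records: iterable of (list_name, name, address), duplicates intact.
    responders: iterable of (name, address) from the response file (the base file).
    Every list that supplied a responder gets credit for that responder.
    """
    responder_keys = {match_key(name, address) for name, address in responders}
    supplied = defaultdict(int)
    responded = defaultdict(int)
    for list_name, name, address in input_records:
        supplied[list_name] += 1
        if match_key(name, address) in responder_keys:
            responded[list_name] += 1
    return {lst: responded[lst] / supplied[lst] for lst in supplied}

input_records = [
    ("List A", "Pat Lee", "1 Main St"),
    ("List A", "Sam Roe", "2 Oak Ave"),
    ("List B", "pat lee", "1 MAIN ST"),   # interfile duplicate of the List A record
    ("List B", "Kim Poe", "3 Elm Rd"),
]
responders = [("Pat Lee", "1 Main St")]

rates = response_rate_by_list(input_records, responders)
```

Because the duplicate is kept on both lists, both List A and List B are credited with the responder, which is what lets each list be judged as if it were the only one in the mailing.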
Today, most mailers do not employ such a method, simply because larger file sizes on the input side lead to higher processing costs. Many believe the “random” merge/purge will resolve the allocation issue, although it may be seriously flawed when file sizes vary among list providers. With all other factors remaining constant, random allocation tends to favor larger lists. For smaller files, it would be like deciding the outcome of a baseball game after the fourth inning. Such inadequate response study practice should not be justified for the sake of cost savings. The question must be asked: Why are we still using an antiquated “per 1,000” pricing scale for vital functions like matchback, when processing and storage costs are fractions of what they used to be?
Response Rates Are Not Baseball Scores
Many marketers look at results in multidimensional ways to understand the winning combinations of name sources, selection criteria, creative packages, offers and channels. However, when an analyst breaks down the responders into smaller groups using all possible combinations of study elements, the segment size may become too small to yield any meaningful statistics. There may be fewer than 10 responders with income over $75,000 who received letter version No. 2 plus a free shipping offer in a segment called List A. To avoid situations like this, it is more prudent to examine each measurement criterion separately.
Even when only one element is studied at a time, marketers must be aware of the statistical validity issue when comparing multiple segments. For example, response rates of 1.15 percent and 1.23 percent are not meaningfully different unless over 100,000 pieces were mailed in each group. Too often marketers jump to conclusions and treat response rates as the ultimate ranking tool. To be fair, one must be aware of the sample size, confidence level and size of differences to be measured. Without statistical training, you must be careful not to draw conclusions too hastily. After all, those 5,000 merge/purge survivors in segments A and B may not be big enough to tell you any story about a difference of less than half a percentage point in response rate. When in doubt, please consult a statistician, or at least download some utility programs off the Internet and plug in the numbers before cleaning up your vendor list.
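One common way to “plug in the numbers” is a pooled two-proportion z-test. The sketch below is one textbook approach, not necessarily the method any particular utility program uses, and it relies only on Python's standard library.

```python
import math

def two_proportion_z(rate_a, rate_b, n_a, n_b):
    """z statistic for the difference between two response rates (pooled standard error)."""
    pooled = (rate_a * n_a + rate_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_b - rate_a) / std_err

def two_sided_p_value(z):
    """Two-sided p-value under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))

# 1.15 percent vs. 1.23 percent with 100,000 pieces mailed in each group
z_large = two_proportion_z(0.0115, 0.0123, 100_000, 100_000)
p_large = two_sided_p_value(z_large)

# The same two rates with only 5,000 pieces in each group
z_small = two_proportion_z(0.0115, 0.0123, 5_000, 5_000)
p_small = two_sided_p_value(z_small)

print(f"100,000 per group: z = {z_large:.2f}, p = {p_large:.3f}")
print(f"  5,000 per group: z = {z_small:.2f}, p = {p_small:.3f}")
```

With 100,000 pieces per group the difference sits right around the usual threshold for 90 percent confidence; with 5,000 per group the same difference is well within the range of chance.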
Closed-loop marketing is one of the most overused terms in marketing, and yet many marketers do not close the loop properly. Remember that embedding key codes in your mail pieces is just the beginning. Properly analyzing the results and applying the knowledge to the next mailing will complete the circle.
Stephen Yu is vice president of database marketing at infoUSA National Accounts Division, a direct marketing solutions firm in Woodcliff Lakes, N.J. He can be reached at (201) 476-2305.



