Not All Databases Are Created Equal
Further, even within multisource databases in the market, the coverage should be examined variable by variable, simply because some data points are really difficult to obtain even by professional data compilers. For example, any information that crosses between the business and the consumer world is sparsely populated in many cases, and the "occupation" variable remains mostly blank or unknown on the consumer side. Similarly, any data related to young children is difficult or even forbidden to collect, so a seemingly simple variable, such as "number of children," is left unknown for many households. Automobile data used to be abundant on a household level in the past, but a series of laws made sure that the access to such data is forbidden for many users. Again, don't be impressed with the existence of some variables in the data menu, but look into it to see "how much" is available.
In any scientific analysis, a "false positive" is a dangerous enemy. In fact, they are worse than not having the information at all. Many folks just assume that any data coming out a computer is accurate (as in, "Hey, the computer says so!"). But data are not completely free from human errors.
Sheer accuracy of information is hard to measure, especially when the data sources are unique and rare. And the errors can happen in any stage, from data collection to imputation. If there are other known sources, comparing data from multiple sources is one way to ensure accuracy. Watching out for fluctuations in distributions of important variables from update to update is another good practice.
Nonetheless, the overall quality of the data is not just up to the person or department who manages the database. Yes, in this business, the last person who touches the data is responsible for all the mistakes that were made to it up to that point. However, when the garbage goes in, the garbage comes out. So, when there are errors, everyone who touched the database at any point must share in the burden of guilt.
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at email@example.com.