Not All Databases Are Created Equal
Numeric errors are much harder to catch, as numbers do not look wrong to human eyes. That is when comparison to other known sources becomes important. If such examination is not possible on a granular level, then median value and distribution curves should be checked against historical transaction data or known public data sources, such as U.S. Census Data in the case of demographic information.
When it's about the companies' own data, follow your instincts and get rid of data that look too good or too bad to be true. We all can afford to lose a few records in our databases, and there is nothing wrong with deleting the "outliers" with extreme values. Erroneous names, like "No Information," may be attached to a seven-figure lifetime spending sum, and you know that can't be right.
The main takeaways are: (1) Never trust the data just because someone bothered to store them in computers, and (2) Constantly look for bad data in reports and listings, at times using old-fashioned eye-balling methods. Computers do not know what is "bad," until we specifically tell them what bad data are. So, don't give up, and keep at it. And if it's about someone else's data, insist on data tallies and data hygiene stats.
Outdated data are really bad for prediction or analysis, and that is a different kind of badness. Many call it a "Data Atrophy" issue, as no matter how fresh and accurate a data point may be today, it will surely deteriorate over time. Yes, data have a finite shelf-life, too. Let's say that you obtained a piece of information called "Golf Interest" on an individual level. That information could be coming from a survey conducted a long time ago, or some golf equipment purchase data from a while ago. In any case, someone who is attached to that flag may have stopped shopping for new golf equipment, as he doesn't play much anymore. Without a proper database update and a constant feed of fresh data, irrelevant data will continue to drive our decisions.
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at email@example.com.