Not All Databases Are Created Equal
Recently, I was part of a project that involved data collected from retail stores. We ran all kinds of reports and tallies to check the data, and edited out many data values when we encountered obvious errors. The funniest one that I saw was the first name "Asian" and the last name "Tourist." As an openly Asian-American person, I was semi-glad that they didn't put in "Oriental Tourist" (though I still can't figure out who decided that word is for objects, but not people). We also found names like "No info" or "Not given." Heck, I saw in the news that a refugee from Afghanistan (he was a translator for the U.S. troops) obtained a new first name when he was granted an entry visa: "Fnu." That is short for "First Name Unknown," and it became the first name in his new passport. Welcome to America, Fnu. Compared to that, "Andolini" becoming "Corleone" on Ellis Island is almost cute.
Data entry errors are everywhere. When I used to deal with data files from banks, I found that many last names were "Ira." Well, it turned out that those weren't really the customers' last names; the customers had all just happened to open "IRA" accounts. Similarly, movie phone numbers like 777-555-1234 are very common. And fictitious names, such as "Mickey Mouse," or profanities that are not fit to print are abundant, as well. At least fake email addresses can be tested and eliminated easily, and erroneous addresses can be corrected by time-tested routines, too. So, yes, maintaining a clean database is not so easy when people freely enter whatever they feel like. But it is not an impossible task, either.
We can also train employees regarding data entry principles, to a certain degree (as in, "Do not enter your own email address," "Do not use bad words," etc.). But what about user-generated data? Search and kill is the only way to do it, and the job would never end. And the meta-table for fictitious names would grow longer and longer. Maybe we should just add "Thor" and "Sponge Bob" to that Mickey Mouse list, while we're at it. Yet, dealing with this type of "text" data is the easy part. If the database manager in charge is not lazy, and if there is a bit of a budget allowed for data hygiene routines, one can avoid sending emails to "Dear Asian Tourist."
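For readers who want to see what such a "search and kill" routine might look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the lookup sets, the function names, and the simplified ten-digit phone pattern are hypothetical, and a production hygiene routine would work from a much longer, regularly maintained table.

```python
import re

# Hypothetical lookup tables (illustrative only; a real list keeps growing).
SUSPECT_FULL_NAMES = {
    "asian tourist", "no info", "not given",
    "mickey mouse", "thor", "sponge bob",
}
SUSPECT_FIRST_NAMES = {"fnu"}   # "First Name Unknown"
SUSPECT_LAST_NAMES = {"ira"}    # account type keyed into the name field

# Simplified assumption: exactly ten digits with optional separators.
PHONE_RE = re.compile(r"^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$")


def normalize(text):
    """Lowercase and collapse whitespace so 'Mickey  MOUSE' matches the table."""
    return " ".join(text.lower().split())


def is_suspect_name(first, last):
    """Flag a record whose name fields match a known placeholder or fake."""
    full = normalize(f"{first} {last}")
    return (
        full in SUSPECT_FULL_NAMES
        or normalize(first) in SUSPECT_FIRST_NAMES
        or normalize(last) in SUSPECT_LAST_NAMES
    )


def is_movie_phone(phone):
    """Flag numbers whose middle three digits are 555, as in 777-555-1234."""
    match = PHONE_RE.match(phone)
    return bool(match) and match.group(2) == "555"


if __name__ == "__main__":
    print(is_suspect_name("Asian", "Tourist"))
    print(is_suspect_name("Ira", "Glass"))
    print(is_movie_phone("777-555-1234"))
```

Note that the sketch only flags records. Since "Ira" is a perfectly good first name and "Thor" a real one in some places, flagged rows are probably better routed to review than deleted outright.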
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at email@example.com.