Freeform Data Are Not Exactly Free
For "Title Rank," we may put all professional titles into the following categories:
- SVP, VP, and other "Chief" or "Officer" level titles under CEO
- Sr. Director and Director
- Manager, and other middle management titles
- Tactical titles, such as Account Manager, Programmer, Designer, Engineer, Writer, Editor, etc.
- Administrative Assistant, Secretary and other admin titles
- Rank-free independent titles for doctors, lawyers, consultants, etc.
- Etc., etc.
You may come up with other lists depending on your purpose. Similarly, we can set up "Title Function" for functional categories as follows:
- Research & Development
- Human Resources
- Etc., etc.
Again, depending on the purpose, we may expand or reduce this list. The key is to come up with ideal categories for specific purposes (in this case, for sales and marketing) that are neither too general nor too specific. Such a Goldilocks zone lies at around 20 or fewer categories per variable. If you need more detail, break the variable apart, as in this example, where the title is split into "Title Rank" and "Title Function."
Now, if you just sit down and go through 10,000 to 20,000 business titles and assign each one a value for these two variables, it won't be easy. But it won't be impossible, either. Assuming that one person can categorize three to four titles in a minute, one full-time person can go through 10,000 titles in six to seven working days or so, with some coffee breaks. If that person can stay sane doing it, the result would be quite rewarding and useful. Or, put eight interns on it and finish the task in one day, if you have that option.
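For readers who want to check the estimate, here is a quick back-of-the-envelope sketch. The eight-hour working day and the function name `working_days` are my assumptions for illustration; only the three-to-four-titles-per-minute pace and the 10,000-title volume come from the text above.

```python
def working_days(num_titles, titles_per_minute, hours_per_day=8.0, people=1):
    """Rough effort estimate in working days (assumes an 8-hour day by default)."""
    minutes = num_titles / titles_per_minute
    hours = minutes / 60
    return hours / hours_per_day / people

# One person, at the fast and slow ends of the assumed pace.
fast = working_days(10_000, titles_per_minute=4)
slow = working_days(10_000, titles_per_minute=3)
print(f"One person: {fast:.1f} to {slow:.1f} days of nonstop work")

# Eight people sharing the same list.
team_slow = working_days(10_000, titles_per_minute=3, people=8)
print(f"Eight people: under {team_slow:.1f} days even at the slow pace")
```

At a steady pace this comes out to roughly five to seven days of nonstop work for one person, which is in line with "six to seven working days" once breaks are added, and to less than a day for a team of eight.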
The better and saner way is to set up a program that recognizes patterns of words and let it assign the values to predetermined categories. There is an old saying in the programming field that "A lazy programmer turns out to be a good one," meaning that developers who hate manual work tend to create more automated modules and macros. Some editing and tweaking would be inevitable in an exercise like this, but looking for exceptions would be a lot easier than looking through the whole list. Plus, we might as well get used to that idea, as some freeform data may have a few billion variations in one field. In any case, auditing is very important, as a word like "secretary" could come from "Secretary of the Treasury," or it could mean an administrative position. The same is true for "Manager, Accounting" vs. "Account Manager." Because the order of operation becomes important, I recommend employing the "More Specific the Better" rule, where more specific strings of words are categorized first, and the general ones later (in this example, categorize "Account Manager" first).
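As a minimal sketch of the "More Specific the Better" rule, the snippet below sorts a small rule table so that longer phrases are tested before shorter ones. The rule table, the label names, and the helper functions (`normalize`, `specificity`, `classify`) are all hypothetical and exist only for illustration; a real rule set would be far larger and would be tuned through auditing. Sending "Secretary of ..." titles to a review bucket is likewise my assumption about how one might handle that exception, not a prescribed method.

```python
import re

# Hypothetical rule table: (phrase, "Title Rank" label). Illustrative only.
RULES = [
    ("manager", "Middle management"),
    ("secretary", "Administrative"),
    ("administrative assistant", "Administrative"),
    ("account manager", "Tactical"),
    ("secretary of", "REVIEW"),  # e.g. "Secretary of the Treasury": flag for audit
]

def normalize(title):
    """Lowercase the title and turn punctuation into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def specificity(rule):
    """More words, then more characters, counts as more specific."""
    phrase = rule[0]
    return (len(phrase.split()), len(phrase))

# Most specific phrases are tested first; general ones act as fallbacks.
ORDERED_RULES = sorted(RULES, key=specificity, reverse=True)

def classify(title):
    """Return the label of the first (most specific) matching rule."""
    text = normalize(title)
    for phrase, label in ORDERED_RULES:
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            return label
    return "UNMATCHED"  # leftovers go to manual review

for raw in ["Account Manager", "Manager, Accounting",
            "Secretary of the Treasury", "Executive Secretary"]:
    print(f"{raw!r} -> {classify(raw)}")
```

Because "account manager" is tested before the bare word "manager," an Account Manager lands in the tactical group while "Manager, Accounting" falls through to middle management. Anything labeled REVIEW or UNMATCHED is the short exception list that still needs a human look.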
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at firstname.lastname@example.org.