Take Your Time With Data Prep
By Tracy A. Gill
It may not be the most exciting part of the process, but data preparation is the foundation of an effective predictive modeling program.
As Hans Aigner, CEO of Germantown, Md.-based DataLabUSA, told attendees at his "Predictive Modeling: Breathe new life into your direct marketing efforts" session at the Direct Marketing Association of Washington's annual conference in May, you should expect to spend about 70 percent of your project timetable on data preparation alone.
Once you have done a thorough audit and understand all the demographic and transactional data attributes available, Aigner explained, you'll need to work with the data to turn it into usable information. Some data transformations you may want to perform include:
- Address correction/standardization, e.g., do you have ZIP codes for all addresses? Are they plus-four coded? Are you abbreviating "street" or spelling it out? Are apartment numbers in their own field or part of the street address?
- Date attributes. Convert all of your date entries into a common format, such as month/day/year.
- Ratios. Calculate, for instance, the number of months inactive over the number of months as a customer.
- Averages. Determine averages such as lifetime purchases over the number of months as a customer.
- ZIP code analysis, e.g., converting ZIP codes into distance from retail store.
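Several of the transformations above can be sketched in a few lines of Python. This is a minimal illustration, not Aigner's method: the function names, the list of accepted date formats, and the `centroids` lookup (a table of ZIP code to latitude/longitude that you would have to supply from your own reference data) are all assumptions made for the example.

```python
from datetime import date, datetime
from math import asin, cos, radians, sin, sqrt

# Assumed input formats; extend this tuple to match your own files.
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d", "%Y%m%d")


def normalize_date(raw: str) -> date:
    """Parse a date string written in any of DATE_FORMATS."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def months_between(start: date, end: date) -> int:
    """Calendar months from start to end, ignoring the day of month."""
    return max(0, (end.year - start.year) * 12 + (end.month - start.month))


def inactivity_ratio(first_order: date, last_order: date, as_of: date) -> float:
    """Months inactive divided by months as a customer."""
    tenure = months_between(first_order, as_of)
    inactive = months_between(last_order, as_of)
    return inactive / tenure if tenure else 0.0


def avg_monthly_purchases(lifetime_purchases: float,
                          first_order: date, as_of: date) -> float:
    """Lifetime purchases divided by months as a customer."""
    tenure = months_between(first_order, as_of)
    # A customer acquired this month has no full month of tenure yet.
    return lifetime_purchases / tenure if tenure else lifetime_purchases


def haversine_miles(lat1: float, lon1: float,
                    lat2: float, lon2: float) -> float:
    """Great-circle distance in miles between two lat/lon points."""
    earth_radius_miles = 3958.8
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_miles * asin(sqrt(a))


def zip_distance_miles(customer_zip: str, store_zip: str,
                       centroids: dict[str, tuple[float, float]]) -> float:
    """Distance between two ZIP codes using a supplied centroid table.

    `centroids` is a hypothetical lookup of 5-digit ZIP -> (lat, lon);
    plus-four suffixes are dropped before the lookup.
    """
    lat1, lon1 = centroids[customer_zip[:5]]
    lat2, lon2 = centroids[store_zip[:5]]
    return haversine_miles(lat1, lon1, lat2, lon2)
```

Once every date is parsed into a single type, writing it back out in month/day/year form is one call, such as `normalize_date("2020-03-15").strftime("%m/%d/%Y")`. The distance figure is a straight-line estimate between ZIP centroids, so treat it as a rough proxy for travel distance rather than a precise measure.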