Big Data Must Get Smaller
Other types of descriptive data include geo-demographic data, and the Census data from the U.S. Census Bureau fall under this category. These datasets are organized by geographic denominations such as Census Block Group, Census Tract, County or ZIP Code Tabulation Area (ZCTA, much like postal ZIP codes, but not exactly the same). Although they are not available on an individual or a household level, the Census data are very useful in predictive modeling, as every target record can be enhanced with them, even when name and address are not available, and the data themselves are very stable. The downside is that while the datasets are free through the Census Bureau, the raw datasets contain more than 40,000 variables. Plus, due to budget cuts and changes in survey methods during the past decade, the sample size (yes, they sample) decreased significantly, rendering some variables useless at lower geographic denominations, such as Census Block Group. There are professional data companies that have narrowed down the list of variables to manageable sizes (300 to 400 variables) and filled in the missing values. Because they are geo-level data, variables are in the forms of percentages, averages or median values of elements, such as gender, race, age, language, occupation, education level, real estate value, etc. (as in, percent male, percent Asian, percent white-collar professionals, average income, median school years, median rent, etc.).
There are many instances where marketers cannot pinpoint the identity of a person due to privacy issues or challenges in data collection, and in those cases the Census data serve as an effective substitute for individual- or household-level demographic data. In predictive analytics, duller variables that are available nearly all the time are often more valuable than precise information with limited availability.
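To make the idea of enhancing every target record with geo-level data concrete, here is a minimal Python sketch. It assumes each record carries a ZCTA code and that a condensed geo-level table is already on hand. The field names (pct_white_collar, avg_income, median_rent), the append_geo helper and all values are hypothetical, made up for illustration, and are not actual Census variables or figures.

```python
# Hypothetical geo-level lookup keyed by ZCTA. All names and values here are
# made up for illustration; they are not real Census figures.
GEO_FIELDS = ("pct_white_collar", "avg_income", "median_rent")

GEO_TABLE = {
    "07030": {"pct_white_collar": 71.2, "avg_income": 128000, "median_rent": 2400},
    "73301": {"pct_white_collar": 48.5, "avg_income": 67000, "median_rent": 1300},
}


def append_geo(records, geo_table, key="zcta"):
    """Return copies of the records with geo-level variables appended."""
    enhanced = []
    for record in records:
        geo = geo_table.get(record.get(key), {})
        row = dict(record)
        for field in GEO_FIELDS:
            # None marks a missing value when the area is not in the table.
            row[field] = geo.get(field)
        enhanced.append(row)
    return enhanced


# Target records with no name or address, only a geographic key.
targets = [
    {"record_id": 1, "zcta": "07030"},
    {"record_id": 2, "zcta": "73301"},
    {"record_id": 3, "zcta": "99999"},  # not in the lookup
]

enhanced_targets = append_geo(targets, GEO_TABLE)
for row in enhanced_targets:
    print(row)
```

Every record comes back with the same set of variables, which is the point: a record whose area is not in the table gets explicit missing values rather than being dropped, so the gap can be filled in later.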
Transaction Data/Behavioral Data
While descriptive data are about what the targets look like, behavioral data are about what they actually did. Often, behavioral data come in the form of transactions, so many just call them transaction data. What marketers commonly refer to as RFM (Recency, Frequency and Monetary) data fall under this category. In terms of predictive power, they are truly at the top of the food chain. Yes, we can build models to guess who potential golfers are with demographic data, such as age, gender, income, occupation, housing value and other neighborhood-level information, but if you get to "know" that someone buys a box of golf balls every six weeks or so, why guess? Further, models built with transaction data can even predict the nature of future purchases, in terms of monetary value and frequency intervals. Unfortunately, many who have access to RFM data use them only for rudimentary filtering, as in "select everyone who spent more than $200 in a gift category during the past 12 months," or something like that. But we can do so much more with rich transaction data in every stage of the marketing life cycle: prospecting, cultivating, retaining and winning back.
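As a rough illustration of the difference between rudimentary filtering and an actual RFM summary, the Python sketch below collapses transaction rows into one row per customer. The transaction layout (customer ID, purchase date, amount), the rfm_summary helper, the sample purchases and the as-of date are all assumptions made for this example, not a prescribed method.

```python
from datetime import date


def rfm_summary(transactions, as_of):
    """Collapse transaction rows into one Recency/Frequency/Monetary row per customer."""
    summary = {}
    for customer_id, purchase_date, amount in transactions:
        row = summary.setdefault(
            customer_id,
            {"last_purchase": purchase_date, "frequency": 0, "monetary": 0.0},
        )
        row["last_purchase"] = max(row["last_purchase"], purchase_date)
        row["frequency"] += 1
        row["monetary"] += amount
    for row in summary.values():
        # Recency: days between the most recent purchase and the as-of date.
        row["recency_days"] = (as_of - row["last_purchase"]).days
    return summary


# Hypothetical transactions: (customer_id, purchase_date, amount).
transactions = [
    ("A", date(2023, 1, 10), 25.0),   # a box of golf balls every six weeks or so
    ("A", date(2023, 2, 21), 25.0),
    ("A", date(2023, 4, 4), 30.0),
    ("B", date(2022, 11, 30), 220.0),  # one large purchase, then nothing
]

rfm = rfm_summary(transactions, as_of=date(2023, 5, 1))

# The rudimentary filter: everyone who spent more than $200.
big_spenders = sorted(c for c, row in rfm.items() if row["monetary"] > 200)

for customer_id, row in sorted(rfm.items()):
    print(customer_id, row["recency_days"], row["frequency"], row["monetary"])
print(big_spenders)
```

With these sample rows, the spending filter selects only customer B, while the summary shows that customer A is the recent, regular buyer. Variables like these, one row per customer, are what a model can work with.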
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at firstname.lastname@example.org.