Data Must Flow, But Not All of Them
Nearly three quarters of this planet’s surface is covered with water. Yet human collectives have to work constantly to maintain a steady supply of fresh water. When one area is flooded, another region may be going through a serious drought. It is about the distribution of resources, not the sheer amount of them.
Data management is the same way. We are clearly living in the age of abundant data, but many decision-makers complain that there are not enough “useful” data or insights. Why is that?
Like any resource, data may be locked in the wrong places or in inadequate forms. We hear about all kinds of doomsday scenarios related to the water supply in Africa, and it is because of uneven distribution of water thanks to drastic climate change and border disputes. Conversely, California is running out of its water sources, even as the state is sitting right next to a huge pond called the Pacific Ocean. Water, in that case, is in the wrong form for the end-users there.
Data must flow through organizations like water; and to be useful, they must be in consumable formats. I have been emphasizing the importance of the data refinement process throughout this series (refer to “Cheat Sheet: Is Your Database Marketing Ready?” and “It’s All about Ranking”). In the data business, too much emphasis has been put on data collection platforms and user-interface toolsets, but not enough on the middle part where data are aligned, cleaned and reformatted through analytics. Most of the trouble, unfortunately, happens due to inadequate data, not because of storage platforms and reporting tools.
This month, nonetheless, let’s talk about the distribution of data. It doesn’t matter how clean and organized the data sources are, if they are locked in silos. Ironically, that is how this term “360-degree customer view” became popular, as most datasets are indeed channel- or division-centric, not customer-centric.
It is not so difficult to get to that consensus in any meeting. Yeah sure, let’s put all the data together in one place. Then, if we just open the flood gates and lead all of the data to a central location, will all the data issues go away? Can we just call that new data pond a “marketing database”? (Refer to “Marketing and IT; Cats and Dogs.”)
The short answer is “No way, no sir.” I have seen too many instances where IT and marketing try to move the river of data and fail miserably, thanks to the sheer size of such construction work. Maybe they should have thought about reducing the amount of data before constructing a monumental canal of data? As in life, moving time is the best time to throw things away.
IT managers instinctively try to avoid any infrastructure failure, along with the countless questions that would arise from dumping “all” of the data onto marketers’ laps. And for the sake of the users who can’t really plow through every bit of data anyway, we’ve got to be smarter about moving the data around.
The first thing that data players must consider is the purpose of the data project. Depending on the goal, the list of “must-haves” changes drastically.
So, let’s make an example out of the aforementioned “360-degree customer view” (or “single customer view”). What is the purpose of building such a thing? It is to stay relevant with the target customers. How do we go about doing that? Just collect anything and everything about them? If we are to “predict” their future behavior, or to estimate their propensities in order to pamper them through every channel that we get to use, one may think that we have to know absolutely everything about the customers.
The answer is “not really.” Let’s start with seemingly complicated transaction history data. People’s purchasing behavior — the building blocks of models that predict future behavior — can be shortened to the following list:
- Who bought,
- What,
- When,
- For how much ($),
- Through what channel.
- If readily available, we may consider peripheral information, such as payment method, types of stores, etc.
Of course I’m grossly simplifying the matter. But starting a conversation about data transfer this way is much easier than insisting on absolutely everything that may come with an 800-page data dictionary. Even the most guarded IT managers would say, “Yeah, sure. Why not? That list doesn’t look so bad.” The idea is to get all involved parties mobilized toward the end-goal faster, without arguing about the logistics of large data transfer. Instead, at the idea stage, get the buy-in from the stakeholders and start talking about long- and short-term data goals.
When we get to the actual project stage, we can break down the details. Continuing with the transaction data example:
- Who Bought: This part can be expressed as name, address, email, phone number, and other complete or incomplete IDs. Before insisting on having a perfect format for every field, think about the bare minimum for what you are trying to achieve. Not all marketing projects require pinpoint accuracy in the end.
- What: It can be a product description, SKU categories in multiple levels, etc. Some may need some serious re-categorization, but again, get ready to make do with what you get.
- When: This is the easiest part. Just get the timestamp and the time zone. But, if the project is for any continuity business, get ready to get all types of dates, such as member date, subscription start and end date, renewal date, payment date, delinquent date, etc.
- How Much: This is unfortunately not that simple, as we may have to dig into differences in currencies, and details such as net price, tax, shipping, coupon, discount, return and total paid amount. Do not insist on getting every one of these fields correct. For predictive analytics, all of these numbers will be banded, anyway. Amount paid is the most important one, and discount amount can be very useful for the prediction of “bargain seekers.”
- Through What Channel: Being consistent is important, as I have seen so many different labels for channels like “online” or “retail.” Also, outbound channel and inbound channel must be treated separately.
- Other Fields? Payment method is very predictive, but don’t delay the project by insisting on non-essential items.
The key here is the simplicity for everyone involved in data handoff. Start with the idea that no dataset is perfect, and analytics is about making the most out of provided data.
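To make the bare-minimum handoff concrete, here is a minimal Python sketch of a transaction record built from the fields listed above. The field names, the channel label map and the band edges are all hypothetical choices made for illustration; any real project would substitute its own.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical label map: real channel labels vary by organization.
CHANNEL_ALIASES = {
    "online": "online",
    "web": "online",
    "e-commerce": "online",
    "retail": "retail",
    "store": "retail",
    "in-store": "retail",
}

# Hypothetical band edges (lower bound, label) in a single currency.
AMOUNT_BANDS = [
    (0, "under_25"),
    (25, "25_to_99"),
    (100, "100_to_499"),
    (500, "500_plus"),
]


def normalize_channel(raw: str) -> str:
    """Map free-form channel labels onto one consistent vocabulary."""
    return CHANNEL_ALIASES.get(raw.strip().lower(), "unknown")


def band_amount(amount: float) -> str:
    """Return the label of the highest band whose lower edge is reached."""
    label = AMOUNT_BANDS[0][1]
    for lower_edge, name in AMOUNT_BANDS:
        if amount >= lower_edge:
            label = name
    return label


@dataclass
class Transaction:
    customer_id: str                      # who bought (complete or partial ID)
    product_category: str                 # what
    purchased_at: datetime                # when
    amount_paid: float                    # how much
    channel: str                          # through what channel (normalized)
    discount_amount: float = 0.0          # useful for spotting bargain seekers
    payment_method: Optional[str] = None  # nice to have, never a blocker


def build_transaction(raw: dict) -> Transaction:
    """Build a record from whatever was provided, with defaults for gaps."""
    return Transaction(
        customer_id=raw["customer_id"],
        product_category=raw.get("category", "uncategorized"),
        purchased_at=datetime.fromisoformat(raw["timestamp"]),
        amount_paid=float(raw["amount_paid"]),
        channel=normalize_channel(raw["channel"]),
        discount_amount=float(raw.get("discount", 0.0)),
        payment_method=raw.get("payment_method"),
    )
```

Only four inputs are required here; everything else falls back to a default, which mirrors the idea of making do with what you get.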
Even seemingly huge online behavior data can be simplified. All of those digital analytics toolsets are already working on streamlined data, anyway. Clicks, page views and conversions by various categories, such as product, channel, source or page types, are the common sets of variables, no matter what toolsets are employed.
Do you want to get into some prediction business based on what is happening on the site? Let’s not try to boil the ocean. Let's simplify what we would insert into the process. We may just need:
- Who is looking at it (or a proxy of a person, like a cookie)
- What the visitor looked at
- What page or category the object belongs to
- How long the visitor looked at it
Without a doubt, one may think of more data items to use. But these are the very basic properties that make up what we call “behavior.” So, let’s keep it simple in the beginning.
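The four basic properties above can be sketched as a small record plus one summary step. This is only an illustration in Python; the `PageView` type, its field names and the `seconds_by_category` helper are hypothetical, not taken from any particular analytics toolset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    visitor_id: str   # who is looking (or a proxy, like a cookie ID)
    item: str         # what the visitor looked at
    category: str     # what page or category the item belongs to
    seconds: float    # how long the visitor looked at it


def seconds_by_category(views):
    """Sum viewing time for each (visitor, category) pair."""
    totals = defaultdict(float)
    for view in views:
        totals[(view.visitor_id, view.category)] += view.seconds
    return dict(totals)
```

A summary like this, time spent per visitor per category, is one simple way to turn raw page views into a handful of variables that a model can use.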
As I have been stating in my articles for three years now, Big Data are big because nothing gets to be thrown out. There's lots of noise in any data, clean or dirty. The usefulness of the data is determined by the goals, not coolness. Depending on the purpose, the value of each nugget may change dramatically.
That is why the goal must be set first, before anyone tries to move the flow of large bodies of data. If the goal is clear and sound, the amount of data that has to change hands would look completely manageable. Even gargantuan data can be moved in small pieces through conduits the size of straws. We just have to know how to break them up before moving them for the users.
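As a rough illustration of moving data through a straw, the Python sketch below keeps only the fields a project needs and hands records over in small batches. The `trim` and `in_batches` helpers, the field names and the batch size are assumptions made for this example.

```python
from itertools import islice


def trim(record, needed_fields):
    """Keep only the fields the project actually needs."""
    return {key: record[key] for key in needed_fields if key in record}


def in_batches(records, needed_fields, batch_size):
    """Yield trimmed records in small batches instead of one huge transfer."""
    iterator = iter(records)
    while True:
        batch = [trim(r, needed_fields) for r in islice(iterator, batch_size)]
        if not batch:
            return
        yield batch
```

Because `in_batches` is a generator, the source can be any iterable, including one that reads rows lazily, so the full dataset never has to sit in memory at once.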
Stephen H. Yu is a world-class database marketer. He has a proven track record in comprehensive strategic planning and tactical execution, effectively bridging the gap between the marketing and technology world with a balanced view obtained from more than 30 years of experience in best practices of database marketing. Currently, Yu is president and chief consultant at Willow Data Strategy. Previously, he was the head of analytics and insights at eClerx, and VP, Data Strategy & Analytics at Infogroup. Prior to that, Yu was the founding CTO of I-Behavior Inc., which pioneered the use of SKU-level behavioral data. “As a long-time data player with plenty of battle experiences, I would like to share my thoughts and knowledge that I obtained from being a bridge person between the marketing world and the technology world. In the end, data and analytics are just tools for decision-makers; let’s think about what we should be (or shouldn’t be) doing with them first. And the tools must be wielded properly to meet the goals, so let me share some useful tricks in database design, data refinement process and analytics.” Reach him at firstname.lastname@example.org.