Data Quality: The Achilles’ Heel of Big Data
Big data, meaning large volumes of data from multiple sources, can be a powerful tool in many areas, ranging from marketing to customer service to fraud detection to politics. Almost every activity in modern society generates large volumes of data, more and more of which are captured and stored. The cost of data storage and data processing continues to drop, and the software tools necessary to use the data continue to evolve.
By combining data from websites, call centers, email campaigns, and Facebook and Twitter, companies can get a comprehensive understanding of what their customers need and want, as well as how best to serve them. Unfortunately, not all who have tried to implement big data have succeeded. Indeed, the Achilles’ heel of big data is data quality. Good data doesn't happen by itself; data on its own is inherently messy. Quality data requires someone with data quality expertise as well as the responsibility and authority to make it happen.
A perfect example of this is an experience I had while working with a client. This Fortune 500 company spent years building a customer data warehouse. In the beginning the company used only web data, but it later added call-center data and customer descriptive data from multiple systems. As the warehouse grew, more and more departments within the company started to consume its data, and the data gained greater visibility and serviceability.
Greater data visibility also exposed data-quality issues: the total number of customers, customer value, call-center usage, and customer geolocation all differed depending on which view of the data was consulted. Even worse, there were inconsistencies between the customer data warehouse in question and other data sources within the company. As a result, different executives were given similar reports with differing figures, and suddenly data became political. That is, the question of whose data were correct became a political issue.