Catch Bad Data Before It Wrecks Your Business
Scattered pieces of data about a single individual can be combined to form a deep profile of that individual. But how are records from disparate data sets combined into insightful profiles?
These records are connected through a process called "record linkage." This process searches one or more data sets for records that refer to the same unique entity, based on identifying characteristics that distinguish one entity from all others, such as names, addresses or telephone numbers. When two records share the same pieces of identifying information, you might assume that they can be linked. That sounds simple within a well-established data quality process, but linking records across more than one data set still poses a number of challenges:
- The records from the different data sets don't necessarily share the same identifying attributes (one might have a phone number but the other one does not).
- The values in one data set use a different structure or format from the values in another data set (such as using hyphens for social security numbers in one data set but not in the other).
- The values in one data set are slightly different from the ones in the other data set (such as using nicknames instead of given names).
- One data set has the values broken out into separate data elements while the other does not (such as titles and name suffixes).
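To make these challenges concrete, here is a minimal Python sketch of how two records might be normalized before they are compared. It is an illustration under stated assumptions, not a production matching engine: the field names (`name`, `first`, `last`, `ssn`, `phone`), the tiny nickname table, and the matching rule itself are all hypothetical.

```python
import re

# Hypothetical lookup tables; real ones would be far larger.
NICKNAMES = {"bill": "william", "bob": "robert", "liz": "elizabeth"}
TITLES = {"mr", "mrs", "ms", "dr"}
SUFFIXES = {"jr", "sr", "ii", "iii"}


def digits_only(value):
    """Remove hyphens and other non-digits; return None if the value is missing."""
    if not value:
        return None
    digits = re.sub(r"\D", "", value)
    return digits or None


def normalize_name(full_name):
    """Lowercase the name, drop titles and suffixes, map a nickname to a given name."""
    tokens = [t.strip(".,").lower() for t in full_name.split()]
    tokens = [t for t in tokens if t and t not in TITLES and t not in SUFFIXES]
    if tokens:
        tokens[0] = NICKNAMES.get(tokens[0], tokens[0])
    return " ".join(tokens)


def normalize(record):
    """Bring a record into one common shape, whether or not its name is split up."""
    if "name" in record:
        name = record["name"]
    else:
        # Name was broken out into separate elements; title and suffix are ignored.
        parts = [record.get("first", ""), record.get("last", "")]
        name = " ".join(p for p in parts if p)
    return {
        "name": normalize_name(name),
        "ssn": digits_only(record.get("ssn")),
        "phone": digits_only(record.get("phone")),
    }


def is_match(a, b):
    """Assumed rule: names agree, and every identifier present in both records agrees.

    At least one identifier must be present in both records.
    """
    na, nb = normalize(a), normalize(b)
    if na["name"] != nb["name"]:
        return False
    shared = [key for key in ("ssn", "phone") if na[key] and nb[key]]
    return bool(shared) and all(na[key] == nb[key] for key in shared)


crm_record = {"name": "Mr. Bill Smith Jr.", "ssn": "123-45-6789", "phone": "555-0100"}
billing_record = {"first": "William", "last": "Smith", "suffix": "Jr", "ssn": "123456789"}
linked = is_match(crm_record, billing_record)
```

In this sketch the missing phone number is simply skipped rather than treated as a mismatch, which is one possible design choice; a stricter process might refuse to link records that share only a single identifier.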
Record linkage comes in many variations. For example, merge/purge can be used to combine customer data sets following a corporate acquisition, while enrichment can be used to institute a taxonomic hierarchy for customer classification and segmentation. Loosening the matching rules for merge/purge can help with a process called "householding," which attempts to identify individuals with some shared characteristics (such as "living in the same house").
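As a rough illustration of householding, the sketch below loosens the matching rule from "same person" to "same surname at the same address." The key definition, the field names (`first`, `last`, `address`) and the crude address cleanup are assumptions made for this example only; real address matching would need proper parsing and standardization.

```python
from collections import defaultdict


def household_key(record):
    """Loosened key (an assumption): surname plus a crudely normalized address."""
    address = record["address"].lower().replace(".", "").replace(",", "")
    address = " ".join(address.split())  # collapse repeated whitespace
    return (record["last"].strip().lower(), address)


def group_households(records):
    """Group records whose loosened keys are equal."""
    households = defaultdict(list)
    for record in records:
        households[household_key(record)].append(record)
    return list(households.values())


people = [
    {"first": "Ann", "last": "Lee", "address": "12 Oak St."},
    {"first": "Tom", "last": "Lee", "address": "12  oak st"},
    {"first": "Sue", "last": "Ray", "address": "9 Elm Ave"},
]
households = group_households(people)
```

Here Ann and Tom Lee end up in one household even though they are clearly different individuals, which is exactly what distinguishes householding from strict merge/purge matching.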
Greg Brown is vice president of Melissa, provider of global contact data quality and identity verification solutions that span the entire data quality lifecycle and integrate into CRM, e-commerce, master data management and Big Data platforms. Connect with Greg at firstname.lastname@example.org or via LinkedIn.