How Identity Resolution Helps Solve the Customer Data Management Problem
The Customer Data Management Problem
A common problem for a company is how to merge customer databases in an accurate and cost-effective manner. This is particularly daunting after a merger or acquisition (M&A), or simply when trying to consolidate internal customer databases from different divisions or offerings.
Merging data poses significant challenges, particularly in the case of customer data. It’s important to merge this data intelligently, since the two sets of customers often overlap significantly. Duplicated customer records can easily result in wasted resources (e.g., duplicate direct marketing offers being sent to the same individual) and can also seriously distort the company’s understanding of its customer base. Unconsolidated customer records are also a problem when incomplete information results in missed opportunities or frustrated customers (e.g., in a call center scenario).
So how do we go about merging the customer data sets? When two different companies are involved, it’s highly likely there won’t be a common unique identifier that can be exploited. In addition, the representation of the data will likely vary quite a bit between the companies, so in order to merge the two sets of data, a common attribute or set of attributes needs to be identified. In some cases, the data sets may share a universal attribute such as a tax identification number, but that won’t be present for most consumer-oriented companies. The available attributes of customers are usually items like name, address, phone number, email address, etc. (Of course, names and addresses can also be broken out into their component parts: first name, middle initial, last name, or street address, city, state, zip code.)
In the absence of a reliable unique identifier, the best way to match customer records is to use attributes like name, phone number, address, etc., jointly to increase the probability of a match. “John Smithson” by itself doesn’t suffice, given that there are many people called “John Smithson” in the world, but getting a good match on the name, phone number, and address makes it highly likely that the match is correct.
But attributes like names and addresses have their own challenges. Unlike a social security number, names and addresses are highly variable. In one interaction, a customer may give their name as “John Smithson.” In another, it may be “J. Smithson” or “John R. Smithson.” Their address may also vary: “6113 S. Norton St.” vs. “6113 South Norton Street.” Additionally, misspellings and data entry errors are not unusual. A match process that relies entirely on exact matches will not work here. A technology called Identity Resolution can help with this problem. It uses AI techniques to decide whether two database records refer to the same person.
How Does Identity Resolution Help?
Identity Resolution handles the variations described above, along with many others such as simple typos, by using fuzzy name matching technology. Some state-of-the-art Identity Resolution software uses machine learning techniques to achieve high accuracy.
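To give a rough sense of what fuzzy matching means in practice, here is a minimal Python sketch. It is not how any particular Identity Resolution product works: it uses the standard library's difflib.SequenceMatcher as a simple stand-in for a real fuzzy matcher, and the abbreviation table and function names are assumptions made up for this illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table for illustration only; a real system
# would need far broader coverage and would resolve ambiguous cases
# such as "St." meaning either "Street" or "Saint".
ABBREVIATIONS = {
    "s": "south", "n": "north", "e": "east", "w": "west",
    "st": "street", "ave": "avenue", "rd": "road",
}

def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and expand common abbreviations."""
    tokens = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

def similarity(a: str, b: str) -> float:
    """Character-level similarity between 0.0 and 1.0 (1.0 = identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

if __name__ == "__main__":
    a = "6113 S. Norton St."
    b = "6113 South Norton Street"
    print(a == b)                                        # False
    print(normalize_address(a) == normalize_address(b))  # True
    # A one-letter typo still scores close to 1.0, while an unrelated
    # name scores much lower.
    print(similarity("John Smithson", "Jon Smithson"))
    print(similarity("John Smithson", "Maria Gonzalez"))
```

Exact comparison rejects the two address forms, while normalization makes them equal, and a similarity score lets a typo count as a near match rather than a miss. A simple character-level score like this one does not handle initials or nicknames well, which is part of why dedicated name matching technology exists.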
Each attribute is first matched separately against its counterpart in the other data set, and a score for the match is generated. The records are then compared according to business logic that defines how much each attribute counts. For example, one company may require a strong match in both the name and the phone number. Another company may consider a name match more important than an address match. Identity Resolution allows thresholds to be set: above an upper threshold, the two records are considered to refer to the same customer; below a lower threshold, the two records are considered not a match. Record pairs that score between the two thresholds can be sent to a human reviewer for final determination.
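The scoring and threshold steps above can be sketched as follows. This is an illustrative outline under stated assumptions, not a description of an actual product: the weights, the two threshold values, the field names, and the weighted-sum rule are all hypothetical, and difflib.SequenceMatcher again stands in for a real attribute matcher.

```python
from difflib import SequenceMatcher

# Hypothetical business logic: name counts most, then phone, then address.
WEIGHTS = {"name": 0.5, "phone": 0.3, "address": 0.2}

# Hypothetical thresholds; in practice these would be tuned on real data.
MATCH_THRESHOLD = 0.85      # at or above: treat as the same customer
NON_MATCH_THRESHOLD = 0.60  # below: treat as different customers

def attribute_score(a: str, b: str) -> float:
    """Score one attribute against its counterpart (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_score(rec1: dict, rec2: dict) -> float:
    """Combine per-attribute scores into one weighted record score."""
    return sum(
        weight * attribute_score(rec1[field], rec2[field])
        for field, weight in WEIGHTS.items()
    )

def decide(score: float) -> str:
    """Map a record score to match, non-match, or human review."""
    if score >= MATCH_THRESHOLD:
        return "match"
    if score < NON_MATCH_THRESHOLD:
        return "non-match"
    return "review"

if __name__ == "__main__":
    crm_record = {"name": "John Smithson", "phone": "2025550100",
                  "address": "6113 South Norton Street"}
    other_record = {"name": "Maria Gonzalez", "phone": "4157771234",
                    "address": "42 Elm Avenue"}
    print(decide(record_score(crm_record, crm_record)))    # match
    print(decide(record_score(crm_record, other_record)))  # non-match
```

A weighted sum is only one way to express the business logic. A company that requires a strong match on both name and phone could instead check each attribute score against its own minimum before the combined score is considered.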
In sum, Identity Resolution helps solve the daunting problem of merging customer data sets, dramatically accelerating what has traditionally been a time-consuming, labor-intensive, and expensive task. It does so using sophisticated AI techniques that offer high accuracy and throughput.