How to Choose a Fuzzy Name Matching Product
Fuzzy Name Matching is Critical for Many Applications across Industries
Fuzzy name matching is a technology that matches a name against its potentially many variants, which arise from spelling errors, nicknames, transliteration differences, and similar causes. Many critical applications across industry sectors depend on accurate fuzzy name matching. For example:
- Homeland Security
  - Border security
  - Law enforcement
  - Visa application screening
- Financial
  - Anti-money laundering (AML)
  - Politically Exposed Persons (PEP)
  - Know Your Customer (KYC)
- Healthcare
  - Patient record matching
  - Fraud detection
- Retail
  - Customer Data Management
  - Customer Stitching
- Background Screening
  - Employee background checks
If you are looking for a fuzzy name matching product for your application, read on to learn how to select the most suitable one objectively.
Criteria for Choosing a Fuzzy Name Matching Product
Matching names against a database of names, a typical use case, poses several challenges. As we discussed in a previous blog, many factors can cause variations in name spellings, from simple misspellings to complex effects arising from names originally written in foreign languages. What follows is a guide to what you should look for in a fuzzy name matching product.
1. Can it handle the wide variety of name variant phenomena?
For example, these are some of the common reasons why names are written differently:
- Misspelling: John Richards – John Richarda
- Names with the same sound: Kay – Kaye; Allen – Allan – Alan
- Nicknames: Robert – Bob – Bobby; Theodore – Ted
- Initials: John Ronald Smith – J. R. Smith
- Name order variants: Fumio Kishida – Kishida Fumio
- Missing name elements: John Frank Robertson – John Robertson
- Company abbreviations: Bayerische Motoren Werke AG – BMW; Smith, Jones, & Company LLP – SJC
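To make a few of these phenomena concrete, here is a minimal Python sketch that uses only the standard library. It is not how any particular product works: the `NICKNAMES` table, the function names, and the scoring rule are all assumptions made for illustration. It covers misspellings, nicknames, initials, name order, and missing name elements, but not company abbreviations.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, tiny nickname table; a real product would need a far larger one.
NICKNAMES = {"bob": "robert", "bobby": "robert", "ted": "theodore"}

def tokens(name: str) -> list[str]:
    """Lowercase, keep only runs of a-z, and map known nicknames to one form."""
    words = re.findall(r"[a-z]+", name.lower())
    return [NICKNAMES.get(word, word) for word in words]

def token_similarity(a: str, b: str) -> float:
    """Score two tokens in [0, 1]; an initial matches any token it begins."""
    if len(a) == 1 or len(b) == 1:
        return 1.0 if a[0] == b[0] else 0.0
    return SequenceMatcher(None, a, b).ratio()

def name_similarity(left: str, right: str) -> float:
    """Order-insensitive score: best match for each token of the shorter name."""
    a, b = tokens(left), tokens(right)
    if not a or not b:
        return 0.0
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    best = [max(token_similarity(s, t) for t in longer) for s in shorter]
    return sum(best) / len(best)

pairs = [
    ("John Richards", "John Richarda"),       # misspelling
    ("Robert Smith", "Bob Smith"),            # nickname
    ("John Ronald Smith", "J. R. Smith"),     # initials
    ("Fumio Kishida", "Kishida Fumio"),       # name order
    ("John Frank Robertson", "John Robertson"),  # missing name element
]
for left, right in pairs:
    print(f"{left!r} vs {right!r}: {name_similarity(left, right):.2f}")
```

Note that this sketch scores a dropped middle name as a perfect match and treats any shared first letter as a matching initial, both of which would produce false positives on a large database. Handling such trade-offs well is exactly what separates a serious product from a simple string comparison.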
2. Can it handle the wide variety of ethnicity-specific name phenomena from around the world?
For example:
- A single Arabic name can have different segmentations in English. These names are all the same in Arabic, but they can vary when brought into English:
  - Abd al-Rahman vs. Abdul Rahman vs. Abdarrahman
- Transliteration standard differences, as in Chinese. The first transliteration below uses the Pinyin standard while the second uses Wade-Giles. These are the two major ways of transliterating Chinese characters into Latin script, and as you can see, the results are not very similar:
  - Xi Jinping vs. Hsi Chin-p’ing
- Spanish last names with both matronymics and patronymics can be shortened by dropping the maternal last name:
  - Carlos Guzman Ramos vs. Carlos Guzman
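As a rough illustration of the segmentation problem, the sketch below compares the Arabic name variants after removing spaces and hyphens, so that word boundaries no longer matter. The function names are hypothetical and the approach is an assumption chosen for simplicity, not a description of any product's method.

```python
import re
from difflib import SequenceMatcher

def collapse(name: str) -> str:
    """Lowercase and drop everything except a-z, so word breaks vanish."""
    return re.sub(r"[^a-z]", "", name.lower())

def segmentation_insensitive_ratio(left: str, right: str) -> float:
    """Similarity in [0, 1] computed on the collapsed forms of both names."""
    return SequenceMatcher(None, collapse(left), collapse(right)).ratio()

variants = ["Abd al-Rahman", "Abdul Rahman", "Abdarrahman"]

# Score every unordered pair of variants.
scores = {
    (a, b): segmentation_insensitive_ratio(a, b)
    for i, a in enumerate(variants)
    for b in variants[i + 1:]
}
for (a, b), score in scores.items():
    print(f"{a!r} vs {b!r}: {score:.2f}")
```

This trick only helps when the variants share most of their letters. Transliteration standard differences such as Pinyin vs. Wade-Giles change the letters themselves, so they call for dedicated knowledge of each standard rather than plain string similarity.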
3. Can it handle fuzzy name matching of the entity types you need?
Does your application need fuzzy name matching of just person names or more? For example, additional entity types you want to match may include:
- Organization
- Place
- Address
- Vehicle
- Email Address
- Phone number
- Date
Check whether a fuzzy name matching tool handles the entity types that you need.
4. Can it handle fuzzy name matching of records with multiple fields?
Does your use case require that you match just names, or do you need to match records with additional fields? For example, you may want to match not only a person’s name but also the person’s date of birth, place of birth, nationality, spouse, address, etc. If so, you want a product that combines the matching results of all the relevant fields in an intelligent way.
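One simple way to picture multi-field matching is a weighted average of per-field similarities. The sketch below is an assumption for illustration only: the field names, the weights in `FIELD_WEIGHTS`, and the choice to skip missing fields are all hypothetical, and a real product is likely to use more sophisticated, field-specific comparisons.

```python
from difflib import SequenceMatcher

# Hypothetical weights; in practice these would be tuned on labeled data.
FIELD_WEIGHTS = {"name": 0.6, "date_of_birth": 0.3, "nationality": 0.1}

def field_similarity(a: str, b: str) -> float:
    """Generic string similarity in [0, 1], used here for every field."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def record_score(query: dict, candidate: dict) -> float:
    """Weighted average over the fields that both records actually contain."""
    total = 0.0
    weight_sum = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        a, b = query.get(field), candidate.get(field)
        if not a or not b:
            continue  # a missing value neither helps nor hurts the score
        total += weight * field_similarity(a, b)
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0

query = {"name": "John Richards", "date_of_birth": "1980-02-14", "nationality": "US"}
same_person = {"name": "John Richarda", "date_of_birth": "1980-02-14", "nationality": "US"}
namesake = {"name": "John Richards", "date_of_birth": "1975-11-30", "nationality": "US"}

print(record_score(query, same_person))
print(record_score(query, namesake))
```

The design point is that agreement on date of birth can compensate for a misspelled name, while an exact name match with a conflicting date of birth is pulled down.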
5. Can it provide a matching score?
In many use cases, you will want the flexibility to set cut-off thresholds for matches. In some cases, the fuzzy name matching needs to be highly precise, i.e., there should be few if any false positives, so you want to set a higher matching score as the threshold. In others, the matching can be less stringent because you want to see more potential matches so as not to miss one. In this case, you want to set the threshold lower.
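The effect of the threshold can be shown in a few lines of Python. The candidate names and scores below are invented purely for illustration, and `filter_matches` is a hypothetical helper, not part of any product's API.

```python
def filter_matches(scored, threshold):
    """Keep (name, score) pairs at or above the threshold, best score first."""
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)

# Hypothetical scores for a query such as "John Richards" (made-up values).
scored = [
    ("Jane Roberts", 0.52),
    ("John Richards", 0.97),
    ("Joan Richardson", 0.74),
    ("Jon Richard", 0.88),
]

strict = filter_matches(scored, threshold=0.90)   # few, high-confidence hits
lenient = filter_matches(scored, threshold=0.70)  # more hits to review

print(strict)
print(lenient)
```

The same scored list serves both use cases; only the cut-off changes, which is why a product that exposes its matching score is easier to adapt than one that returns only yes or no.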
6. How accurate is the fuzzy name matching?
Does it provide low rates of both false positives and false negatives? Sometimes a fuzzy name matching tool will say two names are variants of each other when they are not (a false positive); conversely, it may fail to find a match even though one exists (a false negative). (Note: False negatives are harder to assess unless you have an answer key, i.e., for each name you want to match, you know all the possible variants in the name database searched against.)
High accuracy is critical because a high false positive rate overwhelms users with spurious matches and costs time and money, while a high false negative rate leads to missed matches that could have dire consequences.
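If you do have an answer key, counting false positives and false negatives is straightforward. The sketch below also reports precision and recall, two standard summary metrics that this post does not otherwise name. The function name and the sample pairs are hypothetical.

```python
def evaluate(predicted: set, answer_key: set) -> dict:
    """Compare predicted match pairs against the known correct pairs."""
    true_positives = len(predicted & answer_key)
    false_positives = len(predicted - answer_key)  # reported, but wrong
    false_negatives = len(answer_key - predicted)  # correct, but missed
    reported = true_positives + false_positives
    expected = true_positives + false_negatives
    return {
        "false_positives": false_positives,
        "false_negatives": false_negatives,
        "precision": true_positives / reported if reported else 0.0,
        "recall": true_positives / expected if expected else 0.0,
    }

# Made-up example: each pair is (query name, database name).
answer_key = {
    ("Bob Smith", "Robert Smith"),
    ("Kay Allen", "Kaye Allan"),
    ("Kishida Fumio", "Fumio Kishida"),
}
predicted = {
    ("Bob Smith", "Robert Smith"),
    ("Kay Allen", "Kaye Allan"),
    ("Kay Allen", "Ray Ellen"),  # a false positive
}

print(evaluate(predicted, answer_key))
```

Running the same evaluation at several thresholds shows how lowering the cut-off trades false negatives for false positives on your own data.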
7. How fast and scalable is the fuzzy name matching?
In many use cases, such as matching air travelers against a terrorist watch list, the fuzzy name matching has to run in real time and scale at peak travel times. In other cases, such as that of a marketing firm working to update a customer database, the matching can happen at a slower pace. Depending on your use case, you need to see if the fuzzy name matching product addresses your speed and scalability requirements.
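To see why speed becomes an issue, consider that comparing a query against every name in a large database is expensive. One common general technique for reducing the work is blocking: only names that share a cheap key with the query are scored. The sketch below is an assumption for illustration; this post does not say how any product achieves its speed, and the `BlockedIndex` class and its first-letter key are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def block_keys(name: str) -> set:
    """One key per token: its first letter. Crude, and purely illustrative."""
    return {token[0] for token in name.lower().split()}

class BlockedIndex:
    def __init__(self, names):
        self.blocks = defaultdict(set)
        for name in names:
            for key in block_keys(name):
                self.blocks[key].add(name)

    def candidates(self, query: str) -> set:
        """Names sharing at least one block key with the query."""
        found = set()
        for key in block_keys(query):
            found |= self.blocks.get(key, set())
        return found

    def search(self, query: str, threshold: float = 0.8):
        """Score only the candidates, not the whole database."""
        hits = []
        for name in self.candidates(query):
            score = SequenceMatcher(None, query.lower(), name.lower()).ratio()
            if score >= threshold:
                hits.append((name, score))
        return sorted(hits, key=lambda hit: hit[1], reverse=True)

index = BlockedIndex(["John Richards", "Mary Stone", "Fumio Kishida"])
print(index.candidates("John Richarda"))
print(index.search("John Richarda"))
```

Blocking trades accuracy for speed: a first-letter key would never pair Xi Jinping with Hsi Chin-p’ing, for example. When you evaluate a product, measure accuracy and throughput together rather than separately.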
8. Is it customizable?
Your application domain may require some specific customizations beyond what the out-of-the-box fuzzy name matching product offers. For example:
- You may need to handle unconventional name aliases, nicknames, or abbreviations.
- You may want to specify certain field values that should be ignored for matching purposes.
- You may want to change the weights of fields to alter matching behavior.
If so, you need a fuzzy name matching tool that allows you to customize in the ways that you require.
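The three kinds of customization listed above could be expressed as a small configuration object like the one sketched here. Everything in it is hypothetical: `MatchConfig`, its attribute names, and the `prepare` helper are assumptions for illustration and do not correspond to any product's actual settings.

```python
from dataclasses import dataclass, field

@dataclass
class MatchConfig:
    """Hypothetical user-supplied customizations for a matching run."""
    aliases: dict = field(default_factory=dict)          # custom alias -> canonical value
    ignored_values: set = field(default_factory=set)     # values to skip entirely
    field_weights: dict = field(default_factory=dict)    # per-field weight overrides

    def weight(self, field_name: str) -> float:
        """Weight for a field, defaulting to 1.0 when not overridden."""
        return self.field_weights.get(field_name, 1.0)

def prepare(record: dict, config: MatchConfig) -> dict:
    """Apply the ignore and alias rules to a record before it is scored."""
    cleaned = {}
    for key, value in record.items():
        normalized = value.strip().lower()
        if normalized in config.ignored_values:
            continue  # placeholder values should not count for or against a match
        cleaned[key] = config.aliases.get(normalized, normalized)
    return cleaned

config = MatchConfig(
    aliases={"big blue": "ibm"},
    ignored_values={"unknown", "n/a"},
    field_weights={"employer": 2.0},
)
record = {"employer": "Big Blue", "address": "N/A", "name": "John Richards"}
print(prepare(record, config))
print(config.weight("employer"), config.weight("name"))
```

When comparing products, it is worth checking whether such settings can be changed by your own staff through configuration, or whether they require work by the vendor.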
9. Does it support the languages you need?
In some use cases, names in foreign scripts (Arabic, Chinese, Cyrillic, etc.) must be matched against names in Latin/English script, or vice versa. For example, a bank in the Middle East might have a database of names in Arabic script that have to be matched against English-language watch lists such as those published by OFAC.
Summary
Selecting a fuzzy name matching product requires consideration of a number of factors as indicated above. You need to weigh them carefully and make the necessary trade-offs to get the functionality you desire.