Identity Resolution Facilitates the Creation of Electronic Health Records
Electronic Health Records Promote Better Health Outcomes
Universal Electronic Health Records (EHRs), in which every citizen of a country has a health record linked to a national system, are widely seen as promoting better health outcomes and higher levels of trust in the medical system. Medical data would also yield greater value for technologies like Artificial Intelligence if it were as comprehensive as it would be with a national EHR.
Since the passage of the Affordable Care Act in 2010, the US has made rapid progress towards the introduction of EHRs. Although few people in the US think that national EHRs are a near- or even medium-term prospect, the hope was that EHRs would contribute to more coordinated care among the health care providers in a limited geographical area. This would reduce duplication of medical tests, avoid errors caused by one provider lacking information that another has, and allow critical information to be exchanged smoothly and easily.
Even though EHRs have proven to be an improvement in the US, doctors and hospitals still have difficulties in sharing data. Patients go to different providers, and it is still not easy to link records across different IT health systems. Converting one system’s data to be compatible with another’s is a hard problem.
Identity Resolution Can Improve Medical Data Sharing
Fortunately, there is a technology that will support the linking of data records and so facilitate the development of accurate and complete EHRs: Identity Resolution (also known as Entity Resolution).
Identity Resolution identifies the variations that occur in data elements across different records. Record fields typically consist of:
- First and last names (on occasion a middle name or initial)
- Date of Birth
- Home address
- Home and Mobile phone numbers
- Email address
- Insurance ID
- etc.
Linking Patient Records Can Be Tricky
Each field of a patient record offers its own challenges:
- Names vary: “Jim Baker” vs. “J. Baker” vs. “James R. Baker.” Phenomena like nicknames, initials, and simple typos may be common in the data.
- Dates need to be handled in accordance with their characteristics, e.g., the ordering of the pieces may vary:
  - “October 9, 2017” vs. “9 October, 2017”
  - “10/01/2017” vs. “01/10/2017” (U.S. vs. European)
  - The nature of what’s considered a close match may vary, e.g., “August 4, 1938” is a pretty close match to “August 3, 1938,” but “January 3, 1961” is also a close match to “January 3, 1971.” The apparent 10-year gap in the latter could be caused by a simple fat-fingering typo. The matching has to take these kinds of phenomena into account.
- Addresses are quite complex, e.g.,
  - “7735 8th Street, Columbia, NY 01923” vs. “7735 Eighth St. Columbia, New York 01923-3494”
  There are four differences here that need to be handled (including the “short” form ZIP code versus the “long”).
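The date and address phenomena above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular product works: the function names, the tiny `ADDRESS_SYNONYMS` table, and the scores 0.9 and 0.8 are all hypothetical choices made for this example.

```python
import re
from datetime import date

# Hypothetical, deliberately tiny synonym table; a real system would need
# a far larger one (street suffixes, state names, number words, ...).
ADDRESS_SYNONYMS = {"street": "st", "eighth": "8th", "new york": "ny"}


def normalize_address(raw: str) -> str:
    """Reduce an address to a canonical lowercase form for comparison."""
    text = raw.lower()
    text = re.sub(r"[.,]", " ", text)                  # drop periods and commas
    text = re.sub(r"\b(\d{5})-\d{4}\b", r"\1", text)   # long ZIP -> short ZIP
    for long_form, short_form in ADDRESS_SYNONYMS.items():
        text = re.sub(rf"\b{re.escape(long_form)}\b", short_form, text)
    return " ".join(text.split())                      # collapse whitespace


def date_similarity(a: date, b: date) -> float:
    """Score two dates of birth; the 0.9 and 0.8 values are assumptions."""
    if a == b:
        return 1.0
    # Day and month swapped, as in U.S. vs. European ordering.
    if a.year == b.year and a.month == b.day and a.day == b.month:
        return 0.9
    # Exactly one differing digit in YYYYMMDD suggests a keying typo,
    # whether it shifts the date by one day or by ten years.
    digits_a = f"{a.year:04d}{a.month:02d}{a.day:02d}"
    digits_b = f"{b.year:04d}{b.month:02d}{b.day:02d}"
    if sum(x != y for x, y in zip(digits_a, digits_b)) == 1:
        return 0.8
    return 0.0
```

With this sketch, both forms of the sample address normalize to the same string, and the pairs “August 4, 1938” / “August 3, 1938” and “January 3, 1961” / “January 3, 1971” both receive the single-digit-typo score.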
In order to establish that two records refer to the same individual, it’s necessary to first match each of the above elements and provide a score for how close the two fields are.
In addition, there must be a way to combine the similarity scores of the individual fields into a single score according to business rules, and to use that score to decide whether two records belong to the same individual.
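One simple way to combine field scores is a weighted average compared against a threshold. The sketch below assumes this approach purely for illustration; the weights in `FIELD_WEIGHTS`, the `MATCH_THRESHOLD` value, and the function names are hypothetical, and real business rules may be considerably more elaborate.

```python
from typing import Mapping

# Hypothetical business rules: how much each field counts, and the
# combined score needed to treat two records as the same person.
FIELD_WEIGHTS = {"name": 0.4, "dob": 0.3, "address": 0.2, "phone": 0.1}
MATCH_THRESHOLD = 0.75


def combine_scores(field_scores: Mapping[str, float]) -> float:
    """Weighted average of per-field similarities (each in 0.0 to 1.0).

    Fields absent from `field_scores` are skipped, so a record pair
    without phone numbers is judged on the remaining fields only.
    """
    present = {f: s for f, s in field_scores.items() if f in FIELD_WEIGHTS}
    total_weight = sum(FIELD_WEIGHTS[f] for f in present)
    if total_weight == 0:
        return 0.0
    weighted = sum(FIELD_WEIGHTS[f] * s for f, s in present.items())
    return weighted / total_weight


def is_same_person(field_scores: Mapping[str, float]) -> bool:
    """Apply the (assumed) threshold to the combined score."""
    return combine_scores(field_scores) >= MATCH_THRESHOLD
```

For example, `is_same_person({"name": 0.9, "dob": 0.8, "address": 1.0})` is true under these weights, while a pair that agrees only on address falls well below the threshold.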
Here are some examples of patient records that show typical variations in the data:
Name | DoB | Address
James Baker | 10/09/71 | 45 Maple St., Brentwood, VA
Baker, Jim | Oct. 9, 1971 | 45 Maple Street, Brentwood, Virginia 22093

Name | DoB | Address
Margaret L. Jones | 11/3/1990 | 6 Park Lane, Hialeya, ME
Maggie Jones | November 3, 1990 | 6 Park Ln, Hialeya, Maine 01923

Name | DoB | Address
Rashid Abdurrahman | 3 March, 1995 | 4 Emory Court, Louisville, MN
Rachid ‘Abd al-Rahman | Mar 3, 1995 | Four Emory Ct., Louisville, Minnesota

Name | DoB | Address
Jose A. Benitez Artola | 3/4/1979 | 2134 Raspberry Dr., Olney, MD
Pepe Benítez | 3 April 1979 | 2134 Rasberry Drive, Olney, Maryland
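The name variations in these sample records can be used to illustrate a very basic name comparison. The sketch below is a deliberately naive token-overlap approach, not the AI-based matching described in this article; the `NICKNAMES` table and both function names are hypothetical.

```python
import re
import unicodedata

# Hypothetical nickname table covering only the sample records.
NICKNAMES = {"jim": "james", "maggie": "margaret", "pepe": "jose"}


def name_tokens(raw: str) -> set:
    """Split a name into canonical lowercase tokens."""
    # Fold accents so that "Benítez" and "Benitez" compare equal.
    decomposed = unicodedata.normalize("NFKD", raw)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = re.findall(r"[a-z]+", folded.lower())
    # Drop single-letter initials and map nicknames to a canonical form.
    return {NICKNAMES.get(w, w) for w in words if len(w) > 1}


def name_similarity(a: str, b: str) -> float:
    """Overlap coefficient: shared tokens over the smaller token set.

    Token order is ignored, so "Baker, Jim" and "James Baker" agree.
    """
    tokens_a, tokens_b = name_tokens(a), name_tokens(b)
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / min(len(tokens_a), len(tokens_b))
```

This handles the Baker, Jones, and Benitez pairs, but it scores “Rashid Abdurrahman” against “Rachid ‘Abd al-Rahman” as 0.0, because transliteration variants share no exact tokens. Cases like that are where simple rules run out and more sophisticated fuzzy matching is needed.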
How Identity Resolution Offers Highly Accurate Record Linking
In Identity Resolution, each pair of patient records is first compared using AI-based, highly accurate Fuzzy Name Matching, which handles a wide spectrum of variations in person names, addresses, dates, phone numbers, etc. All records are then clustered (or linked) according to the similarities calculated by Fuzzy Name Matching, using a very efficient clustering algorithm that can handle a massive number of records. Each resulting cluster represents a real individual in the world and is assigned a persistent ID. As new records become available and are added to the system, Identity Resolution determines whether they belong to existing clusters or represent new patients, in which case new clusters with new IDs are created. The clustering algorithm also assigns a score to each cluster that indicates how closely the records in that cluster match each other, which allows users to trade off recall against precision for their particular use cases.
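The idea of persistent cluster IDs and incremental assignment can be sketched as follows. The article does not specify the clustering algorithm, so this uses a simple greedy assignment as a stand-in: a new record joins the first existing cluster containing a matching record, and otherwise starts a new cluster. The `ClusterStore` class, the `Record` shape, and the `is_match` callback are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: (normalized name, ISO date of birth).
Record = Tuple[str, str]


class ClusterStore:
    """Greedy incremental clustering with persistent integer IDs.

    Simplification: a record that matches members of two different
    clusters joins only the first one; the clusters are not merged.
    """

    def __init__(self, is_match: Callable[[Record, Record], bool]) -> None:
        self._is_match = is_match
        self._clusters: Dict[int, List[Record]] = {}
        self._next_id = 1

    def add(self, record: Record) -> int:
        """Add a record and return the ID of the cluster it belongs to."""
        for cluster_id, members in self._clusters.items():
            if any(self._is_match(record, m) for m in members):
                members.append(record)
                return cluster_id
        # No existing cluster matched: treat as a new patient.
        cluster_id = self._next_id
        self._next_id += 1
        self._clusters[cluster_id] = [record]
        return cluster_id

    def members(self, cluster_id: int) -> List[Record]:
        """Return a copy of the records in one cluster."""
        return list(self._clusters[cluster_id])
```

In practice `is_match` would wrap the field comparison and score combination steps; exact equality is enough to try the class out. Comparing each new record against every stored record does not scale to very large data sets, which is one reason production systems rely on more efficient algorithms.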
In sum, it may be a while before the US gets to universal EHRs, but it can derive great improvements from the use of Identity Resolution to enhance the sharing of medical information.