Identity Resolution for the Entertainment Industry
Entertainment Has Gone Global
In film and other entertainment industries, there’s great interest in what’s called “talent analytics,” aka “people analytics.” With the growth of movie and TV markets internationally, particularly in Asia and Latin America, there’s a need to develop a comprehensive picture of all the people involved in making movies and original TV series – whether producers, actors, directors, writers, makeup artists, set designers, etc. When a showrunner or producer is planning a new film or original series, they will want access to the most complete and accurate information on what resources exist around the world.
There Are Many Obstacles to Finding Resources for the Entertainment Industry
A major challenge in gathering such information is that it is scattered across many sources. For example, there are specialized sites for French actors (in English) and for German actors (in German). An entertainment company will also receive data from specialized data providers, e.g., on award-winning films from the past hundred years or on box office results.
These data sources will frequently contain different variants of the same individual’s name:
- Actors’ names may be carried over into English with different spellings. For example, the name of the well-known Egyptian actor Omar Sharif had the two spelling variants Omar el-Sharief and Omar Cherif. These variants happen because the original Arabic script version of the name, عمر الشريف, can be converted into English in multiple ways.
- The order of the name elements may vary: Zheng Kai vs. Kai Zheng. In Chinese and many other East Asian naming traditions, the family name, here “Zheng,” goes first, but in more Westernized contexts, the name may be given in Western order.
- The name may be tokenized differently: the name of a well-known Chinese actor may be rendered in Latin script as Chow Yung Fat, Chow Yung-Fat, or Chow Yungfat.
- The whole name may be stored in a single database field, or there may be separate fields for the different name parts. As an added level of difficulty, depending on the database, “Chow Yung Fat” could appear in one field (“Chow Yung Fat”), two fields (“Chow” and “Yung Fat”), or three (“Chow,” “Yung,” and “Fat”).
- In addition, there is the problem of data in multiple scripts. In some specialized databases, actors’ names may appear in a non-Latin script. For example, Chow Yung Fat is 周潤發 in Chinese.
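
To make the first few variations concrete, here is a minimal Python sketch of how spelling, ordering, and tokenization differences might be smoothed over for Latin-script names. It is an illustration only, not a description of how any particular product works: the function names `name_tokens` and `name_similarity` are hypothetical, and the generic string similarity from Python's standard `difflib` module stands in for a real name-matching model. It does not handle names written in other scripts, such as 周潤發, which call for transliteration-aware matching.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def name_tokens(name: str) -> list[str]:
    """Split a Latin-script name into lowercase tokens, ignoring accents and punctuation."""
    # Decompose accented letters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Hyphens, spaces, and other punctuation all act as token separators.
    return re.findall(r"[a-z]+", stripped.lower())


def name_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1] that tolerates reordering and re-tokenization."""
    tokens_a, tokens_b = name_tokens(a), name_tokens(b)
    if not tokens_a or not tokens_b:
        return 0.0  # e.g. a name written entirely in a non-Latin script
    # Joining the tokens makes "Yung Fat", "Yung-Fat", and "Yungfat" identical.
    target = "".join(tokens_a)
    # Rotating the tokens covers family-name-first vs. family-name-last order.
    rotations = ("".join(tokens_b[i:] + tokens_b[:i]) for i in range(len(tokens_b)))
    return max(SequenceMatcher(None, target, r).ratio() for r in rotations)
```

Under these assumptions, “Zheng Kai” and “Kai Zheng” score 1.0, as do the three spellings of Chow Yung Fat, while “Omar Sharif” and “Omar el-Sharief” score high but below 1.0 because the spellings genuinely differ.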
Fortunately, there’s a new technology that can be used to merge all the data that’s out there: Identity Resolution, aka Entity Resolution.
How Identity Resolution Works
Merging records from different databases is a complex problem. First, various fields have to be matched, including not just names but also other key attributes (date of birth, place of birth, nationality, height, etc.). And just like person names, there will be data variations in the various fields such as:
- For place names, there are frequently different forms:
  - Maryland vs. MD
  - Mazar-i-Sharif vs. Mazar-e Sharif vs. Mazar
- For dates of birth, there is the American convention (Month/Day/Year) vs. the European convention (Day/Month/Year).
- Nationality could be indicated in different ways, e.g., “UK” or “British.”
- Height could be specified in feet/inches or meters/centimeters.
- And many others
(Of course, simple misspellings in all fields also occur frequently.)
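
The field variations listed above can be handled by normalizing each value before comparison. The following Python sketch shows one possible approach under simple assumptions: the alias tables are tiny hypothetical stand-ins for a real gazetteer, dates are assumed to arrive as slash-separated numbers with a four-digit year, and heights are assumed to look like `5'11"`, `1.80 m`, or `180 cm`. All function names are illustrative.

```python
import re
from datetime import date

# Hypothetical alias tables; a real system would draw on much larger resources.
PLACE_ALIASES = {
    "md": "maryland",
    "mazar-e sharif": "mazar-i-sharif",
    "mazar": "mazar-i-sharif",
}
NATIONALITY_ALIASES = {"uk": "british"}


def canonical(raw: str, aliases: dict[str, str]) -> str:
    """Lowercase a value and map known aliases to a single canonical form."""
    key = raw.strip().lower()
    return aliases.get(key, key)


def candidate_dates(raw: str) -> set[date]:
    """Return every valid reading of an ambiguous A/B/YYYY date string."""
    a, b, year = (int(part) for part in raw.split("/"))
    readings = set()
    for month, day in ((a, b), (b, a)):  # American order, then European order
        try:
            readings.add(date(year, month, day))
        except ValueError:
            pass  # this reading is impossible, e.g. month 25
    return readings


def height_cm(raw: str) -> float:
    """Convert a height in feet/inches, meters, or centimeters to centimeters."""
    text = raw.strip().lower()
    imperial = re.fullmatch(r"(\d+)\s*(?:ft|')\s*(\d+)\s*(?:in|\")", text)
    if imperial:
        feet, inches = map(int, imperial.groups())
        return round((feet * 12 + inches) * 2.54, 1)
    metric = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(cm|m)", text)
    if metric:
        value = float(metric.group(1))
        return round(value * 100 if metric.group(2) == "m" else value, 1)
    raise ValueError(f"unrecognized height: {raw!r}")
```

Returning a set of candidate dates, rather than guessing one convention, lets two records be treated as compatible whenever their candidate sets overlap. A date such as 25/12/1960 has only one valid reading, so it resolves itself.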
In addition, a sophisticated Identity Resolution system must provide a confidence measure of the likelihood of a match. To do this, it must score the match for each individual field, and then intelligently combine those individual field scores into a total combined score for the entire database record. For example, accurately matching information about a specific individual may rely especially on their name and place of birth, among other attributes.
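
One simple way to picture the combination step is a weighted average of per-field scores. The Python sketch below is only an illustration of that idea, not the scoring method of any specific product: the `combined_score` function and the weights in `WEIGHTS` are hypothetical, and a production system would likely use a more sophisticated model. Fields that are missing from either record are passed as `None` and left out of the average.

```python
from typing import Optional

# Hypothetical weights: name counts most, then date of birth, then place of birth.
WEIGHTS = {"name": 5, "date_of_birth": 3, "place_of_birth": 2}


def combined_score(
    field_scores: dict[str, Optional[float]],
    weights: dict[str, float],
) -> float:
    """Combine per-field match scores (each in [0, 1]) into one record-level score."""
    # Keep only fields that have both a score and a configured weight.
    usable = {
        field: score
        for field, score in field_scores.items()
        if score is not None and field in weights
    }
    total_weight = sum(weights[field] for field in usable)
    if total_weight == 0:
        return 0.0  # nothing comparable, so no evidence of a match
    weighted_sum = sum(weights[field] * score for field, score in usable.items())
    return weighted_sum / total_weight
```

For example, with these weights, two records that agree exactly on name and date of birth but disagree completely on place of birth would receive a combined score of 0.8. Adjusting the weights is one way application-specific business rules could change how much each attribute matters.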
What a Good Identity Resolution Product Must Provide
Here are some key features of a good Identity Resolution product for the entertainment industry:
- Accurate. Matches names that exhibit a variety of challenging phenomena with high accuracy.
- Cross-lingual. Supports name matching not only within a single language but also across different languages and scripts.
- Scalable. Handles databases with large numbers of records.
- Customizable. Accommodates application-specific business rules that determine what combination of record attributes should be matched and how important each attribute is to the overall matching score.
In sum, Identity Resolution is an ideal technology for helping the entertainment industry find the best available resources.