Entity Extraction Facilitates Compliance with DSAR and GDPR

GDPR Gives Individuals the Right to See the Information an Organization Has Collected on Them

The European Union’s General Data Protection Regulation (GDPR) gives individuals the right to receive a copy of any personal information that an organization holds about them. A request made under this right is known as a Data Subject Access Request, or DSAR for short.

Other than the collection effort, responding to a DSAR sounds straightforward, but it frequently isn’t.

Personal Data Contained in Unstructured Text Data is a Special Challenge

A special challenge for an organization responding to a DSAR is that a lot of sensitive personal data, known as Personally Identifiable Information (PII), may be found in unstructured data. Unstructured data refers to all the material in ordinary, narrative language that an organization generates in large quantities, such as email, internal memos and reports, call center notes, etc., which may be stored on individual machines, company servers, or in the cloud.

A further challenge is that the requestor’s personal data is frequently just one part of a larger document, and that document may well contain personal information about people other than the requestor. GDPR requires that this third-party data be redacted before the document is sent to the requestor.

For the most part, conventional keyword search is sufficient for finding the requestor’s own personal information because the requestor’s name is already known. The personal information of other individuals who appear in the same documents is a different matter. Keyword search will not help here, since names that are not known in advance cannot be used as search terms.
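The limitation can be seen in a minimal sketch of keyword search. The function name find_keyword_hits and the sample memos are hypothetical and only illustrate the idea: the search works when the name is supplied, and it has no way to surface a name nobody supplied.

```python
import re

def find_keyword_hits(documents, keyword):
    """Return (doc_id, start, end) for each case-insensitive whole-word match."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for doc_id, text in documents.items():
        for match in pattern.finditer(text):
            hits.append((doc_id, match.start(), match.end()))
    return hits

# Hypothetical sample data.
documents = {
    "memo-1": "Jane Doe called about her invoice. She was transferred to Tom Okafor.",
    "memo-2": "Quarterly report, no customer contact.",
}

# The requestor's name is known, so it can be used as a search term.
requestor_hits = find_keyword_hits(documents, "Jane Doe")
print(requestor_hits)
```

Finding "Tom Okafor" the same way would require already knowing that the name appears somewhere in the collection, which is exactly what a DSAR responder does not know.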

Manual redaction of other individuals’ PII would require lengthy and costly processing, since people would have to read each document closely to find such personal information.

Entity Extraction Is a Critical Tool in Finding Sensitive Personal Data in Unstructured Data

Fortunately, there is an AI technology that can help: Entity Extraction.

Entity Extraction automatically identifies key data elements in unstructured text, including the names of people and other PII elements associated with them, such as date of birth (DoB), physical address, and email address.

It does not do this by consulting a long list of known names or other PII elements. Instead, it employs AI techniques that use both the textual context around a candidate PII element and the element’s internal composition to determine that PII is present.
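The two signals can be illustrated with a deliberately simple, rule-based sketch. This is not how a production Entity Extraction engine works, since such engines typically rely on trained models. The honorific cue list, the capitalization pattern, and the function name guess_person_names are all assumptions chosen for illustration.

```python
import re

# Context signal (assumed cue list): an honorific just before the candidate.
HONORIFIC = r"\b(?:Mr|Mrs|Ms|Dr)\.?"
# Internal-composition signal: two or three capitalized words in a row.
NAME_SHAPE = r"[A-Z][a-z]+(?:\s[A-Z][a-z]+){1,2}"

CONTEXT_NAME = re.compile(HONORIFIC + r"\s+(" + NAME_SHAPE + r")")

def guess_person_names(text):
    """Return candidate person names that satisfy both toy signals."""
    return [match.group(1) for match in CONTEXT_NAME.finditer(text)]

sample = "Dr. Alice Nguyen met Ms. Priya Raman at Acme Corp."
print(guess_person_names(sample))
```

No list of known names is involved: "Alice Nguyen" and "Priya Raman" are picked up because of where they occur and how they are written, while "Acme Corp" has the right shape but lacks the context cue and is skipped.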

The critical contribution of Entity Extraction is that it recognizes PII dynamically, which means that it identifies PII elements that have not been seen before. Entity Extraction automatically identifies and extracts PII in unstructured text at high speed and with great accuracy.

PII data extracted include:

    • Person name
    • Social Security number
    • Date of birth (DoB)
    • Phone number
    • Email address
    • Physical address
    • Credit card number
    • Various other numeric expressions such as license plate numbers, bank account numbers, etc.
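Several of the element types above have a recognizable internal structure. As a rough sketch of that idea only, the snippet below pairs simplified patterns with a Luhn checksum test for card numbers. The patterns, the labels, and the function names are assumptions for illustration and are far less thorough than a real extractor.

```python
import re

# Simplified, assumed patterns; real-world formats vary much more widely.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}

def luhn_valid(number):
    """Check the Luhn checksum used by payment card numbers."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def extract_patterned_pii(text):
    """Return (label, start, end, value) tuples ordered by position."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Drop digit runs that look like cards but fail the checksum.
            if label == "CREDIT_CARD" and not luhn_valid(match.group()):
                continue
            found.append((label, match.start(), match.end(), match.group()))
    return sorted(found, key=lambda item: item[1])

sample = "Contact jo@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
for label, start, end, value in extract_patterned_pii(sample):
    print(label, value)
```

The checksum step shows why composition matters beyond shape: a 16-digit order number would match the card pattern but would usually fail the Luhn test.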

Once Entity Extraction is integrated with a redaction tool, a human reviewer can check just the PII it has identified, which saves the time and effort of reading full documents.
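A hedged sketch of what such an integration might look like follows. It assumes the extractor returns non-overlapping (label, start, end) character spans and that the requestor's own values are known. The function name redact_third_party_pii and the placeholder format are hypothetical.

```python
def redact_third_party_pii(text, entities, requestor_values):
    """Mask every entity that is not the requestor's own data.

    entities: (label, start, end) character spans, assumed not to overlap.
    Returns the redacted text and a list of (label, value) for human review.
    """
    keep = {value.lower() for value in requestor_values}
    parts = []
    review_queue = []
    cursor = 0
    for label, start, end in sorted(entities, key=lambda e: e[1]):
        value = text[start:end]
        if value.lower() in keep:
            continue  # the requestor's own data stays visible
        parts.append(text[cursor:start])
        parts.append(f"[{label} REDACTED]")
        review_queue.append((label, value))
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), review_queue

# Hypothetical document and spans, as an extractor might report them.
text = "Jane Doe emailed Tom Okafor at tom@example.com."

def span(label, value):
    start = text.index(value)
    return (label, start, start + len(value))

entities = [
    span("PERSON", "Jane Doe"),
    span("PERSON", "Tom Okafor"),
    span("EMAIL", "tom@example.com"),
]

redacted, review_queue = redact_third_party_pii(text, entities, ["Jane Doe"])
print(redacted)
print(review_queue)
```

The review queue is the point of the design: the reviewer confirms or rejects a short list of proposed redactions instead of reading the whole document.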

Summary

Responding to DSARs can be costly and time-consuming for an organization, particularly when it comes to redacting personal information about anyone other than the person who filed the DSAR.

Entity Extraction offers a fast and reliable means of locating PII in unstructured data, a task that would otherwise be extremely time-consuming, costly, and error-prone.