What Is Faceted Search?
Faceted Search is a concept that’s been around for a while. It has become a standard part of life online. For example, a simple type of Faceted Search is found on every e-commerce web site. Customers can specify different search filters for products along multiple dimensions that amount to facets, e.g.,
- Manufacturer
- Price
- Size
- Color
- etc.
This kind of faceted search is very intuitive and has become the standard for e-commerce.
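As a rough illustration of the e-commerce case described above, the sketch below filters a small product list by selected facet values and then counts the remaining products per manufacturer. The catalog, field names, and helper functions (`filter_by_facets`, `facet_counts`) are all hypothetical and only meant to show the idea, not how any particular site implements it.

```python
from collections import Counter

# Hypothetical product catalog; field names and values are illustrative only.
PRODUCTS = [
    {"name": "Drip Brewer", "manufacturer": "Acme", "price": 49.0, "size": "medium", "color": "black"},
    {"name": "Espresso Pro", "manufacturer": "Acme", "price": 199.0, "size": "large", "color": "silver"},
    {"name": "Pour-Over Kit", "manufacturer": "Borealis", "price": 25.0, "size": "small", "color": "black"},
    {"name": "Travel Press", "manufacturer": "Borealis", "price": 35.0, "size": "small", "color": "red"},
]


def filter_by_facets(products, selections, max_price=None):
    """Keep products that match every selected facet value (and an optional price cap)."""
    results = []
    for product in products:
        # Skip the product if any selected facet has a different value.
        if any(product.get(facet) != value for facet, value in selections.items()):
            continue
        if max_price is not None and product["price"] > max_price:
            continue
        results.append(product)
    return results


def facet_counts(products, facet):
    """Count how many products fall under each value of one facet."""
    return Counter(product[facet] for product in products)


black_items = filter_by_facets(PRODUCTS, {"color": "black"}, max_price=50.0)
print([p["name"] for p in black_items])
print(facet_counts(black_items, "manufacturer"))
```

Because every product already carries a value for each facet, filtering and counting are simple lookups. That is the property unstructured text lacks.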
By comparison, it has been difficult to extend Faceted Search to unstructured, natural language text. Conventional search engines have been limited to keyword search of the text, combined with filtering on whatever metadata is available, such as publication date, source, and any topic categories that have been assigned. For news articles, these categories are more or less standardized, such as U.S., World, Business, Sports, etc.
Keyword search is fine as far as it goes, but it is limited. For example, it may return wrong hits when the search term is semantically ambiguous (“bank” can mean either the bank of a river or a financial institution). Moreover, it lacks a discovery aspect: if you don’t know appropriate keywords for what you are looking for, you can’t find it.
What Are the Challenges of Enabling Faceted Search for Text?
E-commerce sites exploit their databases, which are already structured. Here the number and nature of the facets (e.g., size, color, brand) are already defined in a company’s product database. The facet values are usually limited and standardized, e.g., small/medium/large. The choices consumers can make are strictly limited.
By contrast, unstructured, natural language text does not come with such databases. The metadata available with a text such as a news article is of limited use, because users are mostly interested in searching the content of the text, not metadata like dates and sources. There needs to be a way to automatically derive facets and their values from text content.
Luckily, there’s a technology that accomplishes this: Entity Extraction.
Entity Extraction Transforms Unstructured Text into Structured Data
Entity Extraction derives useful facets and their values from unstructured text automatically. Entity Extraction typically uses a semantic ontology that contains a specification of what is being extracted, i.e., facets. Examples of these are Person, Company, Country, City, etc. What is extracted under each ontology type (e.g., Company) are the values for that facet (e.g., Apple, Amazon, Tesla).
More advanced Entity Extraction products can normalize the values of extracted entities, which makes searching them easier:
- United States of America, U.S.A., United States, U.S. → USA
- Amazon.com, Inc., Amazon.com, Amazon → Amazon.com
Once entities are extracted, they can be stored in a database and made searchable in a Faceted Search manner, much like what you find on e-commerce web sites.
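The normalize-then-store step can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of how any particular Entity Extraction product works: the alias table, the `normalize` and `build_facet_index` helpers, and the sample mentions are all hypothetical, and the extractor's output is simply assumed to be a list of (document id, ontology type, text as written) tuples.

```python
from collections import defaultdict

# Hypothetical alias table: (ontology type, lowercased surface form) -> normalized value.
ALIASES = {
    ("Country", "united states of america"): "USA",
    ("Country", "u.s.a."): "USA",
    ("Country", "united states"): "USA",
    ("Country", "u.s."): "USA",
    ("Company", "amazon.com, inc."): "Amazon.com",
    ("Company", "amazon.com"): "Amazon.com",
    ("Company", "amazon"): "Amazon.com",
}


def normalize(entity_type, surface):
    """Return the normalized value, or the surface form unchanged if no alias is known."""
    return ALIASES.get((entity_type, surface.lower()), surface)


def build_facet_index(mentions):
    """Build a mapping of facet -> normalized value -> set of document ids."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, entity_type, surface in mentions:
        index[entity_type][normalize(entity_type, surface)].add(doc_id)
    return index


# Assumed extractor output: (document id, ontology type, text as written).
MENTIONS = [
    ("doc1", "Company", "Amazon.com, Inc."),
    ("doc1", "Country", "U.S."),
    ("doc2", "Company", "Amazon"),
    ("doc2", "Country", "United States"),
    ("doc3", "Company", "Tesla"),
]

index = build_facet_index(MENTIONS)
print(sorted(index["Company"]["Amazon.com"]))
print(sorted(index["Country"]["USA"]))
```

Keying the alias table on the ontology type as well as the surface form means that "Amazon" extracted as a company is normalized to Amazon.com, while "Amazon" extracted as some other type would be left alone.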
What Are the Benefits of Faceted Search of Unstructured Text?
Entity Extraction offers capabilities far beyond those of conventional keyword search:
- Entity Extraction semantically distinguishes names from non-names and enables more accurate search results than conventional keyword search: it can tell the difference between “Apple” used as a company name and “apple” as a common noun.
- It also distinguishes among different types of names: “Amazon” as a company name vs. the name of a river.
- It discovers the most popular (frequent) names under each ontology type given a corpus of texts. This is like e-commerce faceted search where, if a facet such as Kitchen: Coffee Maker is selected, a list of the most popular coffee makers is presented. This type of ranking can be made particularly accurate with the normalization of extracted names mentioned above.
- Using publication dates in the metadata, the frequency of names can be tracked over time to see trends in data as well.
- When selecting one facet value (e.g., Company: Apple), co-occurring names in other facets can show possible linkages (e.g., Person: Tim Cook, City: Cupertino) among these entities.
In this way Faceted Search supported by Entity Extraction enables a sophisticated, semantically based search of unstructured text data as well as knowledge discovery beyond keyword search.
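The frequency ranking, trend tracking, and co-occurrence ideas listed above can be sketched over a toy corpus. Everything here is an assumption made for illustration: the documents, the publication years, and the helper names (`top_values`, `trend`, `co_occurring`). Entities are assumed to be already extracted and normalized, and counts are per document rather than per mention.

```python
from collections import Counter, defaultdict

# Illustrative corpus: a publication year plus already-normalized (facet, value) entities.
DOCS = [
    {"year": 2022, "entities": {("Company", "Apple"), ("Person", "Tim Cook"), ("City", "Cupertino")}},
    {"year": 2023, "entities": {("Company", "Apple"), ("Person", "Tim Cook")}},
    {"year": 2023, "entities": {("Company", "Tesla"), ("City", "Austin")}},
    {"year": 2023, "entities": {("Company", "Apple"), ("Company", "Tesla")}},
]


def top_values(docs, facet, n=3):
    """Rank the values of one facet by the number of documents mentioning them."""
    counts = Counter(value for doc in docs for f, value in doc["entities"] if f == facet)
    return counts.most_common(n)


def trend(docs, facet, value):
    """Count documents mentioning a facet value, grouped by publication year."""
    counts = Counter(doc["year"] for doc in docs if (facet, value) in doc["entities"])
    return dict(sorted(counts.items()))


def co_occurring(docs, facet, value):
    """For documents containing the selected value, count the other entities by facet."""
    linked = defaultdict(Counter)
    for doc in docs:
        if (facet, value) in doc["entities"]:
            for other_facet, other_value in doc["entities"]:
                if (other_facet, other_value) != (facet, value):
                    linked[other_facet][other_value] += 1
    return linked


print(top_values(DOCS, "Company"))
print(trend(DOCS, "Company", "Apple"))
print(dict(co_occurring(DOCS, "Company", "Apple")["Person"]))
```

Co-occurrence in the same document only suggests a possible link between entities. It does not say what the relationship is.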