- Edgar Meij, Yahoo Labs (@edgarmeij, http://edgar.meij.pro)
- Krisztian Balog, University of Stavanger (@krisztianbalog, http://krisztianbalog.com)
- Daan Odijk, University of Amsterdam (@dodijk, http://daan.odijk.me)
This tutorial presents a comprehensive introduction to entity linking and retrieval.
Part I provides a detailed overview of entity linking: the task of identifying and disambiguating entity occurrences in unstructured text. We introduce the fundamental concepts and principles underlying entity linking, and detail state-of-the-art algorithms, including unsupervised solutions, graph-based methods, and feature-based approaches in a machine learning setting. We continue with applications of entity linking for IR and conclude this part with a discussion of evaluation methodologies and initiatives in the context of entity linking.
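To make the two steps of entity linking (identifying mentions, then disambiguating them) concrete, here is a minimal sketch of one simple unsupervised baseline: a dictionary lookup that resolves each mention to its most frequently linked entity, often called "commonness". It is an illustration only, not the method of any particular system covered in the tutorial. The `SURFACE_FORMS` dictionary, its counts, the entity identifiers, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical surface-form dictionary: mention -> how often it links to each entity.
# The counts are invented for illustration; they are not real Wikipedia statistics.
SURFACE_FORMS = {
    "jaguar": Counter({"Jaguar_Cars": 60, "Jaguar_(animal)": 40}),
    "new york": Counter({"New_York_City": 80, "New_York_(state)": 20}),
    "york": Counter({"York": 90, "New_York_City": 10}),
}
MAX_NGRAM = 2  # longest mention, in tokens, that the dictionary contains


def commonness(mention, entity):
    """Fraction of times `mention` links to `entity` in the dictionary."""
    counts = SURFACE_FORMS[mention]
    return counts[entity] / sum(counts.values())


def link(text):
    """Return (mention, entity, commonness) triples for `text`.

    Mention detection: greedy longest n-gram match against the dictionary.
    Disambiguation: pick the entity with the highest commonness.
    """
    tokens = text.lower().split()
    links = []
    i = 0
    while i < len(tokens):
        for n in range(min(MAX_NGRAM, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            if mention in SURFACE_FORMS:
                entity, _ = SURFACE_FORMS[mention].most_common(1)[0]
                links.append((mention, entity, commonness(mention, entity)))
                i += n  # skip past the matched mention
                break
        else:
            i += 1  # no mention starts at this token
    return links


if __name__ == "__main__":
    for mention, entity, score in link("A jaguar in New York"):
        print(mention, "->", entity, round(score, 2))
```

A baseline like this ignores context entirely, so "jaguar" always resolves to the same entity; graph-based and feature-based approaches exist largely to address that limitation.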
Part II focuses on entity retrieval and begins with a study of scenarios where explicit representations of entities are available in the form of, e.g., Wikipedia pages or RDF triples. We then continue in a setting with more complex queries, requiring evidence to be collected and aggregated from massive volumes of unstructured textual data (with the potential help of some structured data). Such complex queries require a combination of techniques from both entity linking and entity retrieval. Throughout Part II, two main families of models are discussed: generative language models and discriminative feature-based models. Both the entity linking and entity retrieval parts are anchored in recent evaluation efforts conducted at standard benchmarking campaigns such as INEX, TAC, and TREC. We introduce test collections, tasks, evaluation methodology, and experimental results from these evaluation initiatives.
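As an illustration of the generative language modeling family in the simplest scenario, where each entity has an explicit textual description, the sketch below ranks entities by query likelihood with Dirichlet smoothing. It is a toy example under stated assumptions: the `ENTITIES` descriptions, the function name `rank`, and the smoothing parameter value `mu=10.0` are all hypothetical choices, and real systems work with far richer entity representations.

```python
import math
from collections import Counter

# Hypothetical entity descriptions, standing in for e.g. Wikipedia abstracts.
ENTITIES = {
    "Amsterdam": "capital city of the netherlands",
    "Stavanger": "city in norway",
    "Norway": "country in northern europe",
}


def rank(query, entities=ENTITIES, mu=10.0):
    """Rank entities by log P(query | entity description).

    Each description is treated as a unigram language model, smoothed
    with the collection model using a Dirichlet prior of size `mu`.
    """
    docs = {name: Counter(text.lower().split()) for name, text in entities.items()}
    collection = Counter()
    for term_freqs in docs.values():
        collection.update(term_freqs)
    collection_len = sum(collection.values())

    scores = {}
    for name, term_freqs in docs.items():
        doc_len = sum(term_freqs.values())
        score = 0.0
        for term in query.lower().split():
            if collection[term] == 0:
                continue  # term unseen in the whole collection: skip it
            p_collection = collection[term] / collection_len
            p_term = (term_freqs[term] + mu * p_collection) / (doc_len + mu)
            score += math.log(p_term)
        scores[name] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, score in rank("city norway"):
        print(f"{name}\t{score:.3f}")
```

A discriminative feature-based model would instead treat scores like this one as a single feature among many and learn how to combine them from training data.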
Both parts conclude with an overview and a hands-on comparative analysis of applications and of publicly available toolkits and web services.
This work is licensed under a Creative Commons Attribution-ShareAlike 2.0 Generic License.