
TRIVIR: A Visualization System to Support Document Retrieval with High Recall

Dias, Amanda Gonçalves

Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação 2019-07-08

Online access. The library also holds print copies.

  • Title:
    TRIVIR: A Visualization System to Support Document Retrieval with High Recall
  • Author: Dias, Amanda Gonçalves
  • Advisor: Oliveira, Maria Cristina Ferreira de
  • Subjects: Information Retrieval; Machine Learning; Total Recall; Visualization; Vocabulary Mismatch
  • Notes: Dissertation (Master's)
  • Description: A high-recall problem in document retrieval arises in scenarios where one wants to ensure that, given one or more query documents, (nearly) all relevant related documents are retrieved with minimum human effort. The problem may be expressed as a document similarity search: a user picks one or more example documents, and an automatic system retrieves similar ones from a collection. This problem is often handled with a so-called Continuous Active Learning strategy: given the initial query, which is a document described by a set of relevant terms, a learning method returns the most likely relevant documents (e.g., the most similar) to the reviewer in batches; the reviewer labels each document as relevant or not relevant, and this information is fed back into the learning algorithm, which uses it to refine its predictions. This iterative process continues until some quality condition is satisfied. It can demand high human effort, since documents are displayed as ranked lists and must be labeled individually, and this can negatively affect the convergence of the learning algorithm. In addition, the vocabulary mismatch issue, i.e., the use of distinct terminologies to describe semantically related or equivalent concepts, can impair recall. We propose TRIVIR, a novel interactive visualization tool powered by an information retrieval (IR) engine that implements an active learning protocol to support IR with high recall. The system integrates multiple graphical views to assist the user in identifying the relevant documents in a collection. Given representative documents as queries, users can interact with the views to label documents as relevant or not relevant, and this information is used to train a machine learning (ML) algorithm that suggests other potentially relevant documents. TRIVIR offers two major advantages over existing visualization systems for IR. First, it merges the ML algorithm output into the visualization while supporting several user interactions that enhance and speed up its convergence. Second, it tackles the vocabulary mismatch problem by providing term synonyms and a view that conveys how the terms are used within the collection. In addition, TRIVIR has been developed as a flexible front-end interface that can be associated with distinct text representations and multidimensional projection techniques. We describe two use cases conducted with collaborators who are potential users of TRIVIR. Results show that the system simplified the search for relevant documents in large collections, based on the context in which the terms occur.
  • DOI: 10.11606/D.55.2019.tde-11092019-090930
  • Publisher: Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação
  • Creation/publication date: 2019-07-08
  • Format: Adobe PDF
  • Language: English
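
The Continuous Active Learning loop outlined in the description can be illustrated with a minimal sketch. This is not TRIVIR's implementation: the bag-of-words representation, the scoring rule (similarity to labeled relevant documents minus similarity to labeled non-relevant ones), the stopping condition, and all function and variable names are assumptions chosen only to make the protocol concrete. The `oracle` callable stands in for the human reviewer.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (an assumed, deliberately simple representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def continuous_active_learning(collection, query, oracle, batch_size=2, max_rounds=10):
    """Rank unlabeled documents, collect reviewer labels, update, and repeat."""
    vectors = {doc_id: vectorize(text) for doc_id, text in collection.items()}
    relevant_profile = vectorize(query)  # grows as relevant documents are found
    non_relevant_profile = Counter()     # grows as non-relevant documents are found
    labels = {}

    def score(doc_id):
        vec = vectors[doc_id]
        return cosine(vec, relevant_profile) - cosine(vec, non_relevant_profile)

    for _ in range(max_rounds):
        unlabeled = [d for d in vectors if d not in labels]
        if not unlabeled:
            break
        # Present the most likely relevant documents to the reviewer as a batch.
        batch = sorted(unlabeled, key=score, reverse=True)[:batch_size]
        found_relevant = False
        for doc_id in batch:
            labels[doc_id] = bool(oracle(doc_id))
            if labels[doc_id]:
                relevant_profile.update(vectors[doc_id])
                found_relevant = True
            else:
                non_relevant_profile.update(vectors[doc_id])
        # Assumed quality condition: stop after a batch with no relevant documents.
        if not found_relevant:
            break
    return labels


# Toy collection and a simulated reviewer (both hypothetical).
collection = {
    "d1": "active learning for document retrieval",
    "d2": "document retrieval with high recall",
    "d3": "recipes for chocolate cake",
    "d4": "total recall in legal review",
    "d5": "gardening tips for spring",
}
truth = {"d1", "d2", "d4"}
labels = continuous_active_learning(
    collection, "document retrieval recall", lambda doc_id: doc_id in truth
)
found = {doc_id for doc_id, is_relevant in labels.items() if is_relevant}
print(sorted(found))
```

Each round in this sketch still requires the reviewer to label every document in the batch individually, which is the effort the description says TRIVIR aims to reduce through its linked views.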