Fast and Scalable Outlier Detection with Metric Access Methods

Bispo Junior, Altamir Gomes

Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação 2019-07-25

Online access. The library also holds print copies.

  • Title:
    Fast and Scalable Outlier Detection with Metric Access Methods
  • Author: Bispo Junior, Altamir Gomes
  • Advisor: Cordeiro, Robson Leonardo Ferreira
  • Subjects: Unsupervised Outlier Detection; Applied Computational Sciences; Data Mining; Complex Data; Metric Access Methods
  • Notes: Dissertation (Master's)
  • Description: It is well known that existing theoretical models for outlier detection make assumptions that may not reflect the true nature of outliers in every real application. This dissertation describes an empirical study of unsupervised outlier detection using 8 state-of-the-art algorithms and 8 datasets covering a variety of real-world tasks of practical relevance, such as spotting cyberattacks, clinical pathologies, and abnormalities occurring in nature. We present our assessment of the results, pointing out the strengths and weaknesses of each technique from the application specialists' point of view, a shift from the commonly adopted designer-based point of view. Many of the techniques had infeasibly high runtime requirements or failed to spot what the specialists consider outliers in their own data. To tackle this issue, we propose MetricABOD: a novel ABOD-based algorithm that makes the analysis up to thousands of times faster while being on average 26% more accurate than the most accurate related work. This improvement is key to practical outlier detection in many real-world applications for which the existing methods present unstable accuracy or infeasible runtime requirements. Finally, we studied two collections of text data to show that MetricABOD also works for non-dimensional, purely metric data.
  • DOI: 10.11606/D.55.2019.tde-04102019-154943
  • Publisher: Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação
  • Creation/publication date: 2019-07-25
  • Format: Adobe PDF
  • Language: English
