MaSTA: a text-based machine learning approach for systems-of-systems in the big data context

Bianchi, Thiago

Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação 2019-04-11

Online access. The library also holds print copies.

  • Title:
    MaSTA: a text-based machine learning approach for systems-of-systems in the big data context
  • Author: Bianchi, Thiago
  • Advisor: Nakagawa, Elisa Yumi
  • Subjects: Machine Learning; Big Data; Text Classification; System-of-Systems; Naive Bayes
  • Notes: Thesis (Doctorate)
  • Description: Systems-of-systems (SoS) have gained considerable importance in industry and academia as an answer to the growing complexity of software-intensive systems. SoS are distinctive in that their capabilities transcend the mere sum of the capabilities of their diverse, independent constituents. In parallel, the amount of data collected in different formats is growing rapidly and poses a considerable challenge for researchers and professionals, which characterizes the Big Data context. In this scenario, Machine Learning techniques have been increasingly explored to analyze such data and extract relevant knowledge from it. SoS also generate a large amount of data and text and, in many situations, SoS users must manually register unstructured, critical texts, e.g., work orders and service requests, and map them to structured information. These tasks are repetitive, time- and effort-consuming, and error-prone. The main objective of this thesis is to present MaSTA, an approach composed of an innovative classification method that infers classifiers from large textual collections and an evaluation method that measures the reliability and performance of such classifiers. To evaluate the effectiveness of MaSTA, we conducted an experiment with a commercial SoS used by large companies, which provided us with four datasets containing nearly one million records related to three classification tasks. The experiment indicated that MaSTA is capable of classifying the documents automatically and of improving user accuracy by reducing the list of possible classifications. Moreover, it indicated that MaSTA is a scalable solution for Big Data scenarios in which document collections contain hundreds of thousands (or even millions) of documents, including documents produced by different constituents of an SoS.
  • DOI: 10.11606/T.55.2019.tde-11092019-144236
  • Publisher: Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação
  • Creation/publication date: 2019-04-11
  • Format: Adobe PDF
  • Language: English