Markov Blanket discovery without causal sufficiency: application in credit data

Jeronymo, Pedro Virgilio Basílio

Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Escola de Engenharia de São Carlos 2021-12-15

  • Title:
    Markov Blanket discovery without causal sufficiency: application in credit data
  • Author: Jeronymo, Pedro Virgilio Basílio
  • Advisor: Maciel, Carlos Dias
  • Subjects: Markov Blanket; Credit; Causal Discovery; Bayesian Networks
  • Notes: Dissertation (Master's)
  • Description: Faster feature selection algorithms become a necessity as Big Data dictates the zeitgeist. An important class of feature selectors is Markov Blanket (MB) learning algorithms. They are Causal Discovery algorithms that learn the local causal structure of a target variable. A common assumption in their theoretical basis, yet one often violated in practice, is causal sufficiency. The M3B algorithm was proposed as the first to directly learn the MB without demanding causal sufficiency. The main drawback of M3B is that it is time-inefficient, being intractable for high-dimensional inputs. Aiming for a faster method, we derive the Fast Markov Blanket Discovery Algorithm (FMMB). Empirical results that compare FMMB to M3B on the structural learning task show that FMMB outperforms M3B in terms of time efficiency, while preserving structural accuracy given a large enough sample size. Moreover, we introduce a new technique to aggregate bootstrapped MB structures that first extracts a consensus MB, then constructs the aggregated structure as the union of the most probable path between each feature in the MB and the target. Comparisons with the state of the art show that the proposed aggregation has a smaller loss of information. The analysis was conducted using credit-related data, with a special focus on peer-to-peer lending platforms. Our results validate the credit scoring models used by these platforms as effective in identifying bad borrowers, yet the models still have room for improvement. Finally, we propose an ensemble of Bayesian Network Classifiers trained using the Cross-Entropy method. The ensemble performs better in credit scoring than Logistic Regression and Random Forests on the selected datasets.
  • DOI: 10.11606/D.18.2021.tde-19012022-113726
  • Publisher: Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Escola de Engenharia de São Carlos
  • Creation/publication date: 2021-12-15
  • Format: Adobe PDF
  • Language: English