Cross Domain Visual Search with Feature Learning using Multi-stream Transformer-based Architectures

Ribeiro, Leo Sampaio Ferraz

Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação 2023-02-28

  • Title:
    Cross Domain Visual Search with Feature Learning using Multi-stream Transformer-based Architectures
  • Author: Ribeiro, Leo Sampaio Ferraz
  • Advisor: Ponti, Moacir Antonelli
  • Subjects: Representation Learning; Graph Neural Networks; Transformer; Sketch-Based Image Retrieval; Cross-Domain Representation Learning; Feature Learning
  • Notes: Thesis (Doctorate)
  • Description: Within the general field of Computer Vision, the task of Cross-domain Visual Search is one of the most useful and most studied, and yet it is rarely seen in our daily lives. In this thesis we explore Cross-domain Visual Search using the specific and mature Sketch-based Image Retrieval (SBIR) task as a canvas. We draw four distinct hypotheses on how to further the field and demonstrate their validity with each contribution. First, we present Sketchformer, a new architecture for sketch representation learning that forgoes traditional Convolutional networks in favour of the recent Transformer design. Then we explore two alternative definitions of the SBIR task, each of which approaches the scale and generalisation necessary for implementation in the real world. For both tasks we introduce state-of-the-art models: our Scene Designer combines traditional multi-stream networks with a Graph Neural Network to learn representations for sketched scenes with multiple objects; our Sketch-an-Anchor shows that it is possible to harvest general knowledge from pre-trained models for the Zero-shot SBIR task. These contributions have a direct impact on the literature of sketch-based tasks and a cascaded impact on Image Understanding and Cross-domain representations at large.
  • DOI: 10.11606/T.55.2023.tde-02062023-161527
  • Publisher: Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação
  • Creation/publication date: 2023-02-28
  • Format: Adobe PDF
  • Language: English
