Cross Domain Visual Search with Feature Learning using Multi-stream Transformer-based Architectures

Ribeiro, Leo Sampaio Ferraz

Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação 2023-02-28

Title: Cross Domain Visual Search with Feature Learning using Multi-stream Transformer-based Architectures
Author: Ribeiro, Leo Sampaio Ferraz
Advisor: Ponti, Moacir Antonelli
Subjects: Representation Learning; Graph Neural Networks; Transformer; Sketch-Based Image Retrieval; Cross-Domain Representation Learning; Feature Learning
Notes: Thesis (Doctorate)
Description: Within the general field of Computer Vision, Cross-domain Visual Search is one of the most useful and widely studied tasks, yet it is rarely seen in our daily lives. In this thesis we explore Cross-domain Visual Search using the specific and mature Sketch-based Image Retrieval (SBIR) task as a canvas. We draw four distinct hypotheses as to how to further the field and demonstrate their validity with each contribution. First, we present Sketchformer, a new architecture for sketch representation learning that forgoes traditional Convolutional networks in favour of the recent Transformer design. Then we explore two alternative definitions of the SBIR task, each of which approaches the scale and generalisation necessary for implementation in the real world. For both tasks we introduce state-of-the-art models: our Scene Designer combines traditional multi-stream networks with a Graph Neural Network to learn representations for sketched scenes with multiple objects; our Sketch-an-Anchor shows that it is possible to harvest general knowledge from pre-trained models for the Zero-shot SBIR task. These contributions have a direct impact on the literature of sketch-based tasks and a cascaded impact on Image Understanding and Cross-domain representations at large.
DOI: 10.11606/T.55.2023.tde-02062023-161527
Publisher: Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Instituto de Ciências Matemáticas e de Computação
Creation/publication date: 2023-02-28
Format: Adobe PDF
Language: English
