skip to main content
Visitante
Meu Espaço
Minha Conta
Sair
Identificação
This feature requires javascript
Tags
Revistas Eletrônicas (eJournals)
Livros Eletrônicos (eBooks)
Bases de Dados
Bibliotecas USP
Ajuda
Ajuda
Idioma:
Inglês
Espanhol
Português
This feature required javascript
This feature requires javascript
Primo Search
Busca Geral
Busca Geral
Acervo Físico
Acervo Físico
Produção Intelectual da USP
Produção USP
Search For:
Clear Search Box
Search in:
Busca Geral
Or select another collection:
Search in:
Busca Geral
Busca Avançada
Busca por Índices
This feature requires javascript
This feature requires javascript
Heterogeneous Acceleration Pipeline for Recommendation System Training
Muhammad Adnan ; Yassaman Ebrahimzadeh Maboud ; Mahajan, Divya ; Nair, Prashant J
arXiv.org, 2024-04
Ithaca: Cornell University Library, arXiv.org
Texto completo disponível
Citações
Citado por
Exibir Online
Detalhes
Resenhas & Tags
Mais Opções
Nº de Citações
This feature requires javascript
Enviar para
Adicionar ao Meu Espaço
Remover do Meu Espaço
E-mail (máximo 30 registros por vez)
Imprimir
Link permanente
Referência
EasyBib
EndNote
RefWorks
del.icio.us
Exportar RIS
Exportar BibTeX
This feature requires javascript
Título:
Heterogeneous Acceleration Pipeline for Recommendation System Training
Autor:
Muhammad Adnan
;
Yassaman Ebrahimzadeh Maboud
;
Mahajan, Divya
;
Nair, Prashant J
Assuntos:
Central processing units
;
Computer Science - Artificial Intelligence
;
Computer Science - Hardware Architecture
;
Computer Science - Learning
;
CPUs
;
Deep learning
;
Embedding
;
Hybrid modes
;
Pipelines
;
Pipelining (computers)
;
Recommender systems
;
Stitching
;
Telephone hotlines
;
Training
É parte de:
arXiv.org, 2024-04
Descrição:
Recommendation models rely on deep learning networks and large embedding tables, resulting in computationally and memory-intensive processes. These models are typically trained using hybrid CPU-GPU or GPU-only configurations. The hybrid mode combines the GPU's neural network acceleration with the CPUs' memory storage and supply for embedding tables but may incur significant CPU-to-GPU transfer time. In contrast, the GPU-only mode utilizes High Bandwidth Memory (HBM) across multiple GPUs for storing embedding tables. However, this approach is expensive and presents scaling concerns. This paper introduces Hotline, a heterogeneous acceleration pipeline that addresses these concerns. Hotline develops a data-aware and model-aware scheduling pipeline by leveraging the insight that only a few embedding entries are frequently accessed (popular). This approach utilizes CPU main memory for non-popular embeddings and GPUs' HBM for popular embeddings. To achieve this, Hotline accelerator fragments a mini-batch into popular and non-popular micro-batches. It gathers the necessary working parameters for non-popular micro-batches from the CPU, while GPUs execute popular micro-batches. The hardware accelerator dynamically coordinates the execution of popular embeddings on GPUs and non-popular embeddings from the CPU's main memory. Real-world datasets and models confirm Hotline's effectiveness, reducing average end-to-end training time by 2.2x compared to Intel-optimized CPU-GPU DLRM baseline.
Editor:
Ithaca: Cornell University Library, arXiv.org
Idioma:
Inglês
Links
View paper in arXiv
This feature requires javascript
This feature requires javascript
Voltar para lista de resultados
This feature requires javascript
This feature requires javascript
Buscando em bases de dados remotas. Favor aguardar.
Buscando por
em
scope:(USP_PRODUCAO),scope:(USP_EBOOKS),scope:("PRIMO"),scope:(USP),scope:(USP_EREVISTAS),scope:(USP_FISICO),primo_central_multiple_fe
Mostrar o que foi encontrado até o momento
This feature requires javascript
This feature requires javascript