Enhancing Graph Random Walk Acceleration via Efficient Dataflow and Hybrid Memory Architecture

Gao, Yingxue ; Wang, Teng ; Gong, Lei ; Wang, Chao ; Hu, Yiqing ; Yang, Yi ; Liu, Zhongming ; Li, Xi ; Zhou, Xuehai

IEEE Transactions on Computers, 2024-03, Vol.73 (3), p.887-901 [Peer-reviewed journal]

IEEE

Full text available

  • Title:
    Enhancing Graph Random Walk Acceleration via Efficient Dataflow and Hybrid Memory Architecture
  • Author: Gao, Yingxue ; Wang, Teng ; Gong, Lei ; Wang, Chao ; Hu, Yiqing ; Yang, Yi ; Liu, Zhongming ; Li, Xi ; Zhou, Xuehai
  • Subjects: Computer architecture ; dataflow scheduling ; dedicated accelerator ; Engines ; Field programmable gate arrays ; Graph random walk ; hybrid memory architecture ; Memory architecture ; Parallel processing ; Pipelines ; Sampling methods
  • Is part of: IEEE Transactions on Computers, 2024-03, Vol.73 (3), p.887-901
  • Description: Graph random walk sampling is becoming increasingly important with the widespread popularity of graph applications. It aims to capture the desirable graph properties by launching multiple walkers to collect feature paths. However, previous research suffers from long sampling latency and severe memory access bottlenecks due to intrinsic data dependency and skewed vertex distribution. Thus, in this paper, we propose FastRW, a dedicated accelerator to boost graph random walk operation on FPGAs. Specifically, FastRW first integrates multiple parallel processing engines to achieve data-level parallelism, where each processing engine also leverages dataflow scheduling to resolve data dependency and hide long sampling latency. Second, FastRW leverages a combination of multiple storage resources to implement a hybrid memory architecture adapted to skewed vertex distribution. By integrating the above optimizations, FastRW develops a performance model to take advantage of the balance between computation parallelism and bandwidth demand. We evaluate FastRW with two classic sampling algorithms on a wide range of real-world graph datasets. The experimental results show that FastRW achieves a speedup of 37.52× on average over the system running on two 8-core Intel CPUs. FastRW also achieves an average speedup of 28.04× over the architecture implemented on a V100 GPU.
  • Publisher: IEEE
  • Language: English