Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA

Ma, Yufei ; Cao, Yu ; Vrudhula, Sarma ; Seo, Jae-sun

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018-07, Vol. 26 (7), p. 1354-1367 [Peer-reviewed journal]

New York: IEEE

Title:
Optimizing the Convolution Operation to Accelerate Deep Neural Networks on FPGA
Authors: Ma, Yufei ; Cao, Yu ; Vrudhula, Sarma ; Seo, Jae-sun
Subjects: Acceleration ; Accelerator architectures ; Architecture ; Artificial neural networks ; Computer architecture ; Convolution ; convolutional neural networks (CNNs) ; Design optimization ; Field programmable gate arrays ; field-programmable gate array (FPGA) ; Hardware ; neural network hardware ; Neural networks ; Optimization ; Optimization techniques ; System-on-chip ; Tiling
Is part of: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018-07, Vol. 26 (7), p. 1354-1367
Description: As convolution accounts for most of the operations in a convolutional neural network (CNN), the convolution acceleration scheme significantly affects the efficiency and performance of a hardware CNN accelerator. Convolution involves multiply-and-accumulate operations with four levels of loops, which results in a large design space. Prior works either employ limited loop optimization techniques, e.g., loop unrolling, tiling, and interchange, or only tune some of the design variables after the accelerator architecture and dataflow are already fixed. Without fully studying the convolution loop optimization before the hardware design phase, the resulting accelerator can hardly exploit data reuse or manage data movement efficiently. This paper overcomes these barriers by quantitatively analyzing and optimizing the design objectives (e.g., memory access) of the CNN accelerator based on multiple design variables. Then, we propose a specific dataflow of hardware CNN acceleration to minimize the data communication while maximizing the resource utilization to achieve high performance. The proposed CNN acceleration scheme and architecture are demonstrated by implementing end-to-end CNNs including NiN, VGG-16, and ResNet-50/ResNet-152 for inference. For the VGG-16 CNN, the overall throughput reaches 348 GOPS and 715 GOPS on Intel Stratix V and Arria 10 FPGAs, respectively.
Publisher: New York: IEEE
Language: English
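
The description refers to convolution as multiply-and-accumulate operations nested in four levels of loops. As a rough illustration only, the sketch below writes a direct convolution (no padding) as explicit loops in plain Python. The function names, argument layout, and the way the loop levels are labeled in the comments are assumptions made for this example; it is not the dataflow or architecture proposed in the paper, and it says nothing about how the loops are unrolled, tiled, or interchanged in hardware.

```python
def conv_layer(inputs, weights, stride=1):
    """Direct convolution without padding, as explicit nested loops.

    inputs:  [n_if][n_iy][n_ix]        input feature maps
    weights: [n_of][n_if][n_ky][n_kx]  one kernel per (output map, input map)
    returns: [n_of][n_oy][n_ox]        output feature maps
    """
    n_if, n_iy, n_ix = len(inputs), len(inputs[0]), len(inputs[0][0])
    n_of = len(weights)
    n_ky, n_kx = len(weights[0][0]), len(weights[0][0][0])
    n_oy = (n_iy - n_ky) // stride + 1
    n_ox = (n_ix - n_kx) // stride + 1

    outputs = [[[0] * n_ox for _ in range(n_oy)] for _ in range(n_of)]
    for o_f in range(n_of):                  # level 4: across output feature maps
        for oy in range(n_oy):               # level 3: scan over one output map
            for ox in range(n_ox):
                acc = 0
                for i_f in range(n_if):      # level 2: across input feature maps
                    for ky in range(n_ky):   # level 1: MACs inside the kernel window
                        for kx in range(n_kx):
                            pixel = inputs[i_f][oy * stride + ky][ox * stride + kx]
                            acc += pixel * weights[o_f][i_f][ky][kx]
                outputs[o_f][oy][ox] = acc
    return outputs


def mac_count(n_of, n_oy, n_ox, n_if, n_ky, n_kx):
    """Number of multiply-accumulate operations in one such layer."""
    return n_of * n_oy * n_ox * n_if * n_ky * n_kx


# Hypothetical toy layer: one 3x3 input map, one 2x2 kernel of ones.
toy_input = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
toy_weights = [[[[1, 1], [1, 1]]]]
toy_output = conv_layer(toy_input, toy_weights)  # [[[12, 16], [24, 28]]]
```

Because every output value is an independent sum, the loops can be reordered or split into blocks without changing the result, which is why loop order, unrolling, and tiling factors form the large design space the description mentions.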
