A GPU-enabled acceleration algorithm for the CAM5 cloud microphysics scheme

Hong, Yan ; Wang, Yuzhu ; Zhang, Xuanying ; Wang, Xiaocong ; Zhang, He ; Jiang, Jinrong

The Journal of Supercomputing, 2023-11, Vol. 79 (16), pp. 17784-17809 [Peer-reviewed journal]

New York: Springer US

  • Subjects: Algorithms ; Central processing units ; Cloud computing ; Compilers ; Computer Science ; CPUs ; Data transfer (computers) ; Interpreters ; Microphysics ; Optimization ; Parallel processing ; Processor Architectures ; Programming Languages
  • Description: The National Center for Atmospheric Research released a global atmosphere model, the Community Atmosphere Model version 5.0 (CAM5), designed to provide global climate simulations for meteorological research. Among its components, the cloud microphysics scheme is extremely time-consuming, so efficient parallel algorithms are needed to meet the challenges of large-scale, long-term simulations. Given the wide use of GPUs in science and engineering and the maturity and stability of NVIDIA's CUDA platform, we ported the code to the GPU to accelerate computation. In this paper, by analyzing the parallelism of the CAM5 cloud microphysics scheme (CAM5 CMS) along different dimensions, we propose corresponding GPU-based one-dimensional (1D) and two-dimensional (2D) parallel acceleration algorithms, of which the 2D algorithm exploits finer-grained parallelism. In addition, we present a method for optimizing data transfer between the CPU and GPU to further improve overall performance. Finally, we implemented a GPU version of the CAM5 CMS (GPU-CMS). With I/O transfer included, GPU-CMS achieves a speedup of 141.69× on a single NVIDIA A100 GPU. Without I/O transfer, and compared with the baseline performance on a single Intel Xeon E5-2680 CPU core, the 2D acceleration algorithm achieves speedups of 48.75×, 280.11×, and 507.18× on a single NVIDIA K20, P100, and A100 GPU, respectively.
  • Language: English
