Large-scale parallelization based on CPU and GPU cluster for cosmological fluid simulations
We present a parallel implementation for large-scale cosmological simulations of 3D supersonic fluids on CPU and GPU clusters. Our development is based on a CPU code named WIGEON. Compared to the original sequential Fortran code, a speedup of 19--31 (depending on the specific GPU card) can be achieved on a single GPU. Furthermore, our results show that the pure MPI parallelization scales very well up to 10,000 CPU cores. In addition, a hybrid CPU/GPU parallelization scheme is introduced, and a detailed analysis of the speedup and scaling for different numbers of CPU/GPU units is presented (up to 256 GPU cards, a limit set by the available computing resources). The high scalability and speedup rely on the domain decomposition approach, optimization of the algorithm, and a series of techniques to optimize the CUDA implementation, especially the memory access pattern on the GPU. We believe this hybrid MPI+CUDA code can be an excellent candidate for 10 Peta-scale computing and beyond.
References in zbMATH (referenced in 4 articles)
- Eid, Al.: Stability of thin cylindrical shell in quadratic and cubic models of \(\boldsymbol{f}(R)\) gravity (2022)
- Shu, Chi-Wang: High order WENO and DG methods for time-dependent convection-dominated PDEs: A brief survey of several recent developments (2016)
- Kempe, Tobias; Aguilera, Alvaro; Nagel, Wolfgang; Fröhlich, Jochen: Performance of a projection method for incompressible flows on heterogeneous hardware (2015)
- Meng, Chen; Wang, Long; Cao, Zongyan; Feng, Long-long; Zhu, Weishan: Large-scale parallelization based on CPU and GPU cluster for cosmological fluid simulations (2015)