The International Conference for High Performance Computing, Networking, Storage and Analysis
Hybrid MPI/OpenMP/GPU Parallelization of XGC1 Fusion Simulation Code.
Authors: Eduardo F. D'Azevedo (Oak Ridge National Laboratory), Jianying Lang (Princeton Plasma Physics Laboratory), Patrick H. Worley (Oak Ridge National Laboratory), Stephane A. Ethier (Princeton Plasma Physics Laboratory), Seung-Hoe Ku (Princeton Plasma Physics Laboratory), Choong-Seock Chang (Princeton Plasma Physics Laboratory)
Abstract: By exploiting MPI, OpenMP, and CUDA Fortran, the Fortran fusion simulation code XGC1 achieves excellent weak scalability out to at least 18,624 GPU-CPU XK7 nodes, enabling science studies that were not previously possible.
XGC1 is a full-f gyrokinetic particle-in-cell code designed specifically for simulating edge plasmas in tokamaks. XGC1 was recently ported to and optimized on the 18,688-node Cray XK7 at the Oak Ridge Leadership Computing Facility, making use of both the 16-core AMD processor and the NVIDIA Kepler GPU on each node.
XGC1 uses MPI for internode and intranode parallelism, OpenMP for intranode parallelism, and CUDA Fortran for implementing key computational kernels on the GPU. XGC1 also runs these computational kernels on the CPU and GPU simultaneously. The optimized version achieves a fourfold speedup over the original CPU-only version.
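One way to picture simultaneous CPU and GPU execution of a kernel is as a static split of the particle work between the two devices. The sketch below is only an illustration under stated assumptions: the function names, the proportional-split rule, and the throughput values are hypothetical and are not taken from XGC1, whose actual load-balancing scheme is not described here.

```python
def split_particles(n_particles, gpu_speed, cpu_speed):
    """Split particles between GPU and CPU in proportion to throughput.

    gpu_speed and cpu_speed are hypothetical relative throughputs
    (particles per unit time), not measurements from XGC1.
    """
    if n_particles < 0:
        raise ValueError("n_particles must be non-negative")
    if gpu_speed <= 0 or cpu_speed <= 0:
        raise ValueError("throughputs must be positive")
    # The GPU share is proportional to its fraction of total throughput.
    n_gpu = round(n_particles * gpu_speed / (gpu_speed + cpu_speed))
    # The CPU takes the remainder, so no particle is dropped or duplicated.
    n_cpu = n_particles - n_gpu
    return n_gpu, n_cpu


def estimated_time(n_gpu, n_cpu, gpu_speed, cpu_speed):
    """Estimate kernel time when both devices work concurrently.

    Because the two shares run at the same time, the elapsed time is
    set by whichever device finishes last.
    """
    return max(n_gpu / gpu_speed, n_cpu / cpu_speed)


if __name__ == "__main__":
    # Illustrative numbers only: assume the GPU is twice as fast as the CPU.
    n_gpu, n_cpu = split_particles(900, gpu_speed=2.0, cpu_speed=1.0)
    print(n_gpu, n_cpu)
    print(estimated_time(n_gpu, n_cpu, 2.0, 1.0))
```

With a proportional split, both devices finish at roughly the same time, which is the point of running them concurrently rather than leaving the CPU idle while the GPU works. A production code would more likely tune the split from measured timings than from fixed ratios.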