SC13 Denver, CO

The International Conference for High Performance Computing, Networking, Storage and Analysis

Multi-Core Optimizations for Synergia and ART

Authors: Qiming Lu (Fermi National Accelerator Laboratory), James Amundson (Fermi National Accelerator Laboratory), Nick Gnedin (Fermi National Accelerator Laboratory)

Abstract: We describe our recent work on optimizing the performance and scaling of Synergia and ART for multi-socket, multi-core architectures, including BlueGene/Q and GPUs. We present multiple hybridization and optimization options, including communication avoidance, interchangeable multi-threaded kernels using OpenMP or CUDA for different hardware architectures, and customized FFTs, each demonstrating much better scaling behavior than the pre-optimization code. By implementing these optimization techniques, we have extended strong scaling and peak performance by at least a factor of 2. We expect different optimization schemes to be optimal on different architectures. We have further tailored the code for BG/Q with an optimized communication divider, a redundant field solver, and optimized FFT methods. The final Synergia code scales up to 128K cores with over 90% efficiency running on Mira (the BG/Q at Argonne).
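
The abstract does not say how the efficiency figure is defined. As a minimal sketch, assuming the usual strong-scaling definition (measured speedup divided by ideal speedup relative to a baseline run), it could be computed as follows. The function name and all timings are hypothetical and are not measurements from the poster.

```python
def strong_scaling_efficiency(base_cores, base_time, cores, time):
    """Parallel efficiency relative to a baseline run at fixed problem size.

    Assumed definition: (base_time / time) / (cores / base_cores).
    """
    speedup = base_time / time      # measured speedup over the baseline
    ideal = cores / base_cores      # speedup under perfect scaling
    return speedup / ideal


# Hypothetical (cores, wall-clock seconds) pairs for illustration only.
runs = [(16384, 100.0), (32768, 52.0), (65536, 27.0), (131072, 13.8)]

base_cores, base_time = runs[0]
for cores, seconds in runs:
    eff = strong_scaling_efficiency(base_cores, base_time, cores, seconds)
    print(f"{cores:>7} cores: efficiency {eff:.2f}")
```

Under this definition, a value of 1.0 means perfect scaling, and the result depends on which run is chosen as the baseline.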

Poster: pdf
Two-page extended abstract: pdf