Multi-Core Optimizations for Synergia and ART

SESSION: Research Poster Reception

EVENT TYPE: Posters, Electronic Posters, and Education Posters

TIME: 5:15PM - 7:00PM

AUTHOR(S): Qiming Lu, James Amundson, Nick Gnedin

ROOM: Mile High Pre-Function

We describe our recent work on optimizing the performance and scaling of Synergia and ART for multi-socket, multi-core architectures, including BlueGene/Q and GPUs. We present several hybridization and optimization options, such as communication avoidance, interchangeable multi-threaded kernels using OpenMP or CUDA for different hardware architectures, and a customized FFT, each demonstrating much better scaling behavior than the pre-optimization code. By implementing these optimization techniques, we have extended strong scaling and peak performance by at least a factor of 2. We expect different optimization schemes to be optimal on different architectures. We have further tailored the code for BG/Q with an optimized communication divider, redundant field solver, and FFT methods. The final Synergia code scales up to 128K cores with over 90% efficiency running on Mira (the BG/Q at Argonne).

Chair/Author Details:

Qiming Lu - Fermi National Accelerator Laboratory

James Amundson - Fermi National Accelerator Laboratory

Nick Gnedin - Fermi National Accelerator Laboratory

