An Improved Parallel Singular Value Algorithm and Its Implementation for Multicore Hardware

SESSION: Matrix Computations


TIME: 11:30AM - 12:00PM

SESSION CHAIR: Laura Grigori

AUTHOR(S): Azzam Haidar, Jakub Kurzak, Piotr Luszczek


The enormous gap between the computational capabilities of today's CPUs and the speed of off-chip communication has made it extremely challenging to develop numerical software that is both scalable and performant. In this paper, we describe a successful methodology for addressing these challenges in the development of a scalable, high-performance singular value decomposition (SVD) solver, covering algorithm design, kernel optimization and tuning, and the programming model. We developed a set of leading-edge kernels and combined them with advanced optimization techniques: fine-grained, memory-aware kernels, a task-based approach, and hybrid execution and scheduling. Together, these significantly increase the performance of the SVD solver. Our results demonstrate a substantial performance gain over currently available software. In particular, our software is twice as fast as the optimized Intel Math Kernel Library when all the singular vectors are required, achieves a 4x speedup when 20% of the vectors are computed, and is about 12x faster when only the singular values are required.
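The three workloads compared in the abstract (all singular vectors, a 20% subset of the vectors, and singular values only) can be illustrated with a minimal sketch. This is not the authors' solver: it uses NumPy's `numpy.linalg.svd` as a stand-in, the helper name `svd_workloads` and the matrix size are hypothetical, and the 20% case is produced by truncating a full factorization rather than by a solver that computes only the requested vectors.

```python
import numpy as np


def svd_workloads(a, fraction=0.2):
    """Return results for the three SVD workloads named in the abstract.

    Hypothetical helper for illustration only; it does not reproduce the
    authors' algorithm or its performance characteristics.
    """
    # Workload 1: singular values and all singular vectors (thin SVD).
    u, s, vt = np.linalg.svd(a, full_matrices=False)

    # Workload 2: only the leading `fraction` of the singular vectors.
    # Assumption: truncation after a full factorization, used here just to
    # show the shape of the result a partial solver would return.
    k = max(1, int(round(fraction * s.size)))
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]

    # Workload 3: singular values only, no vectors computed.
    s_only = np.linalg.svd(a, compute_uv=False)

    return (u, s, vt), (u_k, s_k, vt_k), s_only


# Example input: a random 50 x 40 matrix (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
a = rng.standard_normal((50, 40))
full, partial, values = svd_workloads(a)
```

The values-only call skips the work of forming singular vectors entirely, which is consistent with the abstract reporting the largest speedup for that case.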

Laura Grigori (Chair) - INRIA

Azzam Haidar - University of Tennessee, Knoxville

Jakub Kurzak - University of Tennessee, Knoxville

Piotr Luszczek - University of Tennessee, Knoxville

