Abstract: In this work we consider a method for parallelizing matrix factorization algorithms on systems with Intel(R) Xeon Phi(TM) coprocessors. We report performance results for matrix factorization routines that implement this approach and are available in the Intel(R) Math Kernel Library (Intel MKL), measured on an Intel(R) Xeon(R) platform with Xeon Phi(TM) coprocessors.
The implementation of our method is DAG-based and uses panel factorization kernels that were redesigned and rewritten for new Intel(R) Xeon Phi(TM) architectures.
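To make the DAG-based formulation concrete, the following minimal sketch builds the task graph of a generic blocked right-looking factorization and groups tasks into waves that could run concurrently. It is an illustration under stated assumptions, not the Intel MKL implementation: the task names `panel` and `update`, the functions `build_dag` and `ready_waves`, and the dependency pattern (no pivoting-related dependencies) are hypothetical choices made for this example.

```python
def build_dag(n_panels):
    """Return {task: set of prerequisite tasks} for a blocked right-looking
    factorization with n_panels column blocks (hypothetical task naming)."""
    deps = {}
    for k in range(n_panels):
        # Panel k can be factored once step k-1 has updated column block k.
        deps[("panel", k)] = {("update", k - 1, k)} if k > 0 else set()
        for j in range(k + 1, n_panels):
            # Updating block j at step k needs panel k and, after the first
            # step, the previous update of the same block.
            prereqs = {("panel", k)}
            if k > 0:
                prereqs.add(("update", k - 1, j))
            deps[("update", k, j)] = prereqs
    return deps


def ready_waves(deps):
    """Group tasks into waves; tasks within one wave do not depend on each other."""
    remaining = {task: set(prereqs) for task, prereqs in deps.items()}
    waves = []
    while remaining:
        wave = sorted(task for task, prereqs in remaining.items() if not prereqs)
        if not wave:
            raise ValueError("dependency cycle detected")
        waves.append(wave)
        for task in wave:
            del remaining[task]
        for prereqs in remaining.values():
            prereqs.difference_update(wave)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(ready_waves(build_dag(4))):
        print(i, wave)
```

In this sketch every update of a given step lands in the same wave, which is where the parallelism comes from; a real runtime would dispatch each task as soon as its own prerequisites complete rather than waiting for a whole wave.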
The proposed novel method provides a high degree of parallelism while minimizing synchronization and communication. The algorithm adapts the workload distribution between CPUs and coprocessors to improve load balancing. The main features of the algorithm are:
adaptive on-the-fly distribution of data and tasks between CPUs and coprocessors to improve load balancing; efficient utilization of all available computational units in heterogeneous systems; support for heterogeneous systems with an unlimited number of coprocessors; scalability.
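The idea of distributing work on the fly can be illustrated with a small simulation in which each ready task goes to whichever computational unit becomes idle first. This is a simplified sketch, not the scheduler used in Intel MKL: the function `distribute`, the unit names `cpu` and `phi0`, and the relative speeds are assumptions made for the example.

```python
import heapq


def distribute(task_costs, speeds):
    """Greedy on-the-fly distribution of tasks to computational units.

    task_costs: amount of work per task, in the order tasks become ready.
    speeds: dict mapping unit name -> relative throughput (assumed values).
    Returns (assignment, makespan), where assignment maps each unit name
    to the list of task indices it executed.
    """
    # Min-heap of (time at which the unit becomes idle, unit name).
    idle = [(0.0, name) for name in sorted(speeds)]
    heapq.heapify(idle)
    assignment = {name: [] for name in speeds}
    makespan = 0.0
    for index, cost in enumerate(task_costs):
        free_at, name = heapq.heappop(idle)   # unit that is idle earliest
        finish = free_at + cost / speeds[name]
        assignment[name].append(index)
        makespan = max(makespan, finish)
        heapq.heappush(idle, (finish, name))
    return assignment, makespan


if __name__ == "__main__":
    # Twelve equal tasks; the hypothetical coprocessor is twice as fast as the CPU.
    assignment, makespan = distribute([1.0] * 12, {"cpu": 1.0, "phi0": 2.0})
    print(assignment, makespan)
```

With these assumed speeds the faster unit ends up with twice as many tasks as the slower one and both finish at the same time, without any static partitioning decided in advance. Adding another coprocessor only requires another entry in `speeds`.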