The International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing Built-To-Order BLAS while Considering Matrix Order
Authors: Robert Crimi (University of Colorado Boulder), Elizabeth Jessup (University of Colorado Boulder), Thomas Nelson (University of Colorado Boulder), Jeremy Siek (Indiana University, University of Colorado)
Abstract: Linear algebra is a fundamental tool in computer science, and many techniques have been developed to support the writing of graphics, internet search, and scientific computing algorithms. Such applications often use sequences of Basic Linear Algebra Subprograms (BLAS). The BLAS evolved from simple operations, such as dot products, into a large library that includes more complex operations, such as matrix-matrix multiplication. Computer scientists apply tuning techniques to improve data locality and create highly efficient implementations of the BLAS and LAPACK, enabling scientists to build high-performance software at reduced cost. However, because each BLAS routine is optimized individually, users who piece together several routines may not see the performance they desire. Built-To-Order BLAS (BTO) is being developed to handle these situations.
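To illustrate why a sequence of individually optimized routines can fall short, the sketch below computes q = A p and s = A^T r two ways: as two separate matrix-vector products, each making its own pass over A, and as one fused loop that reads each element of A only once. This is a minimal illustration and not BTO's generated code; the kernel, the function names (gemv, gemv_t, unfused, fused), and the row-major list-of-lists storage are all assumptions chosen for the example.

```python
# Illustrative sketch only: plain-Python stand-ins for BLAS-style routines.
# All names and the row-major list-of-lists layout are assumptions.

def gemv(A, x):
    """Return A x. Makes one full pass over A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def gemv_t(A, x):
    """Return A^T x. Makes a second, separate full pass over A."""
    y = [0.0] * len(A[0])
    for row, x_i in zip(A, x):
        for j, a_ij in enumerate(row):
            y[j] += a_ij * x_i
    return y


def unfused(A, p, r):
    """Two independent calls: A is read from memory twice."""
    return gemv(A, p), gemv_t(A, r)


def fused(A, p, r):
    """Compute q = A p and s = A^T r in a single pass over A."""
    q = [0.0] * len(A)
    s = [0.0] * len(A[0])
    for i, row in enumerate(A):
        acc = 0.0
        for j, a_ij in enumerate(row):
            acc += a_ij * p[j]    # contribution to q[i]
            s[j] += a_ij * r[i]   # contribution to s[j], reusing a_ij
        q[i] = acc
    return q, s
```

Both versions return the same vectors; the fused loop differs only in how many times A is traversed. For matrices too large to stay in cache, that traversal count is what limits performance, and the best loop order also depends on whether A is stored by rows or by columns.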