The International Conference for High Performance Computing, Networking, Storage and Analysis
Towards Tera-Scale Performance for Longest Common Subsequence Using Graphics Processors.
Authors: Adnan Ozsoy (Indiana University Bloomington), Arun Chauhan (Indiana University Bloomington), Martin Swany (Indiana University Bloomington)
Abstract: GPUs have been attracting more and more high-performance users recently. However, the computation and memory access patterns in certain classes of algorithms do not lend themselves to hardware optimizations on GPUs, without which they fall far short of the promised performance. One such class of algorithms is longest common subsequence (LCS).
In this paper, we describe a novel technique to optimize LCS for one-to-many matching problem on GPUs by transforming the computation into bit-wise operations and a post-processing step. The former can be highly optimized and achieves more than a trillion operations (cell updates) per second (CUPS)---a first for LCS algorithms. The latter is more efficiently done on CPUs, in a fraction of the bit-wise computation time. The bit-wise step promises to be a foundational step and a fundamentally new approach to developing algorithms for increasingly popular heterogeneous environments that could dramatically increase the applicability of hybrid CPU-GPU environments.