SC13 Denver, CO

The International Conference for High Performance Computing, Networking, Storage and Analysis

Scalable Parallel Algorithms for Clustering Large-Scale Biological Graphs.

Student: Inna Rytsareva (Washington State University)
Supervisor: Ananth Kalyanaraman (Washington State University)

Abstract: Clustering is one of the advanced analytical functions that has an immense potential to transform the knowledgespace when applied particularly to a data-rich domain such as computational biology. The computational hardness of the underlying theoretical problem necessitate use of heuristics in practice. In this study, we evaluate the application of a randomized sampling-based heuristic called shingling on unweighted biological graphs and propose a new variant of this heuristic that extends its application to weighted graph inputs and is better positioned to achieve qualitative gains on unweighted inputs as well. We also present parallel algorithms for this heuristic using the MapReduce paradigm. Experimental results on subsets of a medium-scale real world input (containing up to 10.3M vertices and 640M edges) demonstrate significant qualitative improvements in the reported clustering, both with and without using edge weights. Furthermore, performance studies indicate near-linear scaling on up to 4K cores of a distributed memory supercomputer.

Poster: pdf
Two-page extended abstract: pdf

Poster Index