SC13 Denver, CO

The International Conference for High Performance Computing, Networking, Storage and Analysis

Scalable Performance Analysis of Exascale MPI Programs through Signature-Based Clustering Algorithms.

Authors: Amir Bahmani (North Carolina State University), Frank Mueller (North Carolina State University)

Abstract: Exascale computing poses a number of challenges to application performance. Developers need to study application behavior by collecting detailed information with the help of tracing toolsets. But applications are not the only component facing scalability challenges: current tracing toolsets also fall short of exascale requirements for low background overhead, since collecting a trace for every execution entity is becoming infeasible. One effective solution is to cluster processes with the same behavior into groups. Instead of collecting performance information from every process, this information can be collected from just a set of representatives. This work proposes a fast, scalable, signature-based clustering algorithm that groups processes exhibiting the same execution behavior. In contrast to prior work based on statistical clustering metrics, it produces precise results without loss of events or accuracy. The proposed algorithm combines log(P) time complexity with low overhead at the clustering level, and it splits the merge process to make tracing suitable for exascale computing.
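The idea of grouping processes by an exact behavioral signature and tracing only one representative per group can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how signatures are computed or how the merge is split, so the SHA-256 hash over event names, the binary reduction tree, and the function names `signature`, `merge`, `cluster_tree`, and `representatives` are all assumptions made for illustration. The sketch simulates P ranks inside a single Python process and counts the merge rounds, which grow roughly as log2(P).

```python
import hashlib


def signature(events):
    """Hash an ordered list of event names into a fixed-size signature.

    Assumption: a process's behavior is summarized by its event-name sequence.
    """
    h = hashlib.sha256()
    for name in events:
        h.update(name.encode("utf-8"))
        h.update(b"\x00")  # separator so ["ab", "c"] differs from ["a", "bc"]
    return h.hexdigest()


def merge(a, b):
    """Merge two maps of signature -> sorted list of ranks."""
    out = {sig: list(ranks) for sig, ranks in a.items()}
    for sig, ranks in b.items():
        out[sig] = sorted(out.get(sig, []) + ranks)
    return out


def cluster_tree(traces):
    """Simulate a binary reduction tree over the ranks.

    traces[i] is the event list of rank i (at least one rank is assumed).
    Returns (clusters, rounds), where rounds is the number of tree levels.
    """
    maps = [{signature(ev): [rank]} for rank, ev in enumerate(traces)]
    rounds = 0
    while len(maps) > 1:
        merged = []
        for i in range(0, len(maps), 2):
            if i + 1 < len(maps):
                merged.append(merge(maps[i], maps[i + 1]))
            else:
                merged.append(maps[i])  # odd one out moves up unchanged
        maps = merged
        rounds += 1
    return maps[0], rounds


def representatives(clusters):
    """Pick the lowest rank of each cluster as its representative."""
    return sorted(ranks[0] for ranks in clusters.values())


# Hypothetical traces: one master rank and seven identical workers.
master = ["MPI_Init", "MPI_Bcast", "MPI_Recv", "MPI_Finalize"]
worker = ["MPI_Init", "MPI_Bcast", "MPI_Send", "MPI_Finalize"]
demo_clusters, demo_rounds = cluster_tree([master] + [worker] * 7)
print(representatives(demo_clusters), demo_rounds)
```

Because the clusters are keyed on an exact hash and not on a statistical distance, two ranks share a cluster only when their event sequences are identical, which is one way to read the abstract's claim of precise results without loss of events. In a real MPI setting each tree level would be a communication step between ranks; here it is only a loop.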

Poster: pdf
Two-page extended abstract: pdf
