The International Conference for High Performance Computing, Networking, Storage and Analysis
Scalable Performance Analysis of Exascale MPI Programs through Signature-Based Clustering Algorithms
Authors: Amir Bahmani (North Carolina State University), Frank Mueller (North Carolina State University)
Abstract: Exascale computing poses a number of challenges to application
performance. Developers need to study application behavior by
collecting detailed information with the help of tracing toolsets. But
applications are not the only scalability challenge: current tracing
toolsets also fall short of exascale requirements for low background
overhead, since collecting a trace for every execution entity is becoming
infeasible. One effective solution is to cluster processes with the
same behavior into groups. Instead of collecting performance
information from all individuals, this information can be collected
from just a set of representatives. This work proposes a fast,
scalable, signature-based clustering algorithm that clusters processes
that exhibit the same execution behavior. Unlike prior work based on
statistical clustering metrics, it produces precise results without
loss of events or accuracy. The proposed algorithm combines log(P)
time complexity with low overhead at the clustering level, and it splits
the merge process to make tracing suitable for exascale computing.
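
The general idea of signature-based clustering with a logarithmic-depth merge can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not describe how signatures are built or how the merge is split, so the SHA-256 hash over event names, the single-process simulation of a binary-tree reduction, and every function name below (signature, merge, cluster, representatives) are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


def signature(events):
    """Hash an ordered sequence of event names into a fixed-size signature.

    Assumption: a process's behavior is summarized by its event sequence.
    """
    h = hashlib.sha256()
    for ev in events:
        h.update(ev.encode("utf-8"))
        h.update(b"\x00")  # separator, so ["ab", "c"] differs from ["a", "bc"]
    return h.hexdigest()


def merge(a, b):
    """Merge two {signature: [ranks]} maps; equal signatures share a cluster."""
    out = defaultdict(list)
    for partial in (a, b):
        for sig, ranks in partial.items():
            out[sig].extend(ranks)
    return {sig: sorted(ranks) for sig, ranks in out.items()}


def cluster(traces):
    """Cluster ranks by signature using a simulated binary-tree reduction.

    traces maps rank -> list of event names. Returns (clusters, rounds),
    where rounds is the number of pairwise merge rounds, ceil(log2(P))
    for P ranks. In a real MPI run each round would execute in parallel.
    """
    maps = [{signature(traces[r]): [r]} for r in sorted(traces)]
    rounds = 0
    while len(maps) > 1:
        merged = []
        for i in range(0, len(maps), 2):
            if i + 1 < len(maps):
                merged.append(merge(maps[i], maps[i + 1]))
            else:
                merged.append(maps[i])  # odd one out advances unchanged
        maps = merged
        rounds += 1
    return (maps[0] if maps else {}), rounds


def representatives(clusters):
    """Pick the lowest rank of each cluster as the one to keep tracing."""
    return sorted(min(ranks) for ranks in clusters.values())
```

With eight ranks, where rank 0 issues a different event sequence than ranks 1 through 7, this sketch yields two clusters after three merge rounds, and only ranks 0 and 1 would need to be traced in full. Because ranks are grouped only when their signatures match exactly, no events are approximated away, in contrast to a statistical distance metric.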