SC13 Denver, CO

The International Conference for High Performance Computing, Networking, Storage and Analysis

Optimizing Shared Resource Contention in HPC Clusters

Authors: Sergey Blagodurov (Simon Fraser University), Alexandra Fedorova (Simon Fraser University)

Abstract: Contention for shared resources in High Performance Computing (HPC) clusters occurs when jobs execute concurrently on the same multicore node and compete for allocated CPU time, shared caches, the memory bus, memory controllers, and other shared hardware. This contention severely degrades workload performance and stability and hence must be addressed. State-of-the-art HPC clusters, however, are not contention-aware and do not support the virtualization needed to mitigate contention effects through live job migration. The goal of this work is the design, implementation, and evaluation of a virtualized HPC scheduling framework that is contention-aware.

Poster: pdf
Two-page extended abstract: pdf
