SC13 Denver, CO

The International Conference for High Performance Computing, Networking, Storage and Analysis

Enabling Low I/O-Cost Fine-Grained Checkpointing in QMCPack

Student: Michael Matheny (University of Delaware)
Supervisor: Scott Klasky (Oak Ridge National Laboratory)

Abstract: As simulation size increases, both the likelihood of a node crash and the cost of checkpointing increase. The problem worsens as flops grow exponentially while I/O speeds grow at a slower rate. The cost of saving data is even higher when checkpointed data is used for in-situ or in-transit analyses, such as visualization or atom proximity identification. QMCPack is one application limited by its checkpointing implementation: it uses serial HDF5 to write checkpoint data to disk, so the associated I/O cost grows significantly as the number of processes increases. Preliminary results showed that, even for simple simulations, the time spent in I/O associated with checkpointing could make up 90% of the execution time on systems with large numbers of nodes. In this poster, we use ADIOS to enable more efficient fine-grained checkpointing in QMCPack while limiting the I/O overhead.

Poster: pdf
Two-page extended abstract: pdf
