SC13 Home > SC13 Schedule > SC13 Presentation - Optimal Placement of Retry-Based Fault Recovery Annotations in HPC Applications

SCHEDULE: NOV 16-22, 2013

Optimal Placement of Retry-Based Fault Recovery Annotations in HPC Applications

SESSION: Research Poster Reception

EVENT TYPE: Posters, Electronic Posters, and Education Posters

TIME: 5:15PM - 7:00PM

AUTHOR(S): Ignacio Laguna, Martin Schulz, Jeff Keasler, David Richards, Jim Belak

ROOM: Mile High Pre-Function

ABSTRACT:
As larger HPC systems are built, fault recovery becomes a fundamental capability. Traditional fault recovery approaches, such as checkpointing, may not be sufficient for future exascale systems. Retry-based recovery techniques have been proposed as an alternative. These techniques simply re-execute a code region when a fault occurs, and they require code annotations that mark the regions to retry. However, no previous work has investigated the optimal placement of these annotations in a program. Via fault injection, we evaluate how to place retry annotations optimally in a hydrodynamics mini-application. We found that, contrary to our expectations, a simple scheme of protecting the main function works well for low fault rates: the slowdown is up to 1.25x at a rate of 3 faults per hour. We also found that the optimal recovery method is to roll back a few iterations in the application's main loop.
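
The idea of rolling back a few iterations in a main loop can be illustrated with a minimal sketch. This is not the authors' implementation or annotation syntax; the names `Fault`, `step`, `run_with_rollback`, and `rollback_depth` are hypothetical, the "simulation" is just a running sum, and faults are injected deterministically at chosen iterations so the behavior is easy to follow.

```python
from collections import deque


class Fault(Exception):
    """Hypothetical stand-in for a fault detected inside a protected region."""


def step(state, i, pending_faults):
    """One iteration of a toy main loop; raises Fault once per injected iteration."""
    if i in pending_faults:
        pending_faults.discard(i)  # each injected fault fires only once
        raise Fault(i)
    return state + i


def run_with_rollback(n_iters, rollback_depth, pending_faults):
    """Run the loop, rolling back up to `rollback_depth` iterations on a fault.

    Returns the final state and the number of iterations actually executed,
    including re-executed ones (a rough proxy for slowdown).
    """
    state, i, executed = 0, 0, 0
    # Holds (iteration, state-at-start-of-iteration) for the most recent iterations.
    snapshots = deque(maxlen=rollback_depth)
    while i < n_iters:
        snapshots.append((i, state))
        try:
            state = step(state, i, pending_faults)
            executed += 1
            i += 1
        except Fault:
            # Retry: restore the oldest retained snapshot and re-execute from there.
            i, state = snapshots[0]
            snapshots.clear()
    return state, executed


if __name__ == "__main__":
    print(run_with_rollback(10, 3, {5}))
```

In this sketch a larger `rollback_depth` re-executes more work per fault, while a depth of 1 retries only the failing iteration. Which depth is best for a real application depends on how far a fault can propagate before it is detected, which is the kind of trade-off the poster evaluates through fault injection.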

Chair/Author Details:

Ignacio Laguna - Lawrence Livermore National Laboratory

Martin Schulz - Lawrence Livermore National Laboratory

Jeff Keasler - Lawrence Livermore National Laboratory

David Richards - Lawrence Livermore National Laboratory

Jim Belak - Lawrence Livermore National Laboratory
