SCHEDULE: NOV 16-22, 2013
Rethinking Algorithm-Based Fault Tolerance with a Cooperative Software-Hardware Approach
SESSION: Tools for Scalable Analysis
EVENT TYPE: Papers
TIME: 4:00PM - 4:30PM
SESSION CHAIR: Dorian C. Arnold
AUTHOR(S): Dong Li, Zizhong Chen, Panruo Wu, Jeffrey S. Vetter
ROOM: 405/406/407
ABSTRACT:
Algorithm-based fault tolerance (ABFT) is a highly efficient resilience solution for many widely used scientific computing kernels. However, in the context of the resilience ecosystem, ABFT is completely opaque to any underlying hardware resilience mechanisms. As a result, some data structures are over-protected by both ABFT and hardware, which leads to unnecessary costs in performance and energy. In this paper, we rethink ABFT using an integrated view of both software and hardware, with the goal of improving the performance and energy efficiency of ABFT-enabled applications. In particular, we study how to coordinate ABFT and error-correcting code (ECC) for main memory, and investigate the impact of this coordination on performance, energy, and resilience for ABFT-enabled applications. Scaling tests and analysis indicate that our approach saves up to 25% of system energy (and up to 40% of dynamic memory energy), with up to 18% performance improvement over traditional approaches of ABFT with ECC.
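For readers unfamiliar with ABFT, the following is a minimal sketch of the classic checksum scheme for matrix multiplication, which is one well-known instance of the idea. It is not the cooperative ABFT/ECC method presented in the paper, and every function name here is hypothetical, chosen only for illustration. Integer matrices are assumed so the checksum comparisons are exact; floating-point data would need a tolerance.

```python
# Illustrative sketch only: classic checksum-based ABFT for C = A * B.
# All names are hypothetical and not taken from the paper.

def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def with_column_checksum(a):
    """Append one row holding the sum of each column of a."""
    return a + [[sum(col) for col in zip(*a)]]

def with_row_checksum(b):
    """Append to each row of b the sum of that row."""
    return [row + [sum(row)] for row in b]

def find_error(c_full):
    """Locate a single corrupted data element in a full-checksum product.

    Returns (row, col), or None if every checksum is consistent.
    """
    n, m = len(c_full) - 1, len(c_full[0]) - 1
    bad_rows = [i for i in range(n) if sum(c_full[i][:m]) != c_full[i][m]]
    bad_cols = [j for j in range(m)
                if sum(c_full[i][j] for i in range(n)) != c_full[n][j]]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("error pattern is not a single corrupted data element")

def correct_error(c_full, i, j):
    """Rebuild element (i, j) from its row checksum and the rest of row i."""
    m = len(c_full[0]) - 1
    others = sum(c_full[i][k] for k in range(m) if k != j)
    c_full[i][j] = c_full[i][m] - others
```

Multiplying the column-checksum form of A by the row-checksum form of B yields a product whose last row and last column are checksums of the result itself. A single corrupted element breaks exactly one row check and one column check, which is what allows it to be located and repaired without recomputing the product.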
Chair/Author Details:
Dorian C. Arnold (Chair) - University of New Mexico
Dong Li - Oak Ridge National Laboratory
Zizhong Chen - University of California, Riverside
Panruo Wu - University of California, Riverside
Jeffrey S. Vetter - Oak Ridge National Laboratory
The full paper can be found in the ACM Digital Library
