SCHEDULE: NOV 16-22, 2013
Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance
SESSION: Matrix Computations
EVENT TYPE: Papers
TIME: 10:30AM - 11:00AM
SESSION CHAIR: Laura Grigori
AUTHOR(S): Yulu Jia, George Bosilca, Piotr Luszczek, Jack Dongarra
ROOM: 401/402/403
ABSTRACT:
This paper studies the resilience of two-sided factorizations and presents a generic algorithm-based approach capable of rendering two-sided factorizations resilient. We establish the theoretical proof of the correctness and the numerical stability of the approach in the context of a Hessenberg Reduction (HR) and present the scalability and performance results of a practical implementation. Our method is a hybrid algorithm combining an Algorithm-Based Fault Tolerance (ABFT) technique with diskless checkpointing to fully protect the data. We protect the trailing and the preceding matrix with checksums, and protect finished panels in the panel scope with diskless checkpoints. Compared with the original HR (the ScaLAPACK PDGEHRD routine), our fault-tolerant algorithm introduces very little overhead and maintains the same level of scalability. We prove that the overhead shows a decreasing trend as the size of the matrix or the size of the process grid increases.
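The checksum idea behind ABFT can be illustrated with a minimal sketch. It is not the paper's algorithm: it covers only a one-sided update on a small integer matrix, whereas the paper treats two-sided transformations and adds diskless checkpoints for the panels. The function names (`with_checksum_column`, `matmul`, `recover_column`) and the matrices `A` and `Q` are hypothetical, chosen for illustration. The sketch appends a column holding each row's sum, shows that a left-side update keeps that checksum valid, and rebuilds one lost column from the surviving data.

```python
def with_checksum_column(a):
    """Return a copy of `a` with one extra column holding each row's sum."""
    return [row + [sum(row)] for row in a]


def matmul(x, y):
    """Plain triple-loop product x @ y for lists of lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def recover_column(a_c, lost):
    """Rebuild column `lost` of a checksum-encoded matrix from the other columns."""
    n = len(a_c[0]) - 1  # index of the checksum column
    repaired = [row[:] for row in a_c]
    for row in repaired:
        others = sum(row[j] for j in range(n) if j != lost)
        row[lost] = row[n] - others
    return repaired


# Hypothetical data: A is the matrix, Q stands in for a left-side transformation.
A = [[4, 1, 2], [3, 5, 1], [2, 0, 6]]
Q = [[1, 0, 0], [2, 1, 0], [0, 3, 1]]

encoded = with_checksum_column(A)
updated = matmul(Q, encoded)  # the update is applied to data and checksum alike

# The checksum relation survives the update: Q @ [A, A e] = [Q A, (Q A) e].
checksum_still_valid = all(sum(row[:-1]) == row[-1] for row in updated)

# Simulate losing column 1 (for example, the process holding it fails).
damaged = [row[:] for row in updated]
for row in damaged:
    row[1] = None

recovered = recover_column(damaged, lost=1)
```

Because the checksum column is updated by the same operation as the data, no extra pass over the matrix is needed to keep it current, which is the general reason ABFT schemes can add little overhead. A single checksum column as used here can restore only one lost column at a time.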
Chair/Author Details:
Laura Grigori (Chair) - INRIA
Yulu Jia - University of Tennessee, Knoxville
George Bosilca - University of Tennessee, Knoxville
Piotr Luszczek - University of Tennessee, Knoxville
Jack Dongarra - University of Tennessee, Knoxville
The full paper can be found in the ACM Digital Library
