SC13 Presentation

SCHEDULE: NOV 16-22, 2013

Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance

SESSION: Matrix Computations

EVENT TYPE: Papers

TIME: 10:30AM - 11:00AM

SESSION CHAIR: Laura Grigori

AUTHOR(S): Yulu Jia, George Bosilca, Piotr Luszczek, Jack Dongarra

ROOM: 401/402/403

ABSTRACT:
This paper studies the resilience of two-sided factorizations and presents a generic algorithm-based approach for making them resilient. We prove the correctness and numerical stability of the approach in the context of Hessenberg Reduction (HR) and present scalability and performance results for a practical implementation. Our method is a hybrid algorithm that combines an Algorithm-Based Fault Tolerance (ABFT) technique with diskless checkpointing to fully protect the data. We protect the trailing and the preceding matrix with checksums, and protect finished panels in the panel scope with diskless checkpoints. Compared with the original HR (the ScaLAPACK PDGEHRD routine), our fault-tolerant algorithm introduces very little overhead and maintains the same level of scalability. We prove that the overhead shows a decreasing trend as the size of the matrix or the size of the process grid increases.
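
The checksum idea mentioned in the abstract can be illustrated with a small, generic sketch. This is not the authors' implementation and does not use ScaLAPACK: it is a toy example, assuming a small dense integer matrix, a single lost column, and a one-sided (left) update only. The helper names (matmul, add_row_checksums, recover_column) and the matrices A and Q are hypothetical. It shows that a checksum column appended to A stays valid after a left transformation, because Q [A | Ae] = [QA | QAe], so a lost column can be rebuilt from the survivors.

```python
def matmul(X, Y):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def add_row_checksums(A):
    """Append one extra column holding the sum of each row."""
    return [row + [sum(row)] for row in A]


def recover_column(A_aug, lost):
    """Rebuild data column `lost` from the checksum and surviving columns."""
    repaired = []
    for row in A_aug:
        data, checksum = row[:-1], row[-1]
        survivors = sum(v for j, v in enumerate(data) if j != lost)
        new_row = list(data)
        new_row[lost] = checksum - survivors
        repaired.append(new_row + [checksum])
    return repaired


# Hypothetical data: a 3x3 matrix and an elementary left transformation.
A = [[4, 1, 2],
     [3, 5, 1],
     [2, 0, 6]]
Q = [[1, 0, 0],
     [0, 1, 0],
     [0, -2, 1]]

A_aug = add_row_checksums(A)
updated = matmul(Q, A_aug)  # the update is applied to data and checksum alike

# Simulate a failure that wipes out data column 1 after the update.
damaged = [row[:] for row in updated]
for row in damaged:
    row[1] = None

restored = recover_column(damaged, lost=1)
print(restored == updated)
```

In a distributed setting the checksum would typically live on a different process than the data it protects, so that one failure cannot destroy both. A two-sided reduction such as HR also applies updates from the right, which a row checksum alone does not track; the sketch leaves that part out.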

Chair/Author Details:

Laura Grigori (Chair) - INRIA

Yulu Jia - University of Tennessee, Knoxville

George Bosilca - University of Tennessee, Knoxville

Piotr Luszczek - University of Tennessee, Knoxville

Jack Dongarra - University of Tennessee, Knoxville

The full paper can be found in the ACM Digital Library