The International Conference for High Performance Computing, Networking, Storage and Analysis
Sponsored by IEEE and ACM

SCHEDULE: NOV 16-21, 2014


Optimization of Multi-Level Checkpoint Model with Uncertain Execution Scales

SESSION: Optimized Checkpointing

EVENT TYPE: Papers

TIME: 2:30PM - 3:00PM

SESSION CHAIR: Patrick Bridges

AUTHOR(S): Sheng Di, Leonardo Bautista-Gomez, Franck Cappello

ROOM: 393-94-95

ABSTRACT:

Future extreme-scale systems are expected to experience different types of failures affecting applications at different failure scales, from transient uncorrectable memory errors in processes to massive system outages. In this paper, we propose a multilevel checkpoint model that takes into account uncertain execution scales (different numbers of processes/cores). The contribution is threefold: (1) we provide an in-depth analysis of why it is difficult to derive the optimal checkpoint intervals for different checkpoint levels and optimize the number of cores simultaneously; (2) we devise a novel method that can quickly obtain an optimized solution, the first successful attempt in multilevel checkpoint models with uncertain scales; and (3) we perform both large-scale real experiments and extreme-scale numerical simulations to validate the effectiveness of our design. The experiments confirm that our optimized solution outperforms other state-of-the-art solutions by 4.3-88% in wall-clock length.

Chair/Author Details:

Patrick Bridges (Chair) - University of New Mexico

Sheng Di - French Institute for Research in Computer Science and Automation and Argonne National Laboratory

Leonardo Bautista-Gomez - Argonne National Laboratory

Franck Cappello - Argonne National Laboratory


Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society