The International Conference for High Performance Computing, Networking, Storage and Analysis
Sponsored by IEEE and ACM

SCHEDULE: NOV 16-21, 2014

NUMARCK: Machine Learning Algorithm for Resiliency and Checkpointing

SESSION: Machine Learning and Data Analytics

EVENT TYPE: Papers

TIME: 10:30AM - 11:00AM

SESSION CHAIR: Hank Childs

AUTHOR(S): Zhengzhang Chen, Seung Woo Son, William Hendrix, Ankit Agrawal, Wei-keng Liao, Alok Choudhary

ROOM: 388-89-90

ABSTRACT:

Data checkpointing is an important fault tolerance technique in High Performance Computing systems. This paper exploits the fact that in many scientific applications, the relative changes in data values from one simulation iteration to the next do not differ significantly from each other. Thus, capturing the distribution of relative changes in the data, instead of storing the data itself, allows us to incorporate the temporal dimension of the data and learn the evolving distribution of the changes. We show that an order of magnitude of data reduction becomes achievable with user-defined and guaranteed error bounds for each data point. We propose NUMARCK, NU Machine learning Algorithm for Resiliency and ChecKpointing, which makes use of the emerging distributions of data changes between consecutive simulation iterations and encodes them into an indexing space that can be concisely represented. We evaluate NUMARCK using two production scientific simulations, FLASH and CMIP5, and demonstrate superior performance.
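
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the equal-width binning, the 8-bit index width, and the rule for storing outliers exactly are all assumptions made for illustration, and the sketch assumes the previous iteration contains no zeros. It computes the relative change of each data point between two iterations, maps each change to a small integer index whose bin centre lies within a user-defined error bound of the true change, and reconstructs the checkpoint from the previous iteration plus the indices.

```python
import numpy as np

N_BINS = 255      # assumed: 8-bit indices, values 0..254 address the bin table
SENTINEL = 255    # assumed: index 255 marks a value that is stored exactly


def encode_changes(prev, curr, error_bound=0.001):
    """Encode curr as per-point indices into a table of relative changes.

    Hypothetical helper; assumes prev has no zero entries.
    """
    prev = np.asarray(prev, dtype=float)
    curr = np.asarray(curr, dtype=float)
    ratio = (curr - prev) / prev            # relative change per data point

    # Equal-width bins of width 2 * error_bound, so every change that lands
    # in a bin is within error_bound of that bin's centre.
    width = 2.0 * error_bound
    lo = ratio.min()
    raw = np.floor((ratio - lo) / width).astype(np.int64)
    centers = lo + (np.arange(N_BINS) + 0.5) * width

    # Changes that do not fit in the table are flagged and kept exactly.
    exact_mask = raw >= N_BINS
    idx = np.where(exact_mask, SENTINEL, raw).astype(np.uint8)
    exact_values = curr[exact_mask]
    return idx, centers, exact_values


def decode_changes(prev, idx, centers, exact_values):
    """Rebuild an approximation of curr from prev and the encoded indices."""
    prev = np.asarray(prev, dtype=float)
    safe_idx = np.minimum(idx, N_BINS - 1)  # keep sentinel lookups in range
    approx = prev * (1.0 + centers[safe_idx])
    approx[idx == SENTINEL] = exact_values  # restore exactly stored points
    return approx
```

In this sketch each data point costs one byte instead of an 8-byte double, plus the shared table of bin centres and any exactly stored outliers. The guarantee it provides is on the change ratio: the reconstructed relative change differs from the true one by at most `error_bound` (up to floating-point rounding).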

Chair/Author Details:

Hank Childs (Chair) - University of Oregon and Lawrence Berkeley National Laboratory

Zhengzhang Chen - Northwestern University

Seung Woo Son - Northwestern University

William Hendrix - Northwestern University

Ankit Agrawal - Northwestern University

Wei-keng Liao - Northwestern University

Alok Choudhary - Northwestern University

Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society