Sponsored by IEEE and ACM
The International Conference for High Performance Computing, Networking, Storage and Analysis

SCHEDULE: NOV 16-21, 2014

Toward Effective Detection of Silent Data Corruptions for HPC Applications

SESSION: Poster Reception

EVENT TYPE: Posters

TIME: 5:15PM - 7:00PM

AUTHOR(S): Sheng Di, Eduardo Berrocal, Leonardo Bautista-Gomez, Katherine Heisey, Rinku Gupta, Franck Cappello

ROOM: New Orleans Theater Lobby

ABSTRACT:

Because of their large number of components, future extreme-scale systems are expected to suffer frequent silent data corruptions (SDCs). Silent errors that flip low-order bits change values only slightly, which makes them difficult for software to detect. In this work, we convert the detection problem into a one-step look-ahead prediction problem and explore the most effective prediction methods for different HPC applications. We apply the Auto Regressive (AR) model, the Auto Regressive Moving Average (ARMA) model, Linear Curve Fitting (LCF), and Quadratic Curve Fitting (QCF), and evaluate them using real HPC application traces. Experiments show that error feedback control plays an important role in improving detection. AR and QCF perform best among the evaluated methods, keeping the F-measure at around 80% for silent bit-flip errors occurring around bit position 20 in double-precision data or around bit position 8 in single-precision data.
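The idea of treating detection as one-step look-ahead prediction can be illustrated with a small sketch. It covers only the two curve-fitting predictors (LCF and QCF); AR and ARMA are not shown. The acceptance rule in `detect` (a radius of `theta` times the largest error seen on accepted values, with flagged values replaced by their prediction) is a hypothetical stand-in for the error feedback control mentioned in the abstract, not the authors' scheme. The bit numbering in `flip_bit` (0 = least significant mantissa bit) is also an assumption, and all function names are illustrative.

```python
import math
import struct


def flip_bit(x: float, pos: int) -> float:
    """Flip bit `pos` of an IEEE 754 double (assumed: 0 = least significant mantissa bit)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << pos)))
    return flipped


def lcf_predict(history):
    """Linear curve fitting: extend the line through the last two values."""
    return 2 * history[-1] - history[-2]


def qcf_predict(history):
    """Quadratic curve fitting: extend the parabola through the last three values."""
    return history[-3] - 3 * history[-2] + 3 * history[-1]


def detect(series, predict=qcf_predict, warmup=3, theta=10.0):
    """Return indices whose value falls outside the one-step prediction's acceptance radius.

    Hypothetical feedback rule (not taken from the poster): the radius is
    `theta` times the largest prediction error observed on accepted values,
    and a flagged value is replaced by its prediction so it does not
    contaminate later predictions.
    """
    history = list(series[:warmup])
    max_err = 0.0
    flagged = []
    for i in range(warmup, len(series)):
        pred = predict(history)
        err = abs(series[i] - pred)
        if max_err > 0 and err > theta * max_err:
            flagged.append(i)
            history.append(pred)  # substitute the prediction for the suspect value
        else:
            max_err = max(max_err, err)  # feed the observed error back into the radius
            history.append(series[i])
    return flagged


def f_measure(flagged, truth):
    """Harmonic mean of precision and recall over flagged vs. truly corrupted indices."""
    tp = len(set(flagged) & set(truth))
    if tp == 0:
        return 0.0
    precision = tp / len(flagged)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Smooth synthetic trace standing in for an application variable.
    data = [math.sin(0.01 * t) for t in range(200)]
    data[100] = flip_bit(data[100], 40)  # inject one silent bit flip
    hits = detect(data)
    print(hits, f_measure(hits, [100]))
```

The demo flips bit 40 rather than bit 20. On this synthetic sine the QCF residual is on the order of 1e-6, while a bit-20 flip on a value near 0.84 changes it by only about 1e-10. A bit-20 flip would therefore go unnoticed here, which is why detection quality in the abstract is reported as a function of bit position. Passing `predict=lcf_predict` swaps in the linear predictor.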

Chair/Author Details:

Sheng Di - French Institute for Research in Computer Science and Automation and Argonne National Laboratory

Eduardo Berrocal - Illinois Institute of Technology

Leonardo Bautista-Gomez - Argonne National Laboratory

Katherine Heisey - Argonne National Laboratory

Rinku Gupta - Argonne National Laboratory

Franck Cappello - Argonne National Laboratory