The International Conference for High Performance Computing, Networking, Storage and Analysis
Sponsored by IEEE and ACM

SCHEDULE: NOV 16-21, 2014

Understanding Soft Error Resiliency of BlueGene/Q Compute Chip through Hardware Proton Irradiation and Software Fault Injection

SESSION: Hardware Vulnerability and Recovery


TIME: 1:30PM - 2:00PM

SESSION CHAIR: Alison Kennedy

AUTHOR(S): Chen-Yong Cher, Meeta S. Gupta, Pradip Bose, K. Paul Muller



Soft error resiliency is a major concern for petascale high-performance computing (HPC) systems. Blue Gene/Q (BG/Q) is the third generation of IBM’s massively parallel, energy-efficient Blue Gene series of supercomputers. The principal goal of this work is to understand two things: the interaction between BG/Q’s hardware resiliency features and high-performance applications, studied through proton irradiation of a real chip, and the software resiliency inherent in these applications, studied through application-level fault injection (AFI) experiments. From the proton irradiation experiments, we derived the mean time between correctable errors at sea level in the SRAM-based register files and Level-1 caches for a system similar in scale to the Sequoia system. From the AFI experiments, we characterized the relative vulnerability of the applications in both the general-purpose and floating-point register files. We categorized and quantified the failure outcomes and identified application characteristics that may offer opportunities to improve resilience.
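
As a rough illustration of what an application-level fault injection step can look like, the sketch below flips a single bit in the IEEE 754 encoding of a floating-point value and then labels the result. It is not the authors' AFI tool: the function names `flip_bit` and `classify`, the outcome labels, and the tolerance are all assumptions made for this example, and the abstract does not state which outcome categories the paper uses.

```python
import math
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its 64-bit IEEE 754 encoding inverted.

    Hypothetical helper: models a single-bit upset in a value held in a
    floating-point register. Bit 0 is the least significant mantissa bit,
    bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    (raw,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def classify(golden: float, observed: float, rel_tol: float = 1e-12) -> str:
    """Label an injection outcome (labels are assumptions for this sketch)."""
    if math.isnan(observed) or math.isinf(observed):
        return "non-finite"  # an error that a simple range check could catch
    if math.isclose(golden, observed, rel_tol=rel_tol):
        return "masked"      # the flip had no meaningful effect on the result
    return "corrupted"       # the result is wrong with no visible symptom


if __name__ == "__main__":
    golden = 1.0
    for bit in (0, 52, 62, 63):
        observed = flip_bit(golden, bit)
        print(bit, observed, classify(golden, observed))
```

Flipping a low mantissa bit barely changes the value, while a flip in the exponent or sign field changes it drastically, which is one reason outcomes of such experiments vary by the bit position and by how the application uses the affected register.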

Chair/Author Details:

Alison Kennedy (Chair) - Edinburgh Parallel Computing Centre

Chen-Yong Cher - IBM Corporation

Meeta S. Gupta - IBM Corporation

Pradip Bose - IBM Corporation

K. Paul Muller - IBM Corporation


Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society