BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:2.0
BEGIN:VEVENT
DTSTART:20141116T193000Z
DTEND:20141116T230000Z
LOCATION:388
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: Failure rates on high-performance computing systems are rising as component counts grow. Applications running on these systems currently experience failures on the order of days; on future systems, however, the predicted time between failures ranges from minutes to hours. Developers need to protect their application runs against the loss of valuable data by using fault-tolerant techniques. These techniques range from changing algorithms, to checkpoint and restart, to programming model-based approaches. In this tutorial, we will present introductory material for developers who wish to learn the fault-tolerant techniques available on today’s systems. We will give background information on the kinds of faults occurring on today’s systems and the trends we expect going forward. Following this, we will give detailed information on several fault-tolerant approaches and how to incorporate them into applications. Our focus will be on scalable checkpoint and restart mechanisms and programming model-based approaches.
SUMMARY:Practical Fault Tolerance on Today's Supercomputing Systems
PRIORITY:3
END:VEVENT
END:VCALENDAR