BEGIN:VCALENDAR
PRODID:-//Microsoft Corporation//Outlook MIMEDIR//EN
VERSION:2.0
BEGIN:VEVENT
DTSTART:20141118T231500Z
DTEND:20141119T010000Z
LOCATION:New Orleans Theater Lobby
DESCRIPTION;ENCODING=QUOTED-PRINTABLE:ABSTRACT: In scientific applications, one often needs to solve many small linear systems. The size of each of these systems depends, for example, on the number of ordinary differential equations (ODEs) used in the model, and can be on the order of hundreds of unknowns. To efficiently exploit the computing power of modern accelerator hardware, these linear systems are processed in batches. The state-of-the-art libraries for linear algebra that target GPUs, such as MAGMA, focus on large matrix sizes. They change the data layout by transposing the matrix to avoid thread divergence and non-coalesced memory access penalties. However, the data movement associated with transposition is very expensive for small matrices. We propose batched one-sided factorizations for GPUs that use a multi-level blocked right-looking algorithm, which preserves the data layout but minimizes the penalty of partial pivoting. Our implementation achieves many-fold speedup when compared to the alternatives.
SUMMARY:GPU Acceleration of Small Dense Matrix Computation of the One-Sided Factorizations
PRIORITY:3
END:VEVENT
END:VCALENDAR