The International Conference for High Performance Computing, Networking, Storage and Analysis
GPU Acceleration of Small Dense Matrix Computation of the One-Sided Factorizations
Authors: Tingxing Dong (University of Tennessee, Knoxville), Mark Gates (University of Tennessee, Knoxville), Azzam Haidar (University of Tennessee, Knoxville), Piotr Luszczek (University of Tennessee, Knoxville), Stanimire Tomov (University of Tennessee, Knoxville)
Abstract: In scientific applications, one often needs to solve many small-size problems. The size of each of these small linear systems depends, for example, on the number of ordinary differential equations (ODEs) used in the model, and can be on the order of hundreds of unknowns. To efficiently exploit the computing power of modern accelerator hardware, these linear systems are processed in batches. The state-of-the-art libraries for linear algebra that target GPUs, such as MAGMA, focus on large matrix sizes. They change the data layout by transposing the matrix to avoid thread-divergence and non-coalesced memory access penalties. However, the data movement associated with transposition is very expensive for small matrices. We propose batched one-sided factorizations for GPUs that use a multi-level blocked right-looking algorithm, which preserves the data layout while minimizing the penalty of partial pivoting. Our implementation achieves many-fold speedups compared to the alternatives.
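The blocked right-looking LU factorization with partial pivoting that the abstract refers to can be illustrated with a short NumPy sketch. This is an assumption-laden illustration, not the authors' MAGMA GPU implementation: the function name `blocked_lu`, the block size `nb`, and the LAPACK-style in-place storage are choices made here for clarity.

```python
import numpy as np

def blocked_lu(A, nb=4):
    """Illustrative blocked right-looking LU with partial pivoting (not MAGMA code).

    Returns (LU, piv): LU stores the unit-lower factor L strictly below the
    diagonal and U on/above it; piv is the row permutation applied to A,
    so that LU satisfies L @ U == A[piv].
    """
    A = A.copy()
    n = A.shape[0]
    piv = np.arange(n)
    for k in range(0, n, nb):
        kb = min(nb, n - k)          # current panel width
        # Panel factorization: unblocked LU with partial pivoting on columns k..k+kb-1.
        for j in range(k, k + kb):
            p = j + np.argmax(np.abs(A[j:, j]))   # pivot row in column j
            if p != j:
                A[[j, p], :] = A[[p, j], :]       # swap entire rows (LAPACK style)
                piv[[j, p]] = piv[[p, j]]
            A[j+1:, j] /= A[j, j]                 # scale multipliers (column of L)
            # Rank-1 update restricted to the remaining panel columns.
            A[j+1:, j+1:k+kb] -= np.outer(A[j+1:, j], A[j, j+1:k+kb])
        # Trailing-matrix update (right-looking): triangular solve, then GEMM.
        L11 = np.tril(A[k:k+kb, k:k+kb], -1) + np.eye(kb)
        A[k:k+kb, k+kb:] = np.linalg.solve(L11, A[k:k+kb, k+kb:])   # U12 = L11^{-1} A12
        A[k+kb:, k+kb:] -= A[k+kb:, k:k+kb] @ A[k:k+kb, k+kb:]      # A22 -= L21 U12
    return A, piv

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 8))
    LU, piv = blocked_lu(A, nb=3)
    L = np.tril(LU, -1) + np.eye(8)
    U = np.triu(LU)
    print("max residual:", np.max(np.abs(L @ U - A[piv])))
```

In a batched setting, the same routine would be applied independently to each small matrix in the batch; the point of the paper's GPU approach is to execute those many small factorizations in batched kernels on the native data layout rather than looping over host-side calls or transposing each matrix.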