Sponsored by IEEE and ACM
The International Conference for High Performance Computing, Networking, Storage and Analysis
SCHEDULE: NOV 16-21, 2014

Scalable Kernel Fusion for Memory-Bound GPU Applications

SESSION: Accelerators


TIME: 2:00PM - 2:30PM

SESSION CHAIR: Mitsuhisa Sato

AUTHOR(S): Mohamed Wahib, Naoya Maruyama



GPU implementations of HPC applications relying on finite difference methods can include tens of memory-bound kernels. Kernel fusion can improve performance by reducing data traffic to off-chip memory: kernels that share data arrays are fused into larger kernels, in which on-chip cache holds the data reused by instructions originating from different kernels. The main challenges are (a) searching for the optimal kernel fusions under the constraints of data dependences and kernel precedence, and (b) applying kernel fusion effectively to achieve speedup. This paper introduces a problem definition and a scalable method for searching the space of possible kernel fusions to identify optimal kernel fusions for large problem sizes. The paper also introduces a codeless performance upper-bound projection to achieve effective fusions. Results show that the proposed kernel fusion method improved the performance of two real-world applications containing tens of kernels by 1.35x and 1.2x.
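
To illustrate why fusing kernels that share data arrays reduces off-chip traffic, here is a minimal sketch in Python. It is not the authors' search method or their performance projection. It is a toy model that counts whole-array loads and stores, assuming element-wise access (no stencil halos), unlimited on-chip capacity, and no anti-dependences between the two kernels. The `Kernel` type, the `traffic` and `fuse` functions, and the array names `u`, `v`, `tmp`, and `out` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str
    reads: frozenset   # arrays loaded from off-chip memory
    writes: frozenset  # arrays stored to off-chip memory


def traffic(kernels):
    """Count whole-array transfers (loads + stores) over a list of kernels."""
    return sum(len(k.reads) + len(k.writes) for k in kernels)


def fuse(first, second, live_out):
    """Fuse `first` and `second`, where `first` must run before `second`."""
    # Arrays produced by `first` and consumed by `second` stay on chip.
    on_chip = first.writes & second.reads
    # Shared inputs are loaded once; on-chip intermediates are not reloaded.
    reads = first.reads | (second.reads - on_chip)
    # An intermediate is stored only if something after the fused kernel needs it.
    writes = frozenset(
        a for a in first.writes | second.writes
        if a not in on_chip or a in live_out
    )
    return Kernel(first.name + "+" + second.name, reads, writes)


# Hypothetical pair: k1 produces `tmp` from `u` and `v`; k2 consumes `u` and `tmp`.
k1 = Kernel("k1", frozenset({"u", "v"}), frozenset({"tmp"}))
k2 = Kernel("k2", frozenset({"u", "tmp"}), frozenset({"out"}))
fused = fuse(k1, k2, live_out=frozenset({"out"}))

unfused_traffic = traffic([k1, k2])  # 3 transfers per kernel
fused_traffic = traffic([fused])     # `u` loaded once, `tmp` never leaves the chip
print(unfused_traffic, fused_traffic)
```

In this toy case the transfer count drops from 6 to 3, because the shared input `u` is loaded once and the intermediate `tmp` is never written to or read from off-chip memory. A real fusion would also have to respect on-chip capacity and the data dependences and precedence constraints the abstract mentions, which this model ignores.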

Chair/Author Details:

Mitsuhisa Sato (Chair) - University of Tsukuba

Mohamed Wahib - RIKEN

Naoya Maruyama - RIKEN

Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society