The International Conference for High Performance Computing, Networking, Storage and Analysis

Performance Portable Parallel Programming - Compile-Time Defined Parallelization and Storage Order for Accelerators and CPUs.

Authors: Michel Müller (Tokyo Institute of Technology)

Abstract: Performance portability between CPUs and accelerators is a major challenge for coarse-grained parallelized codes. Hybrid Fortran offers a new approach to porting codes to accelerators that requires minimal code changes and preserves the performance of CPU-optimized loop structures and storage orders. This is achieved through a compile-time code transformation in which the CPU and accelerator cases are treated separately. Results show minimal performance losses compared to the fastest non-portable solution on both CPU and GPU. Using this approach, five applications have been ported to accelerators, showing minimal or no slowdown on CPU while enabling high speedups on GPU.

Poster: pdf
Two-page extended abstract: pdf
