The International Conference for High Performance Computing, Networking, Storage and Analysis
Performance Portable Parallel Programming - Compile-Time Defined Parallelization and Storage Order for Accelerators and CPUs.
Authors: Michel Müller (Tokyo Institute of Technology)
Abstract: Performance portability between CPUs and accelerators is a major challenge for coarse-grain parallelized codes. Hybrid Fortran offers a new approach to porting code to accelerators that requires minimal code changes and preserves the performance of CPU-optimized loop structures and storage orders. This is achieved through a compile-time code transformation in which the CPU and accelerator cases are treated separately. Results show minimal performance losses compared to the fastest non-portable solution on both CPU and GPU. Using this approach, five applications have been ported to accelerators, showing minimal or no slowdown on CPU while enabling high speedups on GPU.
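
The idea of fixing the storage order at compile time, separately per target, can be illustrated with a minimal C++ sketch. This is not Hybrid Fortran code and does not reproduce its transformation; the names `Target`, `flat_index`, `make_field`, and `column_sums` are hypothetical, and the two layouts chosen here (vertical index fastest on CPU, horizontal index fastest on the accelerator) are an assumption made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical target tag, selected at compile time.
enum class Target { CPU, Accelerator };

// Map (column i, level k) to a flat offset in an ni x nk field.
// Assumed layouts: on CPU each column is contiguous (k fastest);
// on the accelerator neighbouring columns are adjacent (i fastest).
template <Target T>
constexpr std::size_t flat_index(std::size_t i, std::size_t k,
                                 std::size_t ni, std::size_t nk) {
    return T == Target::CPU ? i * nk + k : k * ni + i;
}

// The same logical element lands at different offsets per target.
static_assert(flat_index<Target::CPU>(1, 2, 3, 4) == 6, "k-fastest layout");
static_assert(flat_index<Target::Accelerator>(1, 2, 3, 4) == 7, "i-fastest layout");

// Fill a field with the value 10*i + k, using the target's layout.
template <Target T>
std::vector<double> make_field(std::size_t ni, std::size_t nk) {
    std::vector<double> field(ni * nk);
    for (std::size_t i = 0; i < ni; ++i)
        for (std::size_t k = 0; k < nk; ++k)
            field[flat_index<T>(i, k, ni, nk)] =
                10.0 * static_cast<double>(i) + static_cast<double>(k);
    return field;
}

// Sum each column. The loop body is written once; only the index
// mapping differs between targets, and it is resolved at compile time.
template <Target T>
std::vector<double> column_sums(const std::vector<double>& field,
                                std::size_t ni, std::size_t nk) {
    assert(field.size() == ni * nk);
    std::vector<double> sums(ni, 0.0);
    for (std::size_t i = 0; i < ni; ++i)
        for (std::size_t k = 0; k < nk; ++k)
            sums[i] += field[flat_index<T>(i, k, ni, nk)];
    return sums;
}
```

In this sketch the computation in `column_sums` is identical for both targets and produces the same per-column results; only the memory layout changes with the template argument, which is the kind of separation between algorithm and storage order that the abstract describes.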