The International Conference for High Performance Computing, Networking, Storage and Analysis
Optimizing Stencil Computations: Multicore-Optimized Wavefront Diamond Blocking on Shared and Distributed Memory Systems.
Authors: Tareq Malas (King Abdullah University of Science and Technology), Georg Hager (Erlangen Regional Computing Center), Hatem Ltaief (King Abdullah University of Science and Technology), Holger Stengel (University of Erlangen-Nuremberg), Gerhard Wellein (University of Erlangen-Nuremberg), David Keyes (King Abdullah University of Science and Technology)
Abstract: The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work, we combine the ideas of multicore wavefront temporal blocking and diamond tiling to arrive at stencil update schemes that show large reductions in memory pressure compared to existing approaches. The resulting schemes show performance advantages in bandwidth-starved situations, which are exacerbated in the variable-coefficient case by its high number of bytes per lattice update. We present performance results on a contemporary Intel processor.
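
To illustrate the general idea of temporal blocking mentioned in the abstract, the following is a minimal sketch in Python. It is not the wavefront diamond scheme of the paper: it assumes a much simpler overlapped (redundant-halo) variant on a 1D three-point averaging stencil, chosen only because it is short and easy to check. All function names and parameters (`jacobi_step`, `naive`, `overlapped_temporal_blocking`, `block`) are hypothetical. The point it shows is that a tile small enough to stay in cache can be advanced several time steps before the next tile is loaded, so the grid is read from main memory once per group of time steps rather than once per time step.

```python
def jacobi_step(u):
    """One sweep of a 3-point averaging stencil; the two end values stay fixed."""
    v = u[:]
    for i in range(1, len(u) - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def naive(u, steps):
    """Reference version: sweep the whole grid once per time step."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u


def overlapped_temporal_blocking(u, steps, block):
    """Advance each spatial block by `steps` time steps in one go.

    Assumption for this sketch: every tile is extended by a halo of `steps`
    cells per side. Halo cells near an artificial tile edge become stale by
    one cell per time step, so after `steps` sweeps the block itself is
    still exact. The price is redundant work in the halos.
    """
    n = len(u)
    out = u[:]
    for start in range(1, n - 1, block):
        end = min(start + block, n - 1)   # block owns interior cells [start, end)
        lo = max(start - steps, 0)        # halo, clipped at the domain boundary
        hi = min(end + steps, n)
        tile = u[lo:hi]                   # small working set, meant to fit in cache
        for _ in range(steps):
            tile = jacobi_step(tile)
        out[start:end] = tile[start - lo:end - lo]
    return out


if __name__ == "__main__":
    grid = [float((i * i) % 7) for i in range(40)]
    ref = naive(grid, 3)
    blk = overlapped_temporal_blocking(grid, 3, 8)
    print(max(abs(a - b) for a, b in zip(ref, blk)))
```

The overlapped variant trades extra arithmetic for fewer passes over the grid. Diamond tiling and wavefront schemes, as named in the abstract, aim to avoid that redundant halo work, which this sketch does not attempt to reproduce.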