Sponsored by IEEE and ACM
The International Conference for High Performance Computing, Networking, Storage and Analysis
SCHEDULE: NOV 16-21, 2014

Lattice QCD with Domain Decomposition on Intel(R) Xeon Phi(TM) Co-Processors

SESSION: Heterogeneity and Scaling in Applications

TIME: 10:30AM - 11:00AM

SESSION CHAIR: Justin Luitjens

The gap between the cost of moving data and the cost of computing
continues to grow, making it ever harder to design iterative solvers on
extreme-scale architectures. This problem can be alleviated by
alternative algorithms that reduce the amount of data movement. We
investigate this in the context of Lattice Quantum Chromodynamics
and implement such an alternative solver algorithm, based on domain
decomposition, on Intel(R) Xeon Phi(TM) co-processor (KNC) clusters. We
demonstrate close-to-linear on-chip scaling to all 60 cores of the KNC.
With a mix of single and half precision, the domain-decomposition method
sustains 400-500 Gflop/s per chip. Compared to an optimized KNC
implementation of a
standard solver [1], our full multi-node domain-decomposition solver
strong-scales to more nodes and reduces the time-to-solution by a factor of 5.
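The abstract describes the solver only at a high level, so the following is a toy sketch under stated assumptions, not the authors' KNC code. It illustrates the general idea behind a domain-decomposition method: a two-colour multiplicative Schwarz iteration (in the spirit of the Schwarz alternating procedure) on a hypothetical 1D model operator that stands in for the lattice Dirac operator. Each block solve reads and writes only block-local data; only the residual update needs values from neighbouring blocks. To mimic mixed precision, block data is rounded to IEEE half precision via Python's `struct` format `'e'` with a per-block scale factor, while the outer residual stays in double precision. This precision split, the model operator, and all function names (`apply_a`, `solve_block`, `sap_sweep`, `solve`) are illustrative assumptions.

```python
import math
import struct

# Hypothetical toy model; not the paper's Wilson-Dirac operator or KNC code.


def to_half(v):
    """Round a Python float to IEEE 754 binary16 using struct format 'e'."""
    return struct.unpack("e", struct.pack("e", v))[0]


def apply_a(x, mass):
    """Toy 1D operator: (2 + mass) x_i - x_{i-1} - x_{i+1}, zero outside the lattice."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append((2.0 + mass) * x[i] - left - right)
    return y


def solve_block(rhs, mass):
    """Exact solve of the block-local tridiagonal system (Thomas algorithm).

    Only data inside the block is touched, so no communication is needed.
    """
    n = len(rhs)
    diag = 2.0 + mass
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = -1.0 / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def sap_sweep(x, b, mass, block):
    """One Schwarz sweep: update even-numbered blocks, then odd-numbered blocks.

    Blocks of one colour never touch each other, so they could be solved
    concurrently (e.g. one per core). Block data is rounded to half precision
    with a per-block scale factor (an assumption of this sketch).
    """
    n = len(x)
    for colour in (0, 1):
        # Global residual in double precision: the only step that needs
        # boundary values from neighbouring blocks.
        r = [bi - ai for bi, ai in zip(b, apply_a(x, mass))]
        for k, start in enumerate(range(0, n, block)):
            if k % 2 != colour:
                continue
            end = min(start + block, n)
            scale = max(abs(v) for v in r[start:end])
            if scale == 0.0:
                continue
            r_half = [to_half(v / scale) for v in r[start:end]]
            dx = solve_block(r_half, mass)
            for i, v in zip(range(start, end), dx):
                x[i] += scale * to_half(v)
    return x


def residual_norm(x, b, mass):
    return math.sqrt(sum((bi - ai) ** 2 for bi, ai in zip(b, apply_a(x, mass))))


def solve(b, mass=0.1, block=8, tol=1e-6, max_sweeps=200):
    """Repeat Schwarz sweeps until ||b - A x|| <= tol * ||b|| or max_sweeps is hit."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(sum(v * v for v in b))
    for sweep in range(1, max_sweeps + 1):
        sap_sweep(x, b, mass, block)
        if residual_norm(x, b, mass) <= tol * b_norm:
            return x, sweep
    return x, max_sweeps


if __name__ == "__main__":
    b = [1.0] * 64
    x, sweeps = solve(b)
    print(f"{sweeps} sweeps, residual {residual_norm(x, b, 0.1):.2e}")
```

Running the script prints the number of sweeps needed to push the relative residual below `1e-6`. The point of the design is that the block solves, where almost all of the arithmetic happens, stay within one block's data (in a real code, within one core's cache), while data exchange is confined to the boundary values needed for the residual. That is the trade of extra local arithmetic for less data movement that the abstract refers to.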

Chair/Author Details:

Justin Luitjens (Chair) - NVIDIA Corporation

Simon Heybrock - University of Regensburg

Balint Joo - Thomas Jefferson National Accelerator Facility

Dhiraj D. Kalamkar - Intel Corporation

Mikhail Smelyanskiy - Intel Corporation

Karthikeyan Vaidyanathan - Intel Corporation

Tilo Wettig - University of Regensburg

Pradeep Dubey - Intel Corporation

Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society