SCHEDULE: NOV 16-21, 2014

Efficient Shared-Memory Implementation of High-Performance Conjugate Gradient Benchmark and Its Application to Unstructured Matrices

SESSION: Sparse Solvers


TIME: 2:30PM - 3:00PM


AUTHOR(S): Jongsoo Park, Mikhail Smelyanskiy, Karthikeyan Vaidyanathan, Alexander Heinecke, Dhiraj D. Kalamkar, Xing Liu, Md. Mostofa Ali Patwary, Yutong Lu, Pradeep Dubey



High-performance sparse linear solvers, the backbone of modern HPC, face many challenges on upcoming extreme-scale architectures. The High Performance Linpack (HPL), the widely recognized benchmark for ranking such systems, does not capture the challenges inherent to these solvers. To address this shortcoming, a new sparse benchmark, the High Performance Conjugate Gradient (HPCG) benchmark, has recently been proposed. This is the first paper that analyzes and optimizes HPCG on two modern multi- and many-core IA-based architectures: Xeon and Xeon Phi. We explore a number of algorithmic and performance optimizations. By taking advantage of the salient features of these two architectures, our implementation sustains 75% and 67% of their achievable bandwidth, respectively. We further show that our optimizations apply generally to a wide range of matrices, on which we achieve 72% and 65% of achievable bandwidth, respectively. Lastly, we study the multi-node scalability of HPCG and the tradeoff between the number of parallel domains, convergence, and single-node parallel performance.
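For readers unfamiliar with the kernel being benchmarked, the sketch below shows a plain conjugate gradient (CG) iteration over a sparse matrix stored in compressed sparse row (CSR) form. Each iteration is dominated by a sparse matrix-vector product, dot products, and vector updates, all of which stream data from memory with little reuse, which is why results like those above are reported as a fraction of achievable memory bandwidth. This is an illustrative sketch under stated assumptions, not the authors' implementation: it omits the preconditioner that HPCG applies, uses pure Python for readability rather than speed, and the function names (`spmv_csr`, `dot`, `cg`, `laplacian_1d_csr`) and the 1D Laplacian test matrix are hypothetical.

```python
import math


def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A*x for A in CSR form. Every nonzero is read exactly once,
    so on real hardware this loop is limited by memory bandwidth."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(a, b):
    """Inner product; in a distributed run this becomes a global reduction."""
    return sum(ai * bi for ai, bi in zip(a, b))


def cg(row_ptr, col_idx, vals, b, tol=1e-10, max_iter=1000):
    """Unpreconditioned CG for a symmetric positive definite CSR matrix.
    (Assumption: HPCG's preconditioning step is deliberately left out.)
    Returns (solution, iterations performed)."""
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual r = b - A*x, with initial guess x = 0
    p = list(r)          # first search direction equals the residual
    rr = dot(r, r)
    for it in range(max_iter):
        if math.sqrt(rr) < tol:
            return x, it
        ap = spmv_csr(row_ptr, col_idx, vals, p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


def laplacian_1d_csr(n):
    """Hypothetical test matrix: tridiagonal (-1, 2, -1) in CSR form."""
    row_ptr, col_idx, vals = [0], [], []
    for i in range(n):
        if i > 0:
            col_idx.append(i - 1)
            vals.append(-1.0)
        col_idx.append(i)
        vals.append(2.0)
        if i < n - 1:
            col_idx.append(i + 1)
            vals.append(-1.0)
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx, vals
```

As a usage example, building `b = spmv_csr(*laplacian_1d_csr(5), [1.0, 2.0, 3.0, 4.0, 5.0])` and calling `cg` on it recovers that vector to within rounding error in about five iterations. The optimizations the abstract refers to target exactly these memory-bound loops, which the sketch leaves in their simplest form.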

Chair/Author Details:

Anne C. Elster (Chair) - Norwegian University of Science & Technology / University of Texas at Austin

Jongsoo Park - Intel Corporation

Mikhail Smelyanskiy - Intel Corporation

Karthikeyan Vaidyanathan - Intel Corporation

Alexander Heinecke - Intel Corporation

Dhiraj D. Kalamkar - Intel Corporation

Xing Liu - Georgia Institute of Technology

Md. Mostofa Ali Patwary - Intel Corporation

Yutong Lu - National University of Defense Technology, China

Pradeep Dubey - Intel Corporation

