The International Conference for High Performance Computing, Networking, Storage and Analysis
Tightly Coupled Accelerators Architecture for Low-latency Inter-Node Communication Between Accelerators.
Authors: Toshihiro Hanawa (University of Tokyo), Yuetsu Kodama (University of Tsukuba), Taisuke Boku (University of Tsukuba), Mitsuhisa Sato (University of Tsukuba)
Abstract: Inter-node communication between accelerators in heterogeneous clusters incurs extra latency because data must be copied between host and accelerator, and this latency severely degrades application performance. In next-generation HPC systems in particular, strong scaling will be a more serious issue than it is today, and communication latency will become critical. To address this problem, we proposed the Tightly Coupled Accelerators (TCA) architecture and designed an interconnection router chip named PEACH2. Accelerators in the TCA architecture communicate directly via the PCIe protocol, which eliminates both protocol overhead and data copy overhead. In this paper, we present the HA-PACS/TCA system, a proof-of-concept GPU cluster based on the TCA architecture. Our system achieves a latency of 2.3 μsec for inter-node GPU-to-GPU communication. On the Himeno benchmark, TCA improves performance scalability by up to 65% for small problem sizes.