The International Conference for High Performance Computing, Networking, Storage and Analysis
Accelerating MPI Collective Communications through Hierarchical Algorithms with Flexible Inter-Node Communication and Imbalance Awareness
Student: Benjamin Parsons (Purdue University)
Advisor: Vijay Pai (Purdue University)
Abstract: This work investigates collective communication algorithms on shared-memory systems and develops the universal hierarchical algorithm. This algorithm can pair arbitrary hierarchy-unaware inter-node communication algorithms with shared-memory intra-node communication. In addition to flexible inter-node communication, the algorithm works with all collectives, including those, such as alltoallv, that are incompatible with past hierarchical approaches. The universal algorithm shows strong performance, improving upon both the MPICH and the Cray MPT algorithms: speedups average 15x to 30x for most collectives, with improved scalability up to 64k cores.
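The two-level structure described above can be sketched with a small, self-contained toy model. This is plain Python with no MPI; the function name, the grouping of ranks into nodes, and the sequential inter-node reduce are all illustrative assumptions, not the paper's implementation (a real version would build the intra-node communicator with something like MPI_Comm_split_type and plug in any hierarchy-unaware algorithm at the inter-node step).

```python
from functools import reduce

# Toy model of a hierarchical (two-level) allreduce. "Processes" are just
# per-rank values grouped by node; no actual communication takes place.
def hierarchical_allreduce(values, ranks_per_node, op=lambda a, b: a + b):
    """Reduce `values` across all ranks with a node-aware two-level scheme."""
    # Step 1: intra-node reduction -- each node's leader holds a partial result.
    nodes = [values[i:i + ranks_per_node]
             for i in range(0, len(values), ranks_per_node)]
    leader_partials = [reduce(op, node) for node in nodes]
    # Step 2: inter-node reduction among leaders only. Any hierarchy-unaware
    # inter-node algorithm could be substituted here; we use a plain reduce.
    total = reduce(op, leader_partials)
    # Step 3: intra-node broadcast -- every rank receives the final result.
    return [total] * len(values)

# Eight ranks on two nodes of four ranks each.
result = hierarchical_allreduce(list(range(8)), ranks_per_node=4)
```

The key design point the sketch captures is the separation of concerns: steps 1 and 3 exploit shared memory within a node, while step 2 is an opaque slot into which any existing inter-node algorithm can be dropped.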
The second part of this work creates new hierarchical collective algorithms designed to tolerate process imbalance. The process imbalance of benchmarks is thoroughly evaluated and used to design collective algorithms that minimize the synchronization delay observed by early-arriving processes. Preliminary results for a reduction show speedups of up to 47x over a binomial-tree algorithm in the presence of high, but not unreasonable, imbalance.
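The benefit of favoring early arrivers can be illustrated with a toy schedule simulation, again plain Python rather than anything from the paper: each rank has an arrival time, combining two partial results takes one unit step (an assumed cost), and we compare a fixed binomial-tree pairing against a greedy schedule that always combines the two earliest-ready partial results first.

```python
import heapq

def greedy_reduce_finish(arrivals, step_cost=1.0):
    """Finish time when the two earliest-ready partial results are always
    combined first -- an imbalance-aware, Huffman-like greedy schedule."""
    heap = list(arrivals)
    heapq.heapify(heap)
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        # The combined partial result is ready one step after both inputs are.
        heapq.heappush(heap, max(a, b) + step_cost)
    return heap[0]

def binomial_reduce_finish(arrivals, step_cost=1.0):
    """Finish time for a fixed binomial-tree pairing that ignores arrival
    times. Assumes a power-of-two number of ranks for simplicity."""
    ready = list(arrivals)
    while len(ready) > 1:
        # Round k: rank i combines with rank i + len/2, regardless of who
        # has actually arrived yet.
        half = len(ready) // 2
        ready = [max(ready[i], ready[i + half]) + step_cost
                 for i in range(half)]
    return ready[0]
```

With four ranks where one straggler arrives at time 10 (`[0, 0, 0, 10]`), the greedy schedule finishes the early arrivers' work before the straggler shows up, so only one combine remains on the critical path afterward; the fixed tree instead leaves work queued behind the straggler. This is only a scheduling intuition, not the paper's algorithm or its measured 47x result.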