The International Conference for High Performance Computing, Networking, Storage and Analysis
Lightweight Scheduling for Improving Load Balance Without Losing Locality
Authors: Vivek Kale (University of Illinois), William Gropp (University of Illinois at Urbana-Champaign), Simplice Donfack (French Institute for Research in Computer Science and Automation)
Abstract: Performance irregularities on massively parallel processors lead to load imbalances and a significant loss of performance. Multi-core nodes offer a promising way to redistribute work within a node, thus mitigating performance irregularities. However, redistributing work and its associated data across cores has a non-trivial cost. We investigate how work can be distributed equitably across cores without significantly disturbing data locality and without incurring significant scheduling overhead. To this end, we design a series of scheduling strategies and tuning mechanisms; our foundational technique is an intelligent blending of static and dynamic scheduling. We also implement a basic runtime system and library to minimize the programmer effort needed to apply these strategies. Our techniques provide 28.16% performance gains over static scheduling and 17.13% gains over guided scheduling for a widely used regular mesh benchmark, and 44.45% gains over static scheduling and 13.06% gains over guided scheduling for an n-body simulation, both on 1024 nodes.
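The idea of blending static and dynamic scheduling can be illustrated with a small simulation. The sketch below is not the authors' runtime system or library. It is a toy model under stated assumptions: the function name `hybrid_schedule`, the `static_fraction` parameter, and the per-worker `slowdown` factors (standing in for performance irregularities) are all hypothetical. Each worker keeps the leading part of its contiguous block, which preserves locality, and the remaining iterations go to a shared queue served to whichever worker becomes free first.

```python
import heapq


def hybrid_schedule(costs, num_workers, static_fraction, slowdown=None):
    """Simulate a mixed static/dynamic loop schedule.

    Returns (makespan, owner), where owner[i] is the worker that ran iteration i.
    All names and parameters here are illustrative assumptions.
    """
    n = len(costs)
    slowdown = slowdown or [1.0] * num_workers
    owner = [None] * n
    ready = []          # min-heap of (time the worker becomes free, worker id)
    dynamic_queue = []  # iterations left for first-come, first-served pickup

    for w in range(num_workers):
        # Contiguous block that a purely static schedule would give worker w.
        lo = w * n // num_workers
        hi = (w + 1) * n // num_workers
        # Only the leading static_fraction of the block is pinned to w.
        split = lo + int(static_fraction * (hi - lo))
        t = 0.0
        for i in range(lo, split):
            owner[i] = w
            t += costs[i] * slowdown[w]
        heapq.heappush(ready, (t, w))
        dynamic_queue.extend(range(split, hi))

    makespan = max((t for t, _ in ready), default=0.0)
    for i in dynamic_queue:
        # The earliest-free worker takes the next shared iteration.
        t, w = heapq.heappop(ready)
        owner[i] = w
        t += costs[i] * slowdown[w]
        makespan = max(makespan, t)
        heapq.heappush(ready, (t, w))
    return makespan, owner


# Sixteen unit-cost iterations on four workers, the last one running 2x slower.
costs = [1.0] * 16
slowdown = [1.0, 1.0, 1.0, 2.0]
print(hybrid_schedule(costs, 4, 1.0, slowdown)[0])  # fully static: 8.0
print(hybrid_schedule(costs, 4, 0.5, slowdown)[0])  # half static: 5.0
```

In this toy case the fully static schedule is held back by the slow worker, while the half-static schedule lets the faster workers absorb its leftover iterations. Half of all iterations still run on the worker that owns their block, which is the locality the static portion is meant to protect. The model ignores queue contention and data-movement costs, both of which the abstract identifies as real overheads.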