The International Conference for High Performance Computing, Networking, Storage and Analysis
Employing Machine Learning for the Selection of Robust Algorithms for the Dynamic Scheduling of Scientific Applications
Authors: Nitin Sukhija (Mississippi State University), Srishti Srivastava (Mississippi State University), Florina M. Ciorba (Technical University Dresden), Ioana Banicescu (Mississippi State University), Brandon Malone (Helsinki Institute for Information Technology)
Abstract: Scientific applications often contain large, computationally intensive, data-parallel loops whose iterations have irregular execution times. Scheduling such applications on heterogeneous computing systems with unpredictably fluctuating load requires highly efficient and robust scheduling algorithms. State-of-the-art dynamic loop scheduling (DLS) techniques provide a means of achieving the best performance for these applications in dynamic computing environments. Selecting the most robust of these DLS algorithms, however, remains challenging.
In this work, we propose a methodology for solving this selection problem. We employ machine learning to obtain an empirical robustness prediction model that enables per-instance algorithm selection from a portfolio of DLS algorithms. An instance consists of the given application and the current system characteristics, including workload conditions. Through discrete event simulations, we show that the proposed portfolio-based approach offers higher performance guarantees with respect to the robust execution of the application than the simpler winner-take-all approach.
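The contrast between per-instance selection and winner-take-all can be illustrated with a minimal sketch. The abstract does not specify the learning method, the instance features, the robustness metric, or the contents of the portfolio, so everything below is an assumption: a k-nearest-neighbor predictor stands in for the robustness prediction model, the two features and the algorithm labels (FAC, WF, AWF) are hypothetical choices, and the numbers are invented for illustration and are not results from this work.

```python
from math import dist

# Hypothetical portfolio labels; the abstract does not list its members.
ALGORITHMS = ["FAC", "WF", "AWF"]

# Invented training data, NOT measurements from the work described above.
# Each row: (instance features, robustness cost per algorithm).
# Assumed features: (iteration-time variability, system load fluctuation).
# Assumed metric: lower cost means more robust execution.
TRAINING = [
    ((0.1, 0.1), {"FAC": 1.0, "WF": 1.4, "AWF": 1.8}),
    ((0.2, 0.2), {"FAC": 1.1, "WF": 1.3, "AWF": 1.7}),
    ((0.8, 0.2), {"FAC": 2.0, "WF": 1.2, "AWF": 1.5}),
    ((0.7, 0.3), {"FAC": 1.9, "WF": 1.1, "AWF": 1.6}),
    ((0.9, 0.9), {"FAC": 3.0, "WF": 2.2, "AWF": 1.1}),
    ((0.7, 0.8), {"FAC": 2.8, "WF": 2.0, "AWF": 1.2}),
]


def predict_costs(features, k=2):
    """Predict each algorithm's cost as the mean over the k nearest training instances."""
    nearest = sorted(TRAINING, key=lambda row: dist(row[0], features))[:k]
    return {
        alg: sum(costs[alg] for _, costs in nearest) / len(nearest)
        for alg in ALGORITHMS
    }


def select_per_instance(features, k=2):
    """Portfolio approach: pick the algorithm predicted to be most robust for this instance."""
    predicted = predict_costs(features, k)
    return min(predicted, key=predicted.get)


def select_winner_take_all():
    """Baseline: pick the single algorithm with the lowest mean cost over all training instances."""
    mean_cost = {
        alg: sum(costs[alg] for _, costs in TRAINING) / len(TRAINING)
        for alg in ALGORITHMS
    }
    return min(mean_cost, key=mean_cost.get)


if __name__ == "__main__":
    print("winner-take-all:", select_winner_take_all())
    for instance in [(0.15, 0.15), (0.8, 0.25), (0.85, 0.85)]:
        print(instance, "->", select_per_instance(instance))
```

In this toy setup, winner-take-all commits to one algorithm for every instance, while the per-instance selector can switch algorithms as the assumed application and system characteristics change. Whether that flexibility pays off in practice depends on the accuracy of the prediction model, which this sketch does not evaluate.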