SC14 New Orleans, LA

The International Conference for High Performance Computing, Networking, Storage and Analysis

Supervised Learning for Parallel Application Performance Prediction


Student: Andrew Titus (Massachusetts Institute of Technology)
Supervisor: Abhinav Bhatele (Lawrence Livermore National Laboratory)

Abstract: Communication is a scaling bottleneck for many parallel applications executed on large machines. Intelligent task mapping may alleviate the negative impact of communication, but the simple metrics used to find such mappings may not be good predictors of their performance. We evaluate supervised machine learning methods as tools for predicting the communication time of large parallel applications. Through these methods, we correlate the communication time of different task mappings with the corresponding network hardware counters accessible on the IBM Blue Gene/Q. The results from these machine learning regression algorithms provide insight into the relative importance of different hardware counters and metrics for predicting application performance. These results are explored graphically in the poster in the context of two production applications, MILC and pF3D.
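
As a rough illustration of the idea in the abstract, the sketch below fits a regression from per-mapping counter values to communication time and ranks the counters by the size of their standardized coefficients. It is not the authors' method: ordinary least squares stands in for whichever regression algorithms the poster evaluates, the counter names (max_link_bytes, avg_hops, fifo_stalls) are hypothetical, and the data is synthetic rather than measured on Blue Gene/Q.

```python
import random
import statistics


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_least_squares(columns, target):
    """Least-squares coefficients for standardized features and a centered target."""
    Z = [standardize(c) for c in columns]
    mean_y = statistics.fmean(target)
    yc = [v - mean_y for v in target]
    k = len(Z)
    # Normal equations: (Z^T Z) w = Z^T y
    A = [[sum(a * b for a, b in zip(Z[i], Z[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(Z[i], yc)) for i in range(k)]
    return solve(A, rhs)


# Synthetic stand-in data: one row per task mapping (all values are made up).
rng = random.Random(0)
names = ["max_link_bytes", "avg_hops", "fifo_stalls"]  # hypothetical counter names
n_mappings = 200
columns = [[rng.random() for _ in range(n_mappings)] for _ in names]

# Assumed ground truth: time depends strongly on the first counter,
# weakly on the second, and not at all on the third.
comm_time = [
    3.0 * columns[0][i] + 0.5 * columns[1][i] + rng.gauss(0.0, 0.05)
    for i in range(n_mappings)
]

coefs = fit_least_squares(columns, comm_time)

# Rank counters by the magnitude of their standardized coefficient.
ranking = sorted(zip(names, coefs), key=lambda p: abs(p[1]), reverse=True)
for name, coef in ranking:
    print(f"{name}: {coef:+.3f}")
```

Standardizing the counters first makes the coefficient magnitudes comparable across counters with different units, which is what allows them to be read as a crude importance ranking. Nonlinear regressors would need a different importance measure.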

Poster: pdf
Two-page extended abstract: pdf

