The International Conference for High Performance Computing, Networking, Storage and Analysis
Supervised Learning for Parallel Application Performance Prediction
Student: Andrew Titus (Massachusetts Institute of Technology)
Supervisor: Abhinav Bhatele (Lawrence Livermore National Laboratory)
Abstract: Communication is a scaling bottleneck for many parallel applications executed
on large machines. Intelligent task mapping may alleviate the negative impact
of communication, but the simple metrics used to find such mappings may not be
good predictors of their performance. We evaluate supervised machine learning
methods as tools for predicting the communication time of large parallel
applications. Through these methods, we correlate communication time for
different task mappings to the corresponding network hardware counters
accessible on the IBM Blue Gene/Q. The results from these machine learning
regression algorithms provide insight into the relative importance of
different hardware counters and metrics for predicting application
performance. These results are explored graphically in the poster in the
context of two production applications, MILC and pF3D.
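
The general idea of regressing communication time on per-mapping counter values and then ranking the counters can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in the poster: the abstract does not name the regression algorithms, so ordinary least squares on standardized features stands in for them; the counter names are hypothetical; and the data is synthetic rather than measured on Blue Gene/Q.

```python
import random

# Hypothetical counter names, for illustration only (not actual BG/Q counters).
COUNTERS = ["max_link_bytes", "avg_hops", "recv_buffer_stalls"]


def standardize(column):
    """Rescale a column to zero mean and unit variance."""
    mean = sum(column) / len(column)
    std = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / std for v in column]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


# Synthetic stand-in for measurements: one row per task mapping.
# Assumption: time depends strongly on the first counter, weakly on the
# second, and not at all on the third, plus a little noise.
rng = random.Random(0)
n_mappings = 200
columns = [[rng.random() for _ in range(n_mappings)] for _ in COUNTERS]
comm_time = [
    3.0 * columns[0][i] + 0.5 * columns[1][i] + rng.gauss(0.0, 0.05)
    for i in range(n_mappings)
]

# Least squares via the normal equations on standardized features.
# Standardized columns have zero mean, so the intercept is the mean time.
z = [standardize(c) for c in columns]
y_mean = sum(comm_time) / n_mappings
y_centered = [t - y_mean for t in comm_time]
gram = [[sum(p * q for p, q in zip(zi, zj)) for zj in z] for zi in z]
rhs = [sum(p * q for p, q in zip(zi, y_centered)) for zi in z]
coefs = solve(gram, rhs)

# Goodness of fit (coefficient of determination) on the training data.
predicted = [
    y_mean + sum(c * zi[i] for c, zi in zip(coefs, z)) for i in range(n_mappings)
]
ss_res = sum((t - p) ** 2 for t, p in zip(comm_time, predicted))
ss_tot = sum(v ** 2 for v in y_centered)
r_squared = 1.0 - ss_res / ss_tot

# Rank counters by the magnitude of their standardized coefficient.
ranking = sorted(COUNTERS, key=lambda name: -abs(coefs[COUNTERS.index(name)]))
for name in ranking:
    print(f"{name}: {coefs[COUNTERS.index(name)]:+.3f}")
print(f"R^2 = {r_squared:.3f}")
```

Standardizing the features first makes the coefficient magnitudes comparable across counters with different units, which is what allows them to serve as a rough importance ranking. With real counter data, where features are often correlated, a held-out evaluation and a more robust importance measure would likely be needed.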