Monitoring Large-Scale HPC Systems: Issues and Approaches

Birds of a Feather

5:30PM - 7:00PM

Jim Brandt, Michael Showerman, Michael Mason

This BoF addresses critical issues and approaches in large-scale HPC monitoring from the perspectives of system administrators, users, and vendors. In particular, we target capabilities, gaps, and roadblocks in monitoring as we move to extreme scale, including: a) desired information, b) vendor- and tool-enabled interfaces to data, c) integration of capabilities that provide and respond to data (e.g., integrated adaptive runtimes, application feedback), d) monitoring-impact analysis methods for large-scale applications, and e) other hot topics (e.g., power, network congestion, reliability, high-density components). A panel of large-scale HPC stakeholders will interact with BoF attendees on topics of interest.

Session Leader Details:

Jim Brandt (Primary Session Leader) - Sandia National Laboratories

Michael Showerman (Secondary Session Leader) - University of Illinois at Urbana-Champaign

Michael Mason (Secondary Session Leader) - Los Alamos National Laboratory

