The International Conference for High Performance Computing, Networking, Storage and Analysis
Sponsored by IEEE and ACM

SCHEDULE: NOV 16-21, 2014

Lightweight Distributed Metric Service: A Scalable Infrastructure for Continuous Monitoring of Large Scale Computing Systems and Applications

SESSION: Performance Measurement

EVENT TYPE: Papers

TIME: 11:00AM - 11:30AM

SESSION CHAIR: Shirley Moore

AUTHOR(S): Anthony Agelastos, Benjamin Allan, Jim Brandt, Paul Cassella, Jeremy Enos, Joshi Fullop, Ann Gentile, Steve Monk, Nichamon Naksinehaboon, Jeff Ogden, Mahesh Rajan, Michael Showerman, Joel Stevenson, Narate Taerat, Tom Tucker

ROOM: 388-89-90

ABSTRACT:

Understanding how resources of High Performance Compute platforms are utilized by applications both individually and as a composite is key to application and platform performance. Typical system monitoring tools do not provide sufficient fidelity while application profiling tools do not capture the complex interplay between applications competing for shared resources. To gain new insights, monitoring tools must run continuously, system wide, at frequencies appropriate to the metrics of interest while having minimal impact on application performance.

We introduce the Lightweight Distributed Metric Service for scalable, lightweight monitoring of large scale computing systems and applications. We describe issues and constraints guiding deployment in Sandia National Laboratories' capacity computing environment and on the National Center for Supercomputing Applications' Blue Waters platform including motivations, metrics of choice, and requirements relating to the scale and specialized nature of Blue Waters. We address monitoring overhead and impact on application performance and provide illustrative profiling results.

Chair/Author Details:

Shirley Moore (Chair) - University of Texas at El Paso

Anthony Agelastos - Sandia National Laboratories

Benjamin Allan - Sandia National Laboratories

Jim Brandt - Sandia National Laboratories

Paul Cassella - Cray Inc.

Jeremy Enos - University of Illinois at Urbana-Champaign

Joshi Fullop - University of Illinois at Urbana-Champaign

Ann Gentile - Sandia National Laboratories

Steve Monk - Sandia National Laboratories

Nichamon Naksinehaboon - Open Grid Computing

Jeff Ogden - Sandia National Laboratories

Mahesh Rajan - Sandia National Laboratories

Michael Showerman - University of Illinois at Urbana-Champaign

Joel Stevenson - Sandia National Laboratories

Narate Taerat - Open Grid Computing

Tom Tucker - Open Grid Computing

Paper provided by the ACM Digital Library

Paper also available from IEEE Computer Society