SCHEDULE: NOV 16-21, 2014

High Performance Data Analytics: Experiences Porting the Apache Hama Graph Analytics Framework to an HPC Infiniband Connected Cluster

SESSION: Software for HPC

EVENT TYPE: Exhibitor Forums

TIME: 2:30PM - 3:00PM

SESSION CHAIR: Paul Domagala

Presenter(s): William Leinberger



Open source analytic frameworks provide access to Big Data in a productive and fault-resilient way on scale-out commodity hardware systems. The objective of High Performance Data Analytic systems is to maintain that framework productivity while improving performance for the data analyst. To achieve the performance available from High Performance Computing (HPC) technology, a framework must be recast from the distributed programming model common in the open source world to the parallel programming model used successfully in the HPC world. This is done by surgically replacing key framework functions with variants that leverage the strengths of HPC systems. We demonstrate this by porting the Apache Hama graph analytic framework to an HPC Infiniband cluster. By replacing the distributed barrier class in the framework with a parallel HPC variant (prototyped in MPI), we achieved a performance increase of 37% on a real-world Community Detection application applied to a synthetic, community-rich graph.
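
The idea of swapping only the barrier while leaving the rest of the framework untouched can be illustrated with a minimal sketch. It is not Apache Hama code and does not use MPI: threads stand in for cluster nodes, and the names `run_supersteps` and `CountingBarrier` are hypothetical. It assumes only that the framework reaches its barrier through a single `wait()` call, so that one implementation can be exchanged for another (in a real port, that call might wrap something like `MPI_Barrier`).

```python
import threading


class CountingBarrier:
    """Hypothetical drop-in barrier; stands in for an alternative implementation."""

    def __init__(self, parties):
        self._parties = parties
        self._count = 0
        self._generation = 0
        self._cond = threading.Condition()

    def wait(self):
        with self._cond:
            generation = self._generation
            self._count += 1
            if self._count == self._parties:
                # Last arrival: open the barrier and reset it for reuse.
                self._generation += 1
                self._count = 0
                self._cond.notify_all()
            else:
                # Block until the last arrival advances the generation.
                while generation == self._generation:
                    self._cond.wait()


def run_supersteps(num_workers, num_supersteps, barrier):
    """Toy bulk-synchronous loop: each worker sends its value to its right neighbor."""
    values = list(range(num_workers))  # worker i starts with value i
    inbox = [0] * num_workers

    def worker(rank):
        for _ in range(num_supersteps):
            # Communication phase: deliver the current value to the neighbor.
            inbox[(rank + 1) % num_workers] = values[rank]
            barrier.wait()  # every message has been delivered
            # Compute phase: fold the received message into local state.
            values[rank] += inbox[rank]
            barrier.wait()  # every inbox has been read before the next sends

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return values


if __name__ == "__main__":
    # Same superstep loop, two interchangeable barrier implementations.
    print(run_supersteps(4, 2, threading.Barrier(4)))
    print(run_supersteps(4, 2, CountingBarrier(4)))
```

Because the superstep loop depends only on `wait()`, the two barriers produce identical results; only the synchronization cost differs, which is the part of the framework the abstract reports replacing.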

Chair/Presenter Details:

Paul Domagala (Chair) - Argonne National Laboratory

William Leinberger - General Dynamics

