The International Conference for High Performance Computing, Networking, Storage and Analysis
Massively Parallel and Near Linear Time Graph Analytics.
Authors: Fazle Elahi Faisal (IBM Corporation), Yves Ineichen (IBM Corporation), A. Cristiano I. Malossi (IBM Corporation), Peter Staar (IBM Corporation), Costas Bekas (IBM Corporation), Alessandro Curioni (IBM Corporation)
Abstract: Graph theory and graph analytics are essential in many research problems and have applications in various domains. The rapid growth in the data can lead to very large sparse graphs consisting of billions of nodes. This poses many challenges in designing fast algorithms for large-scale graph analytics.
We believe that in the era of the big data regime accuracy has to be traded for algorithmic complexity and scalability to drive big data analytics. We discuss the underlaying mathematical models of two novel near linear (O(N)) methods for graph analytics: estimating subgraph centralities, and spectrograms. Parallelism on multiple levels is used to attain good scalability. We benchmarked the proposed algorithms on huge datasets, e.g. the European street network containing 51 million nodes and 108 million non-zeros, on up to 32,768 cores. Due to the iterative behavior of the methods, a very good estimate can be computed in a matter of seconds.