SC14 New Orleans, LA

The International Conference for High Performance Computing, Networking, Storage and Analysis

Leveraging Naturally Distributed Data Redundancy to Optimize Collective Replication

Authors: Bogdan Nicolae (IBM Corporation), Massimiliano Meneghin (IBM Corporation), Pierre Lemarinier (IBM Corporation)

Abstract: Techniques such as replication are often used to provide resilience and high availability. However, replication introduces overhead, both in the network traffic needed to distribute replicas and in the extra storage space they occupy. Redundancy elimination techniques such as compression or deduplication are therefore often applied to reduce the communication and storage overhead. This paper explores how the two phases, redundancy elimination and replication, can be optimized by combining them into a single phase. Our key idea relies on the observation that, because the data is related, distributed redundancy may already be naturally present. It may therefore pay off to identify this natural redundancy, rather than removing redundancy in the first phase only to add it back in the second. We present how this idea can be leveraged in practice and demonstrate its viability for two real-life HPC applications.
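
The key idea can be illustrated with a minimal sketch. It is not the authors' implementation, only one possible reading of the abstract under stated assumptions: data is split into chunks, identical chunks are detected by content hash, and a chunk already held by several processes counts toward the desired number of replicas. The function name `plan_replication`, the dictionary layout, and the "lowest rank sends" rule are all hypothetical choices made for illustration.

```python
import hashlib
from collections import defaultdict


def plan_replication(local_chunks, replication_factor):
    """Decide which extra chunk copies are needed (illustrative only).

    local_chunks: dict mapping a process rank to the list of byte chunks
                  it holds locally (hypothetical layout).
    replication_factor: desired total number of copies of each chunk.
    Returns a dict mapping rank -> list of (chunk_digest, extra_copies).
    """
    # Map each chunk digest to the set of ranks that already hold it.
    holders = defaultdict(set)
    for rank, chunks in local_chunks.items():
        for chunk in chunks:
            holders[hashlib.sha256(chunk).hexdigest()].add(rank)

    plan = defaultdict(list)
    for digest, ranks in holders.items():
        # Copies that already exist naturally count as replicas.
        missing = replication_factor - len(ranks)
        if missing > 0:
            # Assumption: the lowest rank holding the chunk sends the extras.
            plan[min(ranks)].append((digest, missing))
    return dict(plan)


if __name__ == "__main__":
    chunks = {0: [b"A", b"B"], 1: [b"A", b"C"], 2: [b"A"]}
    for rank, sends in plan_replication(chunks, 2).items():
        print(rank, sends)
```

In this toy input, chunk `A` is already present on three processes, so no copy of it needs to be sent; only `B` and `C` each need one extra copy. Replicating every local chunk independently would instead send five copies.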

Poster: pdf
Two-page extended abstract: pdf
