The International Conference for High Performance Computing, Networking, Storage and Analysis
Leveraging Naturally Distributed Data Redundancy to Optimize Collective Replication.
Authors: Bogdan Nicolae (IBM Corporation), Massimiliano Meneghin (IBM Corporation), Pierre Lemarinier (IBM Corporation)
Abstract: Techniques such as replication are often used to enable
resilience and high availability. However, replication introduces
overhead both in the network traffic needed to distribute replicas
and in extra storage space. Redundancy elimination techniques such as
compression or deduplication are therefore often used to reduce the
cost of communication and storage.
This paper explores how these two phases, redundancy elimination and
replication, can be optimized by combining them into a single phase.
Our key idea relies on the observation that, because the data is
related, distributed redundancy may already be naturally present. It
may therefore pay off to identify this natural redundancy, rather
than eliminating redundancy in the first phase only to add it back in
the second phase. We show how this idea can be
leveraged in practice and demonstrate its viability for two real-life
HPC applications.
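
The key idea can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-size chunking, the SHA-256 fingerprints, and the names `chunk_fingerprints` and `extra_copies_needed` are all assumptions made for illustration. Each node fingerprints its local chunks, the fingerprints are compared across nodes, and a chunk is replicated only as many times as needed to reach the target replication factor once its naturally occurring copies are counted.

```python
import hashlib
from collections import defaultdict


def chunk_fingerprints(data: bytes, chunk_size: int) -> list[str]:
    """Split data into fixed-size chunks and fingerprint each one (assumed scheme)."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def extra_copies_needed(
    node_data: dict[str, bytes], chunk_size: int, replication_factor: int
) -> dict[str, int]:
    """Map each chunk fingerprint to the number of additional copies required.

    A chunk already present on several nodes counts those natural copies
    toward the replication factor, so fewer (possibly zero) transfers remain.
    """
    holders: dict[str, set[str]] = defaultdict(set)
    for node, data in node_data.items():
        for fp in chunk_fingerprints(data, chunk_size):
            holders[fp].add(node)
    return {
        fp: max(0, replication_factor - len(nodes))
        for fp, nodes in holders.items()
    }


# Hypothetical example: three nodes share one chunk and each hold one unique chunk.
node_data = {
    "n0": b"AAAABBBB",
    "n1": b"AAAACCCC",
    "n2": b"AAAADDDD",
}
needed = extra_copies_needed(node_data, chunk_size=4, replication_factor=3)
total_transfers = sum(needed.values())
```

In this toy input the shared chunk needs no transfers at all, because it is already present on three nodes, while each unique chunk still needs two extra copies. A scheme that ignores natural redundancy would instead ship every local chunk to two other nodes.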