SCHEDULE: NOV 16-21, 2014

Multi-Level Hashing Dedup in HPC Storage Systems

SESSION: ACM Student Research Competition Poster Reception

EVENT TYPE: ACM Student Research Competition

TIME: 5:15PM - 7:00PM

AUTHOR(S): Eric E. Valenzuela

ROOM: New Orleans Theater Lobby


High data deduplication ratios are achievable in High Performance Computing (HPC). Prior art shows that substantial reduction is possible: on average, 15 to 30 percent of data is redundant and can be removed using deduplication techniques. The objective of this research study is to design and evaluate a dedup system that provides 100% data integrity, with no possibility of losing data, while reducing the need for costly byte-by-byte comparisons. Because data deduplication relies on hashing algorithms, hash collisions can occur. Prior systems skip the byte-by-byte comparisons needed to handle collisions, citing the low probability of a collision. Our research focuses on investigating a multi-level dedup method that reduces byte-by-byte comparisons while providing 100% data integrity, and on implementing multi-level hash functions that take advantage of the Xeon Phi many-core architecture to compute cryptographic fingerprints concurrently. Our current proof-of-concept evaluations with a deduplication file system, Lessfs, show promising results.
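
The multi-level idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the poster's implementation and not Lessfs code: the class name MultiLevelDedupStore, the choice of CRC32 as the cheap first-level hash, SHA-256 as the second-level fingerprint, and the in-memory index are all hypothetical choices made for the example. A byte-by-byte comparison runs only when both hash levels match, so integrity never rests on the hashes alone.

```python
import hashlib
import zlib


class MultiLevelDedupStore:
    """Toy in-memory chunk store (hypothetical; for illustration only)."""

    def __init__(self):
        # Level-1 key (CRC32) -> level-2 key (SHA-256 digest) -> stored chunks.
        self.index = {}
        # Number of byte-by-byte comparisons actually performed.
        self.byte_compares = 0

    def put(self, chunk: bytes) -> bool:
        """Store a chunk; return True if it was a duplicate, False if new."""
        weak = zlib.crc32(chunk)
        strong = hashlib.sha256(chunk).digest()

        bucket = self.index.setdefault(weak, {})
        candidates = bucket.get(strong)
        if candidates is None:
            # At least one hash level differs, so the chunk must be new.
            # No byte-by-byte comparison is needed.
            bucket[strong] = [chunk]
            return False

        # Both hash levels match: confirm with a byte-by-byte comparison
        # so that a hash collision can never cause data loss.
        for stored in candidates:
            self.byte_compares += 1
            if stored == chunk:
                return True

        # Same hashes but different bytes: a true collision. Keep both.
        candidates.append(chunk)
        return False

    def unique_chunks(self) -> int:
        """Count the distinct chunks currently stored."""
        return sum(
            len(chunks)
            for bucket in self.index.values()
            for chunks in bucket.values()
        )


if __name__ == "__main__":
    store = MultiLevelDedupStore()
    for data in (b"a" * 4096, b"b" * 4096, b"a" * 4096):
        print(store.put(data), store.byte_compares)
```

In this sketch, distinct chunks are rejected as duplicates by the hash levels alone, and the expensive comparison is reserved for chunks whose fingerprints agree at every level. Computing the fingerprints concurrently on a many-core device, as the abstract describes, is outside the scope of this example.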

Chair/Author Details:

Eric E. Valenzuela - Texas Tech University

