SCHEDULE: NOV 16-21, 2014

ECC Parity: A Technique for Efficient Memory Error Resilience for Multi-Channel Memory Systems

SESSION: Memory System Energy Efficiency


TIME: 3:30PM - 4:00PM


AUTHOR(S): Xun Jian, Rakesh Kumar



Servers and HPC systems often use a strong memory error correction code, or ECC, to meet their reliability and availability requirements. However, these ECCs often require significant capacity and/or power overheads. We observe that since memory channels are independent of one another, error correction only needs to be performed for one channel at a time. Based on this observation, we show that instead of always storing the actual ECC correction bits in memory, as existing systems do, it is sufficient to store the bitwise parity of the ECC correction bits of different channels for fault-free memory regions, and to store the actual ECC correction bits only for faulty regions. By trading off the resulting reduction in ECC capacity overhead for improved memory energy efficiency, the proposed technique reduces memory energy per instruction by 54.4% and 18.5% compared to commercial chipkill correct and DIMM-kill correct, respectively, while incurring similar or lower capacity overheads.
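
The parity idea in the abstract can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not the authors' implementation: `toy_correction_bits` is a hypothetical stand-in for a real ECC encoder (CRC32 is used only to produce some check bytes and cannot correct errors), and the function names, block contents, and four-channel layout are invented for illustration. The sketch shows only that one channel's correction bits can be rebuilt from the stored parity plus the correction bits recomputed from the other, fault-free channels.

```python
import zlib
from functools import reduce


def toy_correction_bits(data: bytes) -> bytes:
    """Hypothetical stand-in for an ECC encoder: 4 check bytes per block.

    CRC32 is not a correcting code; it only plays the role of
    'correction bits derived from the data' in this sketch.
    """
    return zlib.crc32(data).to_bytes(4, "big")


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def ecc_parity(channel_blocks: list[bytes]) -> bytes:
    """Bitwise parity of the correction bits of all channels.

    Only this parity is stored for a fault-free region, instead of
    one set of correction bits per channel.
    """
    return reduce(xor_bytes, (toy_correction_bits(b) for b in channel_blocks))


def recover_correction_bits(parity: bytes,
                            channel_blocks: list[bytes],
                            faulty: int) -> bytes:
    """Rebuild the faulty channel's correction bits.

    Assumes every channel other than `faulty` is error-free, so its
    correction bits can be recomputed from its data and XORed out of
    the stored parity.
    """
    others = (toy_correction_bits(b)
              for i, b in enumerate(channel_blocks) if i != faulty)
    return reduce(xor_bytes, others, parity)


# Four channels, each holding one data block (illustrative values).
blocks = [b"channel-0 data", b"channel-1 data",
          b"channel-2 data", b"channel-3 data"]
parity = ecc_parity(blocks)
original_bits = toy_correction_bits(blocks[2])

# Channel 2 later returns corrupted data; the other channels are intact.
corrupted = list(blocks)
corrupted[2] = b"channel-2 dXta"
recovered_bits = recover_correction_bits(parity, corrupted, faulty=2)
```

In this sketch `recovered_bits` equals the correction bits originally computed for channel 2, which a real ECC decoder would then use to correct that channel's data. The stored overhead is one set of check bytes for the whole group rather than one per channel, which mirrors the capacity saving the abstract describes for fault-free regions.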

Chair/Author Details:

Alex Ramirez (Chair) - Polytechnic University of Catalonia

Xun Jian - University of Illinois at Urbana-Champaign

Rakesh Kumar - University of Illinois at Urbana-Champaign

