SCHEDULE: NOV 16-21, 2014

Parallel De Bruijn Graph Construction and Traversal for De Novo Genome Assembly

SESSION: High Performance Genomics


TIME: 10:30AM - 11:00AM


AUTHOR(S): Evangelos Georganas, Aydin Buluc, Jarrod Chapman, Leonid Oliker, Daniel Rokhsar, Katherine Yelick



De novo whole-genome assembly reconstructs a genomic sequence from short, overlapping, and potentially erroneous fragments called reads. We study optimized parallelization of the most time-consuming phases of Meraculous, a state-of-the-art production assembler. First, we present a new parallel algorithm for k-mer analysis, a phase characterized by intensive communication and I/O requirements, that reduces memory requirements by 6.93x. Second, we efficiently parallelize de Bruijn graph construction and traversal, a key component of most de novo assemblers that necessitates a distributed hash table. We provide a novel algorithm that leverages the one-sided communication capabilities of Unified Parallel C (UPC) to support the requisite fine-grained parallelism and avoid data hazards, and we analytically prove its scalability properties. Overall, results show unprecedented performance and efficient scaling on up to 15,360 cores of a Cray XC30 for both the human genome and the challenging wheat genome, reducing runtimes from days to seconds.

Chair/Author Details:

Zhong Jin (Chair) - Chinese Academy of Sciences

Evangelos Georganas - University of California, Berkeley

Aydin Buluc - Lawrence Berkeley National Laboratory

Jarrod Chapman - Joint Genome Institute

Leonid Oliker - Lawrence Berkeley National Laboratory

Daniel Rokhsar - Joint Genome Institute

Katherine Yelick - Lawrence Berkeley National Laboratory

