Sponsored by IEEE and ACM
The International Conference for High Performance Computing, Networking, Storage and Analysis
SCHEDULE: NOV 16-21, 2014

Orion: Scaling Genomic Sequence Matching with Fine-Grained Parallelization

SESSION: High Performance Genomics


TIME: 11:00AM - 11:30AM


AUTHOR(S): Kanak Mahadik, Somali Chaterji, Bowen Zhou, Milind Kulkarni, Saurabh Bagchi



Gene sequencing instruments are producing huge volumes of data, straining the capabilities of current database searching algorithms and hindering researchers' efforts to analyze large collections of data and gain greater insights. In the space of parallel genomic sequence search, most popular software packages, such as mpiBLAST, use the database segmentation approach, in which the entire database is sharded and searched on different nodes. However, this approach does not scale well with the increasing length of individual query sequences or with the rapid growth in the size of sequence databases. In this paper, we propose a fine-grained parallelism technique, called Orion, that divides the input query into an adaptive number of fragments and shards the database. Our technique achieves higher parallelism (and hence speedup) and better load balancing than database sharding alone, while maintaining 100% accuracy. We show that it is 12.3X faster than mpiBLAST for solving a relevant comparative genomics problem.
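
To illustrate the general idea of combining query fragmentation with database sharding, here is a minimal Python sketch. It is not Orion's implementation: the abstract does not say how the number of fragments is chosen, whether fragments overlap, or how partial results are merged. The function names `fragment_query` and `make_work_units`, the fixed `fragment_len` and `overlap` parameters, and the shard labels are all hypothetical and chosen only for illustration.

```python
from itertools import product


def fragment_query(query, fragment_len, overlap):
    """Split a query into fragments of at most fragment_len characters.

    Consecutive fragments share `overlap` characters (an assumption of this
    sketch, so that a match spanning a boundary is not cut in two).
    Returns a list of (start_offset, fragment) tuples.
    """
    if fragment_len <= 0 or not 0 <= overlap < fragment_len:
        raise ValueError("need fragment_len > 0 and 0 <= overlap < fragment_len")
    if not query:
        return []
    step = fragment_len - overlap
    fragments = []
    start = 0
    while True:
        end = min(start + fragment_len, len(query))
        fragments.append((start, query[start:end]))
        if end == len(query):
            break
        start += step
    return fragments


def make_work_units(fragments, shards):
    """Pair every query fragment with every database shard.

    Each (offset, fragment, shard) tuple is an independent search task, so
    the number of tasks is len(fragments) * len(shards) rather than
    len(shards) as with database sharding alone.
    """
    return [(offset, frag, shard) for (offset, frag), shard in product(fragments, shards)]


if __name__ == "__main__":
    fragments = fragment_query("ACGTACGTAC", fragment_len=4, overlap=2)
    units = make_work_units(fragments, ["shard0", "shard1", "shard2"])
    for unit in units:
        print(unit)
```

In this sketch, splitting a query into F fragments across S shards yields F x S independent tasks that a scheduler could distribute across nodes. A real system would also need to merge hits from adjacent fragments and re-score them, which is omitted here.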

Chair/Author Details:

Zhong Jin (Chair) - Chinese Academy of Sciences

Kanak Mahadik - Purdue University

Somali Chaterji - Purdue University

Bowen Zhou - Purdue University

Milind Kulkarni - Purdue University

Saurabh Bagchi - Purdue University

