Building Scalable Data Management and Analysis Infrastructure for Metagenomics

AUTHOR(S): Wei Tang, Jared Wilkening, Jared Bischof, Wolfgang Gerlach, Andreas Wilke, Narayan Desai, Folker Meyer

Next-generation sequencing technology has dramatically reduced the cost of DNA sequencing and shifted the bottleneck in metagenomics from data generation to data analysis. For example, MG-RAST, a free, public metagenome annotation system, has been receiving an increasingly large volume of data submitted for analysis, a situation that threatens its ability to operate efficiently in production. To address this, we developed a pair of open-source software products: a data management system named Shock and a workflow management system named AWE. Shock and AWE can be used to build scalable infrastructure for biological sequence data management and analysis.

Wei Tang - Argonne National Laboratory

Jared Wilkening - Argonne National Laboratory

Jared Bischof - Argonne National Laboratory

Wolfgang Gerlach - Argonne National Laboratory

Andreas Wilke - Argonne National Laboratory

Narayan Desai - Argonne National Laboratory

Folker Meyer - Argonne National Laboratory

