David Jaffe’s team develops genome assembly methods at the Broad Institute. Their goal is to pair the best laboratory and computational methods to yield the best possible genome assemblies. The team developed the widely used assemblers ARACHNE and ALLPATHS-LG, and now DISCOVAR. DISCOVAR starts from a low-cost data type: 250-base reads from a single PCR-free library. From these data, the algorithm constructs an assembly graph that has long-range contiguity (100 kb for human genomes) and fully captures polymorphism. Direct visualization of these graphs reveals important features, including structural variation, in both normal and cancer samples.
° Batzoglou S et al. (2002) ARACHNE: A whole-genome shotgun assembler. Genome Res 12: 177–189.
° Gnerre S et al. (2011) High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci USA 108: 1513–1518.
° Aird D et al. (2011) Analyzing and minimizing PCR amplification bias in Illumina sequencing libraries. Genome Biol 12: R18.
° Calvo SE et al. (2012) Molecular diagnosis of infantile mitochondrial disease with targeted next-generation sequencing. Sci Transl Med 4: 118ra10.
° Kirby A et al. (2013) Mutations causing medullary cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing. Nat Genet 45: 299–303.