Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 73,536 bioRxiv papers from 320,015 authors.

Improved assembly and variant detection of a haploid human genome using single-molecule, high-fidelity long reads

By Mitchell R. Vollger, Glennis A. Logsdon, Peter Audano, Arvis Sulovari, David Porubsky, Paul Peluso, Aaron M. Wenger, Gregory T. Concepcion, Zev N. Kronenberg, Katherine M. Munson, Carl Baker, Ashley D. Sanders, Diana C.J. Spierings, Peter M. Lansdorp, Urvashi Surti, Michael W Hunkapiller, Evan E. Eichler

Posted 10 May 2019
bioRxiv DOI: 10.1101/635037 (published DOI: 10.1111/ahg.12364)

The sequence and assembly of human genomes using long-read sequencing technologies have revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective stand-alone technology for de novo assembly of human genomes.
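
For readers unfamiliar with the NG50 metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed: contigs are sorted longest first, and NG50 is the length of the contig at which the running total first reaches half of a reference size (the estimated genome size, or the size of the region of interest). The function name `ng50` and the toy contig lengths are assumptions for illustration only; they are not taken from the paper, and the authors' own pipeline may differ. The last line reproduces the fold change from the two NG50 values given in the abstract.

```python
def ng50(contig_lengths, genome_size):
    """Return the NG50 of an assembly.

    Contigs are visited longest first; the NG50 is the length of the
    contig at which the running total first reaches half of genome_size.
    Returns 0 if the contigs never cover half of genome_size.
    """
    half = genome_size / 2
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half:
            return length
    return 0


# Toy contig lengths (hypothetical values, not data from the paper).
toy_contigs = [50, 40, 30, 20, 10]
toy_genome_size = 200
toy_ng50 = ng50(toy_contigs, toy_genome_size)

# Fold change between the two pericentromeric NG50 values quoted in the
# abstract (HiFi 480.6 kbp, CLR 191.5 kbp).
fold_change = 480.6 / 191.5
print(toy_ng50, round(fold_change, 2))
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses a fixed reference size, so two assemblies of the same genome or region can be compared directly.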

Download data

  • Downloaded 2,319 times
  • Download rankings, all-time:
    • Site-wide: 2,107 out of 73,536
    • In genomics: 445 out of 4,879
  • Year to date:
    • Site-wide: 1,630 out of 73,536
  • Since beginning of last month:
    • Site-wide: 1,630 out of 73,536

Figures: Downloads over time; Distribution of downloads per paper, site-wide
