De novo diploid genome assembly for genome-wide structural variant detection

By Lu Zhang, Xin Zhou, Ziming Weng, Arend Sidow

Posted 16 Feb 2019
bioRxiv DOI: 10.1101/552430 (published DOI: 10.1093/nargab/lqz018)

Structural variants (SVs) in a personal genome are important but, for all practical purposes, impossible to detect comprehensively by standard short-fragment sequencing. De novo assembly, traditionally used to generate reference genomes, offers an alternative means of variant detection and phasing, but it has not been applied broadly to human genomes because of the fundamental limitations of short-fragment approaches and the high cost of long-read technologies. Here we show that 10x linked-read sequencing, which has been applied to assemble human diploid genomes into high-quality contigs, supports accurate SV detection. We examined variants in six de novo 10x assemblies with diverse experimental parameters from two commonly used human cell lines, NA12878 and NA24385. The assemblies are effective in detecting mid-size SVs, which were discovered by simple pairwise alignment of the assemblies' contigs to the reference (hg38). Our study also shows that SV breakpoint accuracy at the base-pair level is high, with a majority of SVs (80% of deletions and 70% of insertions) having precisely correct sizes and breakpoints (<2 bp difference). Finally, setting the ancestral state of SV loci by comparison to ape orthologs allows inference of the actual molecular mechanism (insertion or deletion) causing the mutation, which in about half of cases is opposite to that of the reference-based call. Interestingly, we uncover 214 SVs that may have been maintained as polymorphisms in the human lineage since before our divergence from chimpanzee. Overall, we show that de novo assembly of 10x linked-read data can achieve cost-effective SV detection for personal genomes.
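
The ancestral-state argument in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `polarize_sv`, its arguments, and the string labels are hypothetical, and it assumes the only inputs are the reference-based call and whether the ape ortholog carries the sequence in question. The idea is that the mutation mechanism follows from the ancestral state alone, and the reference-based call is then checked against it.

```python
from typing import Optional, Tuple


def polarize_sv(reference_call: str,
                ancestral_has_sequence: Optional[bool]) -> Tuple[str, Optional[bool]]:
    """Infer the mutation mechanism of an SV from its ancestral state.

    reference_call: 'insertion' or 'deletion', as called against the reference.
    ancestral_has_sequence: True if the ape ortholog carries the sequence,
        False if it lacks it, None if the ancestral state is unresolved
        (hypothetical encoding, chosen for this illustration).

    Returns (mechanism, agrees_with_reference_call).
    """
    if reference_call not in ("insertion", "deletion"):
        raise ValueError(f"unexpected call: {reference_call!r}")
    if ancestral_has_sequence is None:
        return ("unknown", None)
    # If the ancestor carried the sequence, the allele lacking it arose by
    # deletion; if the ancestor lacked it, the allele carrying it arose by
    # insertion.
    mechanism = "deletion" if ancestral_has_sequence else "insertion"
    return (mechanism, mechanism == reference_call)


# A reference-based "insertion" whose sequence is present in the ape ortholog
# is better explained as a deletion on the reference allele.
print(polarize_sv("insertion", True))
```

Under this framing, a call flips whenever the reference happens to carry the derived allele, which is consistent with the abstract's observation that roughly half of the calls are opposite to the inferred mechanism.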

Download data

  • Downloaded 951 times
  • Download rankings, all-time:
    • Site-wide: 25,587
    • In genomics: 2,381
  • Year to date:
    • Site-wide: 134,880
  • Since beginning of last month:
    • Site-wide: 115,331