Advanced Whole Genome Sequencing Using an Entirely PCR-free Massively Parallel Sequencing Workflow

By Zhao Xia, Yuan Jiang, Radoje Drmanac, Hanjie Shen, Pengjuan Liu, Zhanqing Li, Fang Chen, Hui Jiang, Shiming Shi, Yang Xi, Qiaoling Li, Xiaojue Wang, Jing Zhao, Xinming Liang, Yinlong Xie, Lin Wang, Wenlan Tian, Tam Berntsen, Yinling Luo, Meihua Gong, Jiguang Li, Chongjun Xu, Sijie Dai, Zilan Mi, Han Ren, Zhe Lin, Ao Chen, Wenwei Zhang, Feng Mu, Xun Xu

Posted 23 Dec 2019
bioRxiv DOI: 10.1101/2019.12.20.885517

Background: Systematic errors can be introduced by DNA amplification during massively parallel sequencing (MPS) library preparation and sequencing array formation. Polymerase chain reaction (PCR)-free genomic library preparation methods were previously shown to improve whole genome sequencing (WGS) quality on the Illumina platform, especially in calling insertions and deletions (InDels). We hypothesized that substantial InDel errors continue to be introduced by the remaining PCR step of DNA cluster generation. Beyond library preparation and sequencing, data analysis methods also affect the accuracy of the output data. In recent years, several machine learning-based variant calling pipelines have emerged that can correct systematic MPS errors and improve variant calling performance.
Results: Here, PCR-free libraries were sequenced on the PCR-free DNBSEQ™ arrays from MGI Tech Co., Ltd. (referred to as MGI) to accomplish the first true PCR-free WGS, in which the whole process is PCR-free during both library preparation and sequencing. We demonstrated that PCR-based WGS libraries have significantly (about 5 times) more InDel errors than PCR-free libraries. Furthermore, PCR-free WGS libraries sequenced on the PCR-free DNBSEQ™ platform have up to 55% fewer InDel errors than those sequenced on the NovaSeq platform, confirming that DNA clusters contain PCR-generated errors. In addition, low coverage bias and a read duplication rate below 1% were reproducibly obtained in DNBSEQ™ PCR-free WGS using either ultrasonic or enzymatic DNA fragmentation MGI kits combined with MGISEQ-2000. Meanwhile, variant calling performance (single-nucleotide polymorphism (SNP) F-score > 99.94%, InDel F-score > 99.6%) exceeded widely accepted standards using machine learning (ML) methods (DeepVariant or DNAscope).
Conclusions: Enabled by the new PCR-free library preparation kits, ultra-high-throughput PCR-free sequencers, and ML-based variant calling, true PCR-free DNBSEQ™ WGS provides a powerful solution for improving WGS accuracy while reducing cost and analysis time, thus facilitating future precision medicine, cohort studies, and large population genome projects.
Keywords: WGS, PCR-free, DNBSEQ™, InDel errors, Machine learning-based variant calling
Competing Interest Statement: The authors have no competing interests, except that some employees of MGI Tech Co., Ltd., BGI-Shenzhen, and Complete Genomics Inc. have stock holdings in BGI.

Download data

  • Downloaded 2,311 times
  • Download rankings, all-time:
    • Site-wide: 6,603
    • In genomics: 718
  • Year to date:
    • Site-wide: 4,461
  • Since beginning of last month:
    • Site-wide: 6,849
