Advanced Whole Genome Sequencing Using an Entirely PCR-free Massively Parallel Sequencing Workflow
Posted 23 Dec 2019
bioRxiv DOI: 10.1101/2019.12.20.885517
Background: Systematic errors can be introduced by DNA amplification during massively parallel sequencing (MPS) library preparation and sequencing array formation. Polymerase chain reaction (PCR)-free genomic library preparation methods were previously shown to improve whole genome sequencing (WGS) quality on the Illumina platform, especially in calling insertions and deletions (InDels). We hypothesized that substantial InDel errors continue to be introduced by the remaining PCR step of DNA cluster generation. Beyond library preparation and sequencing, data analysis methods also affect the accuracy of the output data. In recent years, several machine learning variant calling pipelines have emerged that can correct systematic MPS errors and improve variant calling performance.

Results: Here, PCR-free libraries were sequenced on the PCR-free DNBSEQ™ arrays from MGI Tech Co., Ltd. (referred to as MGI) to accomplish the first true PCR-free WGS, in which the whole process is PCR-free during both library preparation and sequencing. We demonstrated that PCR-based WGS libraries have significantly (about 5 times) more InDel errors than PCR-free libraries. Furthermore, PCR-free WGS libraries sequenced on the PCR-free DNBSEQ™ platform have up to 55% fewer InDel errors than on the NovaSeq platform, confirming that DNA clusters contain PCR-generated errors. In addition, low coverage bias and a read duplication rate below 1% were reproducibly obtained in DNBSEQ™ PCR-free WGS using either ultrasonic or enzymatic DNA fragmentation MGI kits combined with MGISEQ-2000. Meanwhile, variant calling performance (single-nucleotide polymorphism (SNP) F-score > 99.94%, InDel F-score > 99.6%) exceeded widely accepted standards using machine learning (ML) methods (DeepVariant or DNAscope).
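The F-scores and the relative error reduction quoted in the Results can be related to raw benchmark counts with a short calculation. The sketch below is illustrative only: it assumes the F-score is the usual F1 (the harmonic mean of precision and recall), which the abstract does not state explicitly, and that "errors" means false positives plus false negatives against a truth set. The function names and all counts are hypothetical and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """F1 score from true positive, false positive and false negative counts.

    Assumption: the reported F-score is F1 = 2PR / (P + R).
    """
    if tp == 0:
        return 0.0  # no correct calls, so precision and recall are both zero
    precision = tp / (tp + fp)  # fraction of called variants that are real
    recall = tp / (tp + fn)     # fraction of real variants that were called
    return 2 * precision * recall / (precision + recall)


def relative_error_reduction(baseline_errors: int, new_errors: int) -> float:
    """Fractional drop in error count relative to a baseline (0.55 means 55% fewer)."""
    if baseline_errors <= 0:
        raise ValueError("baseline_errors must be positive")
    return 1 - new_errors / baseline_errors


# Hypothetical InDel benchmark counts, chosen only to illustrate the arithmetic.
indel_f = f_score(tp=9960, fp=40, fn=40)
reduction = relative_error_reduction(baseline_errors=100, new_errors=45)
print(f"InDel F-score: {indel_f:.4%}, error reduction: {reduction:.0%}")
```

Because F1 combines precision and recall, a platform that lowers either false positives or false negatives raises the score, which is why a reduction in InDel errors and a higher InDel F-score move together.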
Conclusions: Enabled by the new PCR-free library preparation kits, ultra-high-throughput PCR-free sequencers, and ML-based variant calling, true PCR-free DNBSEQ™ WGS provides a powerful solution for improving WGS accuracy while reducing cost and analysis time, thus facilitating future precision medicine, cohort studies, and large population genome projects.

Keywords: WGS, PCR-free, DNBSEQ™, InDel errors, Machine learning-based variant calling

Competing Interest Statement: The authors have no competing interests, except that some employees of MGI Tech Co., Ltd., BGI-Shenzhen, and Complete Genomics Inc. hold stock in BGI.