A robust benchmark for germline structural variant detection
By
Justin M. Zook,
Nancy F Hansen,
Nathan D. Olson,
Lesley M Chapman,
James C. Mullikin,
Chunlin Xiao,
Stephen Sherry,
Sergey Koren,
Adam M. Phillippy,
Paul C. Boutros,
Sayed Mohammad E. Sahraeian,
Vincent Huang,
Alexandre Rouette,
Noah Alexander,
Christopher Mason,
Iman Hajirasouliha,
Camir Ricketts,
Joyce Lee,
Rick Tearle,
Ian T. Fiddes,
Alvaro Martinez Barrio,
Jeremiah Wala,
Andrew Carroll,
Noushin Ghaffari,
Oscar L. Rodriguez,
Ali Bashir,
Shaun D Jackman (0000-0002-9275-5966),
John J Farrell,
Aaron M. Wenger,
Can Alkan,
Arda Soylev,
Michael C. Schatz,
Shilpa Garg,
George Church,
Tobias Marschall,
Ken Chen,
Xian Fan,
Adam C English,
Jeffrey A. Rosenfeld,
Weichen Zhou,
Ryan E. Mills,
Jay M. Sage,
Jennifer R. Davis,
Michael D. Kaiser,
John S. Oliver,
Anthony P Catalano,
Mark Chaisson,
Noah Spies,
Fritz J. Sedlazeck,
Marc Salit,
the Genome in a Bottle Consortium
Posted 09 Jun 2019
bioRxiv DOI: 10.1101/664623
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution, and comprehensiveness. Translating these methods to routine research and clinical practice requires robust benchmark sets. We developed the first benchmark set for identification of both false negative and false positive germline SVs, which complements recent efforts emphasizing increasingly comprehensive characterization of SVs. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle (GIAB) Consortium integrated 19 sequence-resolved variant calling methods, both alignment- and de novo assembly-based, from short-, linked-, and long-read sequencing, as well as optical and electronic mapping. The final benchmark set contains 12,745 isolated, sequence-resolved insertion and deletion calls ≥50 base pairs (bp) discovered by at least 2 technologies or 5 callsets, genotyped as heterozygous or homozygous variants by long reads. The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.66 Gbp and 9,641 SVs supported by at least one diploid assembly. Support for SVs was assessed using svviz with short-, linked-, and long-read sequence data. In general, there was strong support from multiple technologies for the benchmark SVs, with 90% of the Tier 1 SVs having support in reads from more than one technology. The Mendelian genotype error rate was 0.3%, and genotype concordance with manual curation was >98.7%. We demonstrate the utility of the benchmark set by showing it reliably identifies both false negatives and false positives in high-quality SV callsets from short-, linked-, and long-read sequencing and optical mapping.
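The discovery-support rule stated in the abstract (insertions and deletions of at least 50 bp, found by at least 2 technologies or 5 callsets) can be illustrated with a minimal Python sketch. The `CandidateSV` class, its field names, and the `has_discovery_support` function are hypothetical and exist only for this illustration; they are not taken from the GIAB pipeline. The sketch also leaves out the other criteria the abstract mentions, such as isolation, sequence resolution, and long-read genotyping.

```python
from dataclasses import dataclass

# Thresholds taken from the abstract; the names are illustrative assumptions.
MIN_SV_LENGTH_BP = 50
MIN_TECHNOLOGIES = 2
MIN_CALLSETS = 5


@dataclass(frozen=True)
class CandidateSV:
    """Hypothetical record for one candidate structural variant."""
    sv_type: str             # "INS" or "DEL"
    length_bp: int           # size of the inserted or deleted sequence
    technologies: frozenset  # technologies that discovered the call
    callsets: frozenset      # callsets that discovered the call


def has_discovery_support(sv: CandidateSV) -> bool:
    """Return True if the call meets the size and support thresholds."""
    if sv.sv_type not in {"INS", "DEL"}:
        return False
    if sv.length_bp < MIN_SV_LENGTH_BP:
        return False
    # Either condition is sufficient: >= 2 technologies OR >= 5 callsets.
    return (len(sv.technologies) >= MIN_TECHNOLOGIES
            or len(sv.callsets) >= MIN_CALLSETS)


# Supported by two technologies, although only two callsets found it.
two_tech = CandidateSV("DEL", 120, frozenset({"short", "long"}),
                       frozenset({"a", "b"}))

# Supported by five callsets that all come from a single technology.
five_callsets = CandidateSV("INS", 300, frozenset({"long"}),
                            frozenset({"a", "b", "c", "d", "e"}))

# Below the 50 bp size cutoff, so excluded regardless of support.
too_short = CandidateSV("DEL", 49, frozenset({"short", "long"}),
                        frozenset({"a", "b", "c", "d", "e"}))

# One technology and three callsets: neither threshold is met.
weak = CandidateSV("INS", 80, frozenset({"long"}),
                   frozenset({"a", "b", "c"}))
```

In this sketch the two support conditions are combined with a logical OR, matching the abstract's wording "at least 2 technologies or 5 callsets".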