Towards a Comprehensive Variation Benchmark for Challenging Medically-Relevant Autosomal Genes
Nathan D. Olson,
Aaron M. Wenger,
William J. Rowell,
Ziad M Khan,
Sayed Mohammad Ebrahim Sahraeian,
Danny E Miller,
Jose M. Lorenzo-Salazar,
Luis A. Rubio-Rodriguez,
Uday Shanker Evani,
Wayne E. Clarke,
Christopher E Mason,
Stephen E Lincoln,
Karen H Miga,
Mark TW Ebbert,
Justin M. Zook,
Fritz J. Sedlazeck
Posted 07 Jun 2021
bioRxiv DOI: 10.1101/2021.06.07.444885
The repetitive nature and complexity of multiple medically important genes make them intractable to accurate analysis, despite the maturity of short-read sequencing, resulting in a gap in clinical applications of genome sequencing. The Genome in a Bottle Consortium has provided benchmark variant sets, but these excluded some medically relevant genes due to their repetitiveness or polymorphic complexity. In this study, we characterize 273 of these 395 challenging autosomal genes that have multiple implications for medical sequencing. This extended, curated benchmark reports over 17,000 SNVs, 3,600 INDELs, and 200 SVs each for GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically important genes including CBS, CRYAA, and KCNE1. Our proposed solution improves variant recall in these genes from 8% to 100%. This benchmark will significantly improve the comprehensive characterization of these medically relevant genes and guide new method development.