The repetitive nature and complexity of multiple medically important genes make them intractable to accurate analysis, despite the maturity of short-read sequencing, resulting in a gap in clinical applications of genome sequencing. The Genome in a Bottle Consortium has provided benchmark variant sets, but these excluded some medically relevant genes due to their repetitiveness or polymorphic complexity. In this study, we characterize 273 of these 395 challenging autosomal genes that have multiple implications for medical sequencing. This extended, curated benchmark reports over 17,000 SNVs, 3,600 INDELs, and 200 SVs each for GRCh37 and GRCh38 across HG002. We show that false duplications in either GRCh37 or GRCh38 result in reference-specific, missed variants for short- and long-read technologies in medically important genes including CBS, CRYAA, and KCNE1. Our proposed solution improves variant recall in these genes from 8% to 100%. This benchmark will significantly improve the comprehensive characterization of these medically relevant genes and guide new method development.

Download data

  • Downloaded 906 times
  • Download rankings, all-time:
    • Site-wide: 27,504
    • In genomics: 2,512
  • Year to date:
    • Site-wide: 3,235
  • Since beginning of last month:
    • Site-wide: 447
