Whole exome sequencing and characterization of coding variation in 49,960 individuals in the UK Biobank
Cristopher V. Van Hout,
Joshua D. Backman,
Joshua X. Hoffman,
Ashutosh K. Pandey,
Alexander H. Li,
Andrew L. Blumenfeld,
William J. Salerno,
Wendy K. Chung,
Cristen J. Willer,
Joseph B. Leader,
David J. Carey,
David H. Ledbetter,
Geisinger-Regeneron DiscovEHR Collaboration,
George D. Yancopoulos,
Alan R. Shuldiner,
Matthew R. Nelson,
Jeffrey G. Reid,
John D. Overton,
Robert A. Scott,
on behalf of the Regeneron Genetics Center
Posted 09 Mar 2019
bioRxiv DOI: 10.1101/572347
The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world. Here we describe the first tranche of large-scale exome sequence data for 49,960 study participants, revealing approximately 4 million coding variants (of which ~98.4% have frequency < 1%). The data includes 231,631 predicted loss of function variants, a >10-fold increase compared to imputed sequence for the same participants. Nearly all genes (>97%) had ≥1 predicted loss of function carrier, and most genes (>69%) had ≥10 loss of function carriers. We illustrate the power of characterizing loss of function variation in this large population through association analyses across 1,741 phenotypes. In addition to replicating a range of established associations, we discover novel loss of function variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical significance in this population, finding that 2% of the population has a medically actionable variant. Additionally, we leverage the phenotypic data to characterize the relationship between rare BRCA1 and BRCA2 pathogenic variants and cancer risk. Exomes from the first 49,960 participants are now made accessible to the scientific community and highlight the promise offered by genomic sequencing in large-scale population-based studies.