Profiling and leveraging relatedness in a precision medicine cohort of 92,455 exomes
Evan K. Maxwell,
Alexander E Lopez,
Cristopher Van Hout,
Tanya M Teslovich,
Shane E McCarthy,
H Lester Kirchner,
Joseph B Leader,
Michael F Murray,
David H Ledbetter,
Alan R Shuldiner,
Frederick E Dewey,
David J Carey,
John D Overton,
Jeffrey G. Reid
Posted 03 Oct 2017
bioRxiv DOI: 10.1101/197889 (published DOI: 10.1016/j.ajhg.2018.03.012)
Posted 03 Oct 2017
Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications on downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in-line with expectations given the underlying population and ascertainment approach. For example, we identified ~66,000 close (first- and second-degree) relationships within DiscovEHR involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships as DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ~1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations through reconstructed pedigrees, including a tandem duplication in LDLR causing familial hypercholesterolemia. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.
- Downloaded 991 times
- Download rankings, all-time:
- Site-wide: 37,258
- In genomics: 2,918
- Year to date:
- Site-wide: 166,964
- Since beginning of last month:
- Site-wide: 100,210
Downloads over time
Distribution of downloads per paper, site-wide
- 27 Nov 2020: The website and API now include results pulled from medRxiv as well as bioRxiv.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!