Mitochondrial DNA variation across 56,434 individuals in gnomAD
Kristen M Laricchia,
Nicole J Lake,
Nicholas A Watts,
Kiran V. Garimella,
Genome Aggregation Database Consortium,
Heidi L Rehm,
Daniel G. MacArthur,
Monkol V Lek,
Vamsi K Mootha,
Sarah E Calvo
Posted 23 Jul 2021
bioRxiv DOI: 10.1101/2021.07.23.453510
Posted 23 Jul 2021
Databases of allele frequency are extremely helpful for evaluating clinical variants of unknown significance; however, until now, genetic databases such as the Genome Aggregation Database (gnomAD) have ignored the mitochondrial genome (mtDNA). Here we present a pipeline to call mtDNA variants that addresses three technical challenges: (i) detecting homoplasmic and heteroplasmic variants, present respectively in all or a fraction of mtDNA molecules, (ii) circular mtDNA genome, and (iii) misalignment of nuclear sequences of mitochondrial origin (NUMTs). We observed that mtDNA copy number per cell varied across gnomAD cohorts and influenced the fraction of NUMT-derived false-positive variant calls, which can account for the majority of putative heteroplasmies. To avoid false positives, we excluded samples prone to NUMT misalignment (few mtDNA copies per cell), cell line artifacts (many mtDNA copies per cell), or with contamination and we reported variants with heteroplasmy greater than 10%. We applied this pipeline to 56,434 whole genome sequences in the gnomAD v3.1 database that includes individuals of European (58%), African (25%), Latino (10%), and Asian (5%) ancestry. Our gnomAD v3.1 release contains population frequencies for 10,850 unique mtDNA variants at more than half of all mtDNA bases. Importantly, we report frequencies within each nuclear ancestral population and mitochondrial haplogroup. Homoplasmic variants account for most variant calls (98%) and unique variants (85%). We observed that 1/250 individuals carry a pathogenic mtDNA variant with heteroplasmy above 10%. These mitochondrial population allele frequencies are publicly available at gnomad.broadinstitute.org and will aid in diagnostic interpretation and research studies.
- Downloaded 1,496 times
- Download rankings, all-time:
- Site-wide: 21,152
- In genomics: 1,863
- Year to date:
- Site-wide: 33,625
- Since beginning of last month:
- Site-wide: 74,326
Downloads over time
Distribution of downloads per paper, site-wide
- 27 Nov 2020: The website and API now include results pulled from medRxiv as well as bioRxiv.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!