Low-frequency variant functional architectures reveal strength of negative selection across coding and non-coding annotations
Common variant heritability is known to be concentrated in variants within cell-type-specific non-coding functional annotations, with a limited role for common coding variants. However, little is known about the functional distribution of low-frequency variant heritability. Here, we partitioned the heritability of both low-frequency (0.5% ≤ MAF < 5%) and common (MAF ≥ 5%) variants in 40 UK Biobank traits (average N = 363K) across a broad set of coding and non-coding functional annotations, employing an extension of stratified LD score regression to low-frequency variants that produces robust results in simulations. We determined that non-synonymous coding variants explain 17±1% of low-frequency variant heritability (h²_lf) versus only 2.1±0.2% of common variant heritability (h²_c), and that regions conserved in primates explain nearly half of h²_lf (43±2%). Other annotations previously linked to negative selection, including non-synonymous variants with high PolyPhen-2 scores, non-synonymous variants in genes under strong selection, and low-LD variants, were also significantly more enriched for h²_lf than for h²_c. Cell-type-specific non-coding annotations that were significantly enriched for h²_c of corresponding traits tended to be similarly enriched for h²_lf for most traits, but were more enriched for h²_lf in brain-related annotations and traits. For example, H3K4me3 marks in brain DPFC explain 57±12% of h²_lf vs. 12±2% of h²_c for neuroticism, implicating the action of negative selection on low-frequency variants affecting gene regulation in the brain. Forward simulations confirmed that the ratio of low-frequency variant enrichment vs. common variant enrichment primarily depends on the mean selection coefficient of causal variants in the annotation, and can be used to predict the effect size variance of causal rare variants (MAF < 0.5%) in the annotation, informing their prioritization in whole-genome sequencing studies.
Our results provide a deeper understanding of low-frequency variant functional architectures and guidelines for the design of association studies targeting functional classes of low-frequency and rare variants.
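As a rough illustration of the enrichment ratio mentioned in the abstract, the sketch below uses the usual stratified LD score regression definition of enrichment (share of heritability in an annotation divided by share of variants in that annotation). The function names and the variant shares are hypothetical placeholders, not values from the paper. Only the heritability shares (17% vs. 2.1%) come from the abstract.

```python
def enrichment(h2_share: float, variant_share: float) -> float:
    """Enrichment = share of heritability / share of variants in the annotation."""
    if not 0.0 < variant_share <= 1.0:
        raise ValueError("variant_share must be in (0, 1]")
    return h2_share / variant_share


def enrichment_ratio(h2lf_share: float, h2c_share: float,
                     lf_variant_share: float, common_variant_share: float) -> float:
    """Ratio of low-frequency variant enrichment to common variant enrichment."""
    low_freq = enrichment(h2lf_share, lf_variant_share)
    common = enrichment(h2c_share, common_variant_share)
    return low_freq / common


# Heritability shares for non-synonymous variants, taken from the abstract.
H2LF_SHARE = 0.17
H2C_SHARE = 0.021

# ASSUMPTION: placeholder variant shares, set equal in both MAF bins.
# The true shares are not given in the abstract and will generally differ.
LF_VARIANT_SHARE = 0.01
COMMON_VARIANT_SHARE = 0.01

ratio = enrichment_ratio(H2LF_SHARE, H2C_SHARE,
                         LF_VARIANT_SHARE, COMMON_VARIANT_SHARE)
print(round(ratio, 1))  # 8.1
```

With equal variant shares the ratio reduces to the ratio of heritability shares (0.17 / 0.021). With the annotation's actual variant shares in each frequency bin, the result would change accordingly.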