Low-frequency variant functional architectures reveal strength of negative selection across coding and non-coding annotations

By Steven Gazal, Po-Ru Loh, Hilary K Finucane, Andrea Ganna, Armin P. Schoech, Shamil Sunyaev, Alkes Price

Posted 09 Apr 2018
bioRxiv DOI: 10.1101/297572 (published DOI: 10.1038/s41588-018-0231-8)

Common variant heritability is known to be concentrated in variants within cell-type-specific non-coding functional annotations, with a limited role for common coding variants. However, little is known about the functional distribution of low-frequency variant heritability. Here, we partitioned the heritability of both low-frequency (0.5% ≤ MAF < 5%) and common (MAF ≥ 5%) variants in 40 UK Biobank traits (average N = 363K) across a broad set of coding and non-coding functional annotations, employing an extension of stratified LD score regression to low-frequency variants that produces robust results in simulations. We determined that non-synonymous coding variants explain 17±1% of low-frequency variant heritability (h²_lf) versus only 2.1±0.2% of common variant heritability (h²_c), and that regions conserved in primates explain nearly half of h²_lf (43±2%). Other annotations previously linked to negative selection, including non-synonymous variants with high PolyPhen-2 scores, non-synonymous variants in genes under strong selection, and low-LD variants, were also significantly more enriched for h²_lf as compared to h²_c. Cell-type-specific non-coding annotations that were significantly enriched for h²_c of corresponding traits tended to be similarly enriched for h²_lf for most traits, but more enriched for brain-related annotations and traits. For example, H3K4me3 marks in brain DPFC explain 57±12% of h²_lf vs. 12±2% of h²_c for neuroticism, implicating the action of negative selection on low-frequency variants affecting gene regulation in the brain. Forward simulations confirmed that the ratio of low-frequency variant enrichment vs. common variant enrichment primarily depends on the mean selection coefficient of causal variants in the annotation, and can be used to predict the effect size variance of causal rare variants (MAF < 0.5%) in the annotation, informing their prioritization in whole-genome sequencing studies.
Our results provide a deeper understanding of low-frequency variant functional architectures and guidelines for the design of association studies targeting functional classes of low-frequency and rare variants.
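
As a rough illustration of the arithmetic behind these comparisons, the sketch below computes the ratio of heritability shares for the two annotations quoted in the abstract. It is not the authors' code. The function names are hypothetical, and `enrichment` assumes the usual stratified LD score regression convention (share of heritability divided by share of variants in the annotation). The abstract does not report the variant shares, so the full enrichment ratio is defined here but not evaluated.

```python
def share_ratio(h2lf_share: float, h2c_share: float) -> float:
    """Ratio of the low-frequency to the common heritability share."""
    if h2c_share <= 0:
        raise ValueError("h2c_share must be positive")
    return h2lf_share / h2c_share


def enrichment(h2_share: float, variant_share: float) -> float:
    """Assumed definition: share of heritability / share of variants."""
    if not 0 < variant_share <= 1:
        raise ValueError("variant_share must be in (0, 1]")
    return h2_share / variant_share


def enrichment_ratio(h2lf_share: float, lf_variant_share: float,
                     h2c_share: float, common_variant_share: float) -> float:
    """Low-frequency enrichment divided by common enrichment.

    The variant shares are inputs the abstract does not provide.
    """
    return (enrichment(h2lf_share, lf_variant_share)
            / enrichment(h2c_share, common_variant_share))


# Heritability shares quoted in the abstract (point estimates only).
nonsynonymous = share_ratio(0.17, 0.021)   # 17% of h2lf vs. 2.1% of h2c
brain_h3k4me3 = share_ratio(0.57, 0.12)    # 57% of h2lf vs. 12% of h2c

print(f"non-synonymous share ratio: {nonsynonymous:.1f}")
print(f"brain DPFC H3K4me3 share ratio (neuroticism): {brain_h3k4me3:.2f}")
```

The share ratio equals the enrichment ratio only when an annotation covers the same fraction of low-frequency and common variants, which need not hold in practice. These figures also ignore the standard errors reported in the abstract.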

Download data

  • Downloaded 1,102 times
  • Download rankings, all-time:
    • Site-wide: 24,702
    • In genetics: 1,080
  • Year to date:
    • Site-wide: 105,237
  • Since beginning of last month:
    • Site-wide: 126,030
