
Evaluating the informativeness of deep learning annotations for human complex diseases

By Kushal K Dey, Bryce Van de Geijn, Samuel Sungil Kim, Farhad Hormozdiari, David R. Kelley, Alkes Price

Posted 26 Sep 2019
bioRxiv DOI: 10.1101/784439 (published DOI: 10.1038/s41467-020-18515-4)

Deep learning models have shown great promise in predicting genome-wide regulatory effects from DNA sequence, but their informativeness for human complex diseases and traits is not fully understood. Here, we evaluate the disease informativeness of allelic-effect annotations (absolute value of the predicted difference between reference and variant alleles) constructed using two previously trained deep learning models, DeepSEA and Basenji. We apply stratified LD score regression (S-LDSC) to 41 independent diseases and complex traits (average N=320K) to evaluate each annotation's informativeness for disease heritability conditional on a broad set of coding, conserved, regulatory and LD-related annotations from the baseline-LD model and other sources; as a secondary metric, we also evaluate the accuracy of models that incorporate deep learning annotations in predicting disease-associated or fine-mapped SNPs. We aggregated annotations across all tissues (resp. blood cell types or brain tissues) in meta-analyses across all 41 traits (resp. 11 blood-related traits or 8 brain-related traits). These allelic-effect annotations were highly enriched for disease heritability, but produced only limited conditionally significant results: only Basenji-H3K4me3 in meta-analyses across all 41 traits and brain-specific Basenji-H3K4me3 in meta-analyses across 8 brain-related traits. We conclude that deep learning models have yet to achieve their full potential to provide a considerable amount of unique information for complex disease, and that the informativeness of deep learning models for disease beyond established functional annotations cannot be inferred from metrics based on their accuracy in predicting regulatory annotations.

Competing Interest Statement

The authors have declared no competing interest.
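As a rough illustration of the allelic-effect annotation defined above, the sketch below computes, for each SNP, the absolute difference between a model's predictions for the reference and variant alleles, then collapses per-tissue values into one SNP-level number. Everything here is an assumption for illustration: the input layout (one list of per-tissue predictions per SNP), the function names, and the toy values are hypothetical, and using the mean to aggregate across tissues is an illustrative choice, not necessarily the rule used in the paper. The sketch does not run DeepSEA or Basenji; it only shows the arithmetic applied to their outputs.

```python
from statistics import mean
from typing import Dict, List

# Hypothetical input layout: SNP id -> per-tissue predictions for one
# chromatin feature (e.g. H3K4me3), one list for each allele.
Predictions = Dict[str, List[float]]


def allelic_effect(ref: List[float], alt: List[float]) -> List[float]:
    """Per-tissue allelic effect: |prediction(ref) - prediction(alt)|."""
    if len(ref) != len(alt):
        raise ValueError("reference and variant predictions must cover the same tissues")
    return [abs(r - a) for r, a in zip(ref, alt)]


def aggregate_across_tissues(effects: List[float]) -> float:
    """Collapse per-tissue effects to one value (mean is an assumed, illustrative choice)."""
    return mean(effects)


def build_annotation(ref_preds: Predictions, alt_preds: Predictions) -> Dict[str, float]:
    """Hypothetical helper: one SNP-level allelic-effect value per SNP."""
    return {
        snp: aggregate_across_tissues(allelic_effect(ref_preds[snp], alt_preds[snp]))
        for snp in ref_preds
    }


# Toy predictions for three tissues (made-up numbers, not data from the paper).
ref = {"rs1": [0.80, 0.60, 0.10], "rs2": [0.20, 0.25, 0.30]}
alt = {"rs1": [0.20, 0.70, 0.10], "rs2": [0.20, 0.25, 0.30]}
annotation = build_annotation(ref, alt)
print(annotation)
```

In this toy case `rs2` gets an annotation of zero because the model predicts the same signal for both alleles, while `rs1` gets a positive value driven mostly by the first tissue. In an S-LDSC analysis, SNP-level values like these would enter as a continuous annotation alongside the baseline-LD annotations.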

Download data

  • Downloaded 1,336 times
  • Download rankings, all-time:
    • Site-wide: 19,168
    • In genetics: 835
  • Download rankings, year to date:
    • Site-wide: 115,340
  • Download rankings, since beginning of last month:
    • Site-wide: 126,899
