Epigenetic DNA modification is partly under genetic control and occurs in response to a wide range of environmental exposures. Linking epigenetic marks to clinical outcomes may provide greater insight into the underlying molecular processes of disease, assist in the identification of therapeutic targets, and improve risk prediction. Here, we present a statistical approach, based on Bayesian inference, that estimates associations between disease risk and all measured epigenetic probes jointly, automatically controlling for both data structure (including cell-count effects, relatedness, and experimental batch effects) and correlations among probes. We benchmark our approach in a simulation study, finding improved estimation of probe associations over existing approaches across a wide range of scenarios. Our method estimates the total proportion of disease risk captured by epigenetic probe variation. When we apply it to measures of body mass index (BMI) and cigarette consumption behaviour in 5,101 individuals, we find that 66.7% (95% CI 60.0-72.8) of the variation in BMI and 67.7% (95% CI 58.4-76.9) of the variation in cigarette consumption can be captured by methylation array data from whole blood, independent of the variation explained by single nucleotide polymorphism markers. We find novel associations: smoking behaviour is associated, with >95% posterior inclusion probability, with a methylation probe at MNDA, a myeloid cell nuclear differentiation antigen gene previously implicated as a biomarker for inflammation and non-Hodgkin lymphoma risk. We conduct unique genome-wide enrichment analyses, identifying blood cholesterol, lipid transport and sterol metabolism pathways for BMI, and response to xenobiotic stimulus and negative regulation of RNA polymerase II promoter transcription for smoking, all with >95% posterior inclusion probability of having methylation probes with associations >1.5 times larger than the average.
Finally, we improve phenotypic prediction in two independent cohorts by 28.7% for BMI and 10.2% for smoking, relative to a LASSO model. These results imply that probe measures may capture large amounts of variance because they are likely a consequence of the phenotype rather than a cause. As a result, omics data may enable accurate characterization of disease progression and identification of individuals who are on a path to disease. Our approach facilitates better understanding of the underlying epigenetic architecture of complex common disease and is applicable to any kind of genomics data.
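
The two summaries quoted above, posterior inclusion probability and the proportion of phenotypic variance captured by probes, can be illustrated with a minimal sketch. The abstract does not specify the model, so everything here is an assumption: the function names, the toy posterior draws, and the convention that a probe excluded in a given draw has an effect of exactly 0.0 (as in a spike-and-slab style prior). It is not the authors' implementation.

```python
from statistics import mean


def posterior_inclusion_probability(draws):
    """Fraction of posterior draws in which each probe has a non-zero effect.

    `draws` is a list of posterior samples; each sample is a list of probe
    effects, where 0.0 is assumed to mean "probe excluded in this draw".
    """
    n_probes = len(draws[0])
    return [
        mean(1.0 if sample[j] != 0.0 else 0.0 for sample in draws)
        for j in range(n_probes)
    ]


def population_variance(values):
    """Population variance (divides by n) of a list of numbers."""
    centre = mean(values)
    return mean((v - centre) ** 2 for v in values)


def proportion_variance_captured(draws, probes, residual_variances):
    """Per-draw share of phenotypic variance attributed to the probes.

    `probes` holds one row of probe values per individual. For each draw the
    share is var(fitted) / (var(fitted) + residual variance), an assumed
    decomposition chosen for illustration.
    """
    shares = []
    for effects, sigma2_e in zip(draws, residual_variances):
        fitted = [sum(x * b for x, b in zip(row, effects)) for row in probes]
        var_probes = population_variance(fitted)
        shares.append(var_probes / (var_probes + sigma2_e))
    return shares


# Hypothetical toy data: 4 posterior draws over 3 probes, 4 individuals.
toy_draws = [
    [0.5, 0.0, 0.0],
    [0.4, 0.0, 0.1],
    [0.6, 0.0, 0.0],
    [0.5, 0.0, 0.0],
]
toy_probes = [
    [1.0, -1.0, 0.0],
    [-1.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
    [-1.0, -1.0, -1.0],
]
toy_residual_variances = [0.25, 0.25, 0.25, 0.25]

pip = posterior_inclusion_probability(toy_draws)
shares = proportion_variance_captured(
    toy_draws, toy_probes, toy_residual_variances
)
```

Summarising `shares` across draws (for example by its mean and a 95% interval) is one way a figure such as "66.7% (95% CI 60.0-72.8)" could be reported, and a probe would be flagged when its entry in `pip` exceeds a threshold such as 0.95.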

Download data

  • Downloaded 1,427 times
  • Download rankings, all-time:
    • Site-wide: 22,883
    • In genomics: 2,001
  • Year to date:
    • Site-wide: 139,663
  • Since beginning of last month:
    • Site-wide: 60,459
