
Machine Learning in Multi-Omics Data to Assess Longitudinal Predictors of Glycaemic Health

By Laurie Prélot, Harmen Draisma, Mila D. Anasanti, Zhanna Balkhiyarova, Matthias Wielscher, Loic Yengo, Beverley Balkau, Ronan Roussel, Sylvain Sebert, Mika Ala-Korpela, Philippe Froguel, Marjo-Riitta Jarvelin, Marika Kaakinen, Inga Prokopenko

Posted 29 Jun 2018
bioRxiv DOI: 10.1101/358390

Type 2 diabetes (T2D) is a global health burden that will benefit from personalised risk prediction and targeted prevention programmes. Omics data have enabled more detailed risk prediction; however, most studies have focussed directly on the ability of DNA variants to predict T2D onset, with less attention given to epigenetic regulation and glycaemic trait variability. By applying machine learning to the longitudinal Northern Finland Birth Cohort 1966 (NFBC1966), with data collected at ages 31 (T1) and 46 (T2) years, we predicted fasting glucose (FG) and insulin (FI), glycated haemoglobin (HbA1c), and 2-hour glucose and insulin from the oral glucose tolerance test (2hGlu, 2hIns) at T2 in 513 individuals from 1,001 variables at T1 and T2, including anthropometric, metabolic, metabolomic and epigenetic variables. We further tested whether the information obtained by the machine learning models in NFBC1966 could be used to predict glycaemic traits in an independent French study (DESIR, N=769, age range 30-65 years at recruitment, interval between data collections: 9 years) with 48 matching predictors. In NFBC1966, FG and FI were best predicted, with average R² values of 0.38 and 0.53, respectively. Sex, branched-chain and aromatic amino acids, HDL-cholesterol, glycerol, ketone bodies, blood pressure at T2 and measurements of adiposity at T1, as well as multiple methylation marks at both time points, were amongst the top predictors. In the validation analysis, we reached R² values of 0.41/0.55 for FG/FI when trained and tested in NFBC1966 and 0.17/0.30 when trained in NFBC1966 and tested in DESIR. We identified clinically relevant sets of predictors from a large multi-omics dataset and highlighted the potential of methylation markers and longitudinal changes in prediction.
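The abstract reports the coefficient of determination (R²) both within the training cohort and in an independent cohort. The sketch below shows how such a comparison can be computed. It is an illustration only, not the authors' pipeline: the single-predictor least-squares model, the function names `fit_line` and `r_squared`, and all numeric values are assumptions made up for this example, and none of them come from NFBC1966 or DESIR.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    x_mean, y_mean = mean(x), mean(y)
    s_xy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    s_xx = sum((a - x_mean) ** 2 for a in x)
    slope = s_xy / s_xx
    return slope, y_mean - slope * x_mean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    obs_mean = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy values invented for illustration (hypothetical predictor and trait).
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 4.0, 5.0]
held_out_x, held_out_y = [0.5, 1.5, 2.5], [1.5, 3.0, 4.5]          # same cohort
external_x, external_y = [0.0, 1.0, 2.0, 3.0], [2.0, 2.0, 3.0, 5.0]  # other cohort

# Fit once on the training cohort, then score both test sets with that model.
slope, intercept = fit_line(train_x, train_y)


def predict(xs):
    return [slope * x + intercept for x in xs]


r2_internal = r_squared(held_out_y, predict(held_out_x))
r2_external = r_squared(external_y, predict(external_x))
print(f"internal R2 = {r2_internal:.3f}, external R2 = {r2_external:.3f}")
```

The model is fitted once and never refitted on the external cohort, so a lower external R² reflects how well the learned relationship transfers, which is the kind of drop the abstract reports between the NFBC1966 and DESIR results.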

Download data

  • Downloaded 1,343 times
  • Download rankings, all-time:
    • Site-wide: 25,162
    • In genomics: 2,156
  • Year to date:
    • Site-wide: 114,892
  • Since beginning of last month:
    • Site-wide: 90,340
