Machine learning enables new insights into clinical significance of and genetic contributions to liver fat accumulation

By Mary E. Haas, James P. Pirruccello, Samuel N. Friedman, Connor A. Emdin, Veeral H. Ajmera, Tracey G. Simon, Julian R. Homburger, Xiuqing Guo, Matthew Budoff, Kathleen E. Corey, Alicia Y. Zhou, Anthony Philippakis, Patrick T. Ellinor, Rohit Loomba, Puneet Batra, Amit V. Khera

Posted 03 Sep 2020
medRxiv DOI: 10.1101/2020.09.03.20187195

Excess accumulation of liver fat, termed hepatic steatosis when fat accounts for > 5.5% of liver content, is a leading risk factor for end-stage liver disease and is strongly associated with important cardiometabolic disorders. Using a ground-truth dataset of 4,511 UK Biobank participants with liver fat previously quantified via abdominal MRI, we developed a machine learning algorithm to quantify liver fat with correlation coefficients of 0.97 and 0.99 in hold-out testing datasets and applied this algorithm to raw imaging data from an additional 32,192 participants. Among all 36,703 individuals with abdominal MRI, median liver fat was 2.2%, with 6,250 (17%) meeting criteria for hepatic steatosis. Although individuals with hepatic steatosis were more likely to have been diagnosed with conditions such as obesity or diabetes, a prediction model based on clinical data alone, without imaging, could not reliably estimate liver fat content. To identify genetic drivers of variation in liver fat, we first conducted a common variant association study of 9.8 million variants, confirming three known associations for variants in the TM6SF2, APOE, and PNPLA3 genes and identifying five new variants associated with increased hepatic fat in or near the MARC1, ADH1B, TRIB1, GPAM, and MAST3 genes. A polygenic score that integrated information from each of these eight variants was strongly associated with future clinical diagnosis of liver diseases. Next, we performed a rare variant association study in a subset of 11,021 participants with gene sequencing data available, identifying an association between inactivating variants in the APOB gene and substantially lower LDL cholesterol, but a more than 10-fold increased risk of steatosis. Taken together, these results provide proof of principle for the use of machine learning algorithms on raw imaging data to enable epidemiologic studies and genetic discovery.
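The cohort figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the constant names and the `has_steatosis` helper are hypothetical and not taken from the paper, and the only inputs are the numbers stated above (4,511 and 32,192 participants, 6,250 cases, and the > 5.5% threshold).

```python
# Figures quoted in the abstract; the names are illustrative assumptions.
GROUND_TRUTH_COHORT = 4_511   # participants with previously quantified liver fat
NEWLY_SCORED = 32_192         # additional participants scored by the algorithm
STEATOSIS_CASES = 6_250       # participants meeting the steatosis criterion
STEATOSIS_THRESHOLD = 5.5     # percent liver fat; steatosis is defined as > 5.5%


def has_steatosis(liver_fat_percent: float) -> bool:
    """Hypothetical helper: True when liver fat strictly exceeds the threshold."""
    return liver_fat_percent > STEATOSIS_THRESHOLD


total = GROUND_TRUTH_COHORT + NEWLY_SCORED
prevalence = STEATOSIS_CASES / total

print(total)                # 36703, matching the stated 36,703 individuals
print(f"{prevalence:.1%}")  # 17.0%, matching the stated 17%
print(has_steatosis(2.2))   # False: the median liver fat is below the threshold
```

The two cohorts sum to the reported total, and the reported case count corresponds to the stated prevalence when rounded to the nearest percent.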

Download data

  • Downloaded 1,214 times
  • Download rankings, all-time:
    • Site-wide: 20,001
    • In genetic and genomic medicine: 87
  • Year to date:
    • Site-wide: 11,405
  • Since beginning of last month:
    • Site-wide: 15,324
