Machine learning enables new insights into clinical significance of and genetic contributions to liver fat accumulation
Mary E. Haas,
James P. Pirruccello,
Samuel N. Friedman,
Connor A. Emdin,
Veeral H. Ajmera,
Tracey G. Simon,
Julian R. Homburger,
Kathleen E. Corey,
Alicia Y. Zhou,
Amit V. Khera
Posted 03 Sep 2020
medRxiv DOI: 10.1101/2020.09.03.20187195
Excess accumulation of liver fat, termed hepatic steatosis when fat accounts for > 5.5% of liver content, is a leading risk factor for end-stage liver disease and is strongly associated with important cardiometabolic disorders. Using a truth dataset of 4,511 UK Biobank participants with liver fat previously quantified via abdominal MRI, we developed a machine learning algorithm to quantify liver fat with correlation coefficients of 0.97 and 0.99 in hold-out testing datasets and applied this algorithm to raw imaging data from an additional 32,192 participants. Among all 36,703 individuals with abdominal MRI, median liver fat was 2.2%, with 6,250 (17%) meeting criteria for hepatic steatosis. Although individuals with hepatic steatosis were more likely to have been diagnosed with conditions such as obesity or diabetes, a prediction model based on clinical data alone, without imaging, could not reliably estimate liver fat content. To identify genetic drivers of variation in liver fat, we first conducted a common variant association study of 9.8 million variants, confirming three known associations for variants in the TM6SF2, APOE, and PNPLA3 genes and identifying five new variants associated with increased hepatic fat in or near the MARC1, ADH1B, TRIB1, GPAM, and MAST3 genes. A polygenic score that integrated information from each of these eight variants was strongly associated with future clinical diagnosis of liver diseases. Next, we performed a rare variant association study in a subset of 11,021 participants with gene sequencing data available, identifying an association between inactivating variants in the APOB gene and substantially lower LDL cholesterol, but more than 10-fold increased risk of steatosis. Taken together, these results provide proof of principle for the use of machine learning algorithms on raw imaging data to enable epidemiologic studies and genetic discovery.
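The arithmetic behind two quantities in the abstract can be illustrated with a minimal Python sketch. It applies the > 5.5% steatosis threshold stated above and recomputes the reported prevalence from the reported counts. The `polygenic_score` function assumes a plain weighted sum of risk-allele dosages, which is a common construction but is not confirmed by the abstract; all function names and any weights passed to them are hypothetical and are not taken from the preprint.

```python
from typing import Sequence

# From the abstract: hepatic steatosis is liver fat > 5.5% of liver content.
STEATOSIS_THRESHOLD_PCT = 5.5


def has_steatosis(liver_fat_pct: float) -> bool:
    """Return True when liver fat strictly exceeds the 5.5% threshold."""
    return liver_fat_pct > STEATOSIS_THRESHOLD_PCT


def prevalence(liver_fat_pcts: Sequence[float]) -> float:
    """Fraction of participants meeting the steatosis criterion."""
    if not liver_fat_pcts:
        raise ValueError("need at least one measurement")
    return sum(has_steatosis(v) for v in liver_fat_pcts) / len(liver_fat_pcts)


def polygenic_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of risk-allele dosages (0, 1, or 2 per variant).

    Assumption: the abstract does not say how the eight variants were
    combined; a weighted sum is used here for illustration only.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Counts reported in the abstract: truth dataset plus newly scored participants.
total_participants = 4_511 + 32_192
reported_prevalence = 6_250 / total_participants
```

With the reported counts, `total_participants` matches the 36,703 individuals cited in the abstract and `reported_prevalence` rounds to the stated 17%.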
- Downloaded 1,058 times
- Download rankings, all-time:
  - Site-wide: 21,936
  - In genetic and genomic medicine: 85
- Year to date:
  - Site-wide: 11,047
- Since beginning of last month:
  - Site-wide: 15,128