Learning from Longitudinal Data in Electronic Health Record and Genetic Data to Improve Cardiovascular Event Prediction
Russell A. Wilke,
Quinn S Wells,
Joshua C. Denny
Posted 11 Jul 2018
bioRxiv DOI: 10.1101/366682 (published DOI: 10.1038/s41598-018-36745-x)
Background: Current approaches to predicting cardiovascular disease (CVD) rely on conventional risk factors and cross-sectional data. In this study, we asked whether: i) machine learning and deep learning models with longitudinal EHR information can improve the prediction of 10-year CVD risk, and ii) incorporating genetic data can add predictive value.
Methods: We conducted two experiments. In the first experiment, we modeled longitudinal EHR data with aggregated features and temporal features. We applied logistic regression (LR), random forests (RF), gradient boosting trees (GBT), convolutional neural networks (CNN), and recurrent neural networks with long short-term memory (LSTM) units. In the second experiment, we proposed a late-fusion framework to incorporate genetic features.
Results: Our study cohort included 109,490 individuals (9,824 cases and 99,666 controls) from Vanderbilt University Medical Center (VUMC) de-identified EHRs. The American College of Cardiology and American Heart Association (ACC/AHA) Pooled Cohort Risk Equations had an area under the receiver operating characteristic curve (AUROC) of 0.732 and an area under the precision-recall curve (AUPRC) of 0.187. LSTM, CNN, and GBT with temporal features achieved the best results, with AUROCs of 0.789, 0.790, and 0.791 and AUPRCs of 0.282, 0.280, and 0.285, respectively. The late-fusion approach significantly improved prediction performance.
Conclusions: Machine learning and deep learning with longitudinal features improved 10-year CVD risk prediction. Incorporating genetic features further enhanced 10-year CVD prediction performance, underscoring the importance of integrating relevant genetic data whenever available in the context of routine care.
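To make the two reported metrics and the idea of late fusion concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how the late-fusion framework combines models, so the weighted average in `late_fusion` is a stand-in assumption, and the function names, the `weight` parameter, and the toy scores are all hypothetical. AUPRC is approximated here by average precision, assuming no tied scores between cases and controls.

```python
from typing import List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random case outscores a random control (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision at the rank of each case.

    A step-wise approximation of the area under the precision-recall curve;
    assumes cases and controls do not share identical scores.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one case")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def late_fusion(p_ehr: Sequence[float], p_gen: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Hypothetical late fusion: weighted average of two models' risk scores."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(p_ehr, p_gen)]


if __name__ == "__main__":
    # Made-up toy data, not taken from the study.
    labels = [0, 0, 1, 0, 1, 1]
    p_ehr = [0.1, 0.4, 0.35, 0.2, 0.8, 0.3]   # scores from an EHR-only model
    p_gen = [0.2, 0.1, 0.6, 0.3, 0.5, 0.7]    # scores from a genetics-only model
    fused = late_fusion(p_ehr, p_gen)
    print("EHR only:", auroc(labels, p_ehr), average_precision(labels, p_ehr))
    print("Fused:   ", auroc(labels, fused), average_precision(labels, fused))
```

Because cases make up roughly 9% of the cohort (9,824 of 109,490), AUPRC is far more sensitive to that class imbalance than AUROC, which is one reason the abstract reports both.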
- Downloaded 1,587 times
- Download rankings, all-time:
- Site-wide: 4,997 out of 84,359
- In epidemiology: 43 out of 1,556
- Year to date:
- Site-wide: 5,305 out of 84,359
- Since beginning of last month:
- Site-wide: 5,577 out of 84,359