Augmented Intelligence with Natural Language Processing Applied to Electronic Health Records is Useful for Identifying Patients with Non-Alcoholic Fatty Liver Disease at Risk for Disease Progression
Tielman T. Van Vleck,
Steven G Coca,
Catherine K Craven,
Stephen B Ellis,
Joseph L Kannry,
Ruth J.F. Loos,
Peter A Bonis,
Girish N Nadkarni
Posted 11 Jan 2019
bioRxiv DOI: 10.1101/518217 (published DOI: 10.1016/j.ijmedinf.2019.06.028)
Objective: Electronic health record (EHR) systems contain structured data and unstructured documentation. Clinical insights can be derived from analyzing both, but optimal methods for doing so have not been studied extensively. We compared various approaches to analyzing EHR data for non-alcoholic fatty liver disease (NAFLD).

Materials and Methods: We compared analysis of structured and unstructured EHR data using natural language processing (NLP), free-text search, and diagnostic codes against expert adjudication as the reference standard.

Results: Out of 38,575 patients, we identified 2,281 patients with NAFLD. From the remainder, 10,653 patients with similar data density were selected as a control group. NLP was more sensitive than ICD and text search (NLP 0.93 vs. ICD 0.28 vs. text search 0.81), with a higher F2 score (NLP 0.92 vs. ICD 0.34 vs. text search 0.81). A total of 619 patients had suspected NAFLD documented in radiology notes that was not acknowledged in other forms of clinical documentation. Of these, 232 (37.5%) were found to have more advanced liver disease after a median of 1,057 days.

Discussion: NLP-based approaches have superior accuracy in identifying NAFLD within the EHR compared to ICD- and text search-based approaches. Suspected NAFLD on imaging is often not acknowledged in subsequent clinical documentation. Many such patients are later found to have more advanced liver disease.

Conclusion: For identification of NAFLD, NLP performed better than alternative selection modalities and facilitated follow-on analysis of information flow. If accuracy can be proven to persist across clinical domains, NLP can identify patient phenotypes for biomedical research in an accurate and high-throughput manner.
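
The abstract reports sensitivity and an F2 score for each method. As a minimal sketch of how those metrics are conventionally defined, the Python below computes sensitivity (recall) and the F-beta score from confusion-matrix counts, with beta = 2 weighting recall more heavily than precision. The function names and the example counts are hypothetical and are not taken from the paper; only the 232-of-619 proportion comes from the abstract.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Recall: fraction of true cases that the method flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from confusion counts; beta=2 gives the F2 score."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = sensitivity(tp, fn)
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts, for illustration only (not the study's data).
print(f"sensitivity = {sensitivity(tp=80, fn=20):.2f}")
print(f"F2          = {f_beta(tp=80, fp=10, fn=20):.3f}")

# Proportion reported in the abstract: 232 of 619 patients.
print(f"232 / 619   = {232 / 619:.1%}")
```

Because beta = 2 places four times as much weight on recall as on precision in the denominator, a method with low sensitivity cannot reach a high F2 score even when its precision is perfect.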
- Downloaded 525 times
- Download rankings, all-time:
  - Site-wide: 47,029
  - In bioinformatics: 4,956
- Download rankings, year to date:
  - Site-wide: 39,471
- Download rankings, since beginning of last month:
  - Site-wide: 61,757