Rapid Identification and Phenotyping of Nonalcoholic Fatty Liver Disease Patients Using an Automated Algorithmic Approach in Diverse, Urban Healthcare Systems
Anna Okula Basile,
Leigh Ann Tang,
Brittney M Destin,
Rotonya M Carr,
Muredach P. Reilly,
Marylyn D. Ritchie,
Nicholas P Tatonetti
Posted 30 Apr 2021
medRxiv DOI: 10.1101/2021.04.27.21256139
Background and Aims: Nonalcoholic fatty liver disease (NAFLD) is the most common cause of chronic liver disease worldwide. Therapeutic interventions are rapidly advancing for its inflammatory phenotype, nonalcoholic steatohepatitis (NASH). Diagnosis codes alone fail to accurately recognize at-risk patients. The objective of the present work is to identify NAFLD patients within large electronic health record (EHR) databases for targeted intervention based on clinically relevant phenotypes.

Methods: We present a rule-based phenotype algorithm for the rapid identification of NAFLD patients, developed using EHRs from 5.8 million adult patients at Columbia University Irving Medical Center (CUIMC). The algorithm was developed using the Observational Medical Outcomes Partnership (OMOP) Common Data Model and queries multiple structured and unstructured data elements, including diagnosis codes, laboratory measurements, and radiology and pathology modalities.

Results: Our approach identified 16,060 NAFLD patients at CUIMC, 170 of whom had a biopsy-proven NASH diagnosis. Fibrosis scoring of patients without histology identified 943 with scores indicative of advanced fibrosis in 2 of the scoring metrics (FIB-4, APRI, NAFLD Fibrosis Score). The algorithm was validated at two independent healthcare systems, University of Pennsylvania Healthcare System (UPHS) and Vanderbilt Medical Center (VUMC), where 20,779 and 19,575 NAFLD patients were identified, respectively. Clinical chart review showed a high positive predictive value (PPV) for the algorithm across all healthcare systems: 91% at CUIMC, 75% at UPHS, and 85% at VUMC.

Conclusions: Our rule-based algorithm provides an accurate, automated approach for rapidly identifying and sub-phenotyping NAFLD patients within a large EHR system. This highlights the clinical potential of such algorithms for finding the NAFLD patients at highest risk of disease progression and directing them to diagnostic and therapeutic intervention.
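The fibrosis-scoring step in the Results can be illustrated with a minimal sketch. It uses the commonly published formulas for FIB-4, APRI, and the NAFLD Fibrosis Score; the cutoffs, the AST upper limit of normal, the "at least two of three" reading of the rule, and all function names are assumptions made for illustration, since the abstract does not state the thresholds or implementation the authors used.

```python
import math

# Assumed cutoffs for advanced fibrosis, taken from commonly cited literature
# values. The abstract does not say which thresholds the authors applied.
FIB4_CUTOFF = 2.67
APRI_CUTOFF = 1.0
NFS_CUTOFF = 0.676


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 = (age * AST) / (platelets * sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, platelets_10e9_l, ast_uln_u_l=40.0):
    """APRI = (AST / upper limit of normal) / platelets * 100.

    The default upper limit of normal (40 U/L) is an assumption; it varies by lab.
    """
    return (ast_u_l / ast_uln_u_l) / platelets_10e9_l * 100.0


def nafld_fibrosis_score(age_years, bmi, has_diabetes_or_ifg, ast_u_l,
                         alt_u_l, platelets_10e9_l, albumin_g_dl):
    """NAFLD Fibrosis Score, using the commonly published coefficients."""
    return (-1.675
            + 0.037 * age_years
            + 0.094 * bmi
            + 1.13 * (1 if has_diabetes_or_ifg else 0)
            + 0.99 * (ast_u_l / alt_u_l)
            - 0.013 * platelets_10e9_l
            - 0.66 * albumin_g_dl)


def flags_advanced_fibrosis(age_years, bmi, has_diabetes_or_ifg, ast_u_l,
                            alt_u_l, platelets_10e9_l, albumin_g_dl):
    """Return True when at least two of the three scores exceed their cutoff.

    "At least two" is an assumed interpretation of the rule in the abstract.
    """
    over_cutoff = [
        fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l) > FIB4_CUTOFF,
        apri(ast_u_l, platelets_10e9_l) > APRI_CUTOFF,
        nafld_fibrosis_score(age_years, bmi, has_diabetes_or_ifg, ast_u_l,
                             alt_u_l, platelets_10e9_l, albumin_g_dl) > NFS_CUTOFF,
    ]
    return sum(over_cutoff) >= 2


# Hypothetical patient values, not taken from the study.
print(flags_advanced_fibrosis(60, 32.0, True, 80.0, 50.0, 120.0, 3.8))
```

Requiring agreement between scores, rather than relying on a single one, trades some sensitivity for fewer false flags, which suits a screening step that runs over an entire EHR population.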
Data Transparency Statement: Algorithmic code is available for academic, non-commercial collaborations by request to the corresponding authors.

What You Need to Know

Background and Context: NAFLD is the leading form of liver disease worldwide with a rising prevalence in the population. Current means of identification are complex and dependent on provider recognition of clinical risk factors.

New Findings: We present an accurate (mean PPV=84%) and cross-institution validated, rule-based algorithm for the high-throughput, rapid identification of NAFLD patients across diverse EHR systems comprising approximately 12.1 million patients. The majority of patients were previously unidentified.

Limitations: Inaccessible imaging and histologic data (performed outside the healthcare system) limited our ability to verify hepatic steatosis and resulted in low sensitivity for the final step of the algorithm.

Impact: Our NAFLD algorithm provides an accurate means of rapidly identifying NAFLD in large EHR systems to target patients at greatest risk for disease progression and clinical outcomes towards diagnostic and therapeutic interventions.

Short Summary: NAFLD, the leading cause of liver disease globally, is often under-recognized in at-risk individuals. Here we present a rapid, non-invasive algorithm for identifying patients within large health systems who are at greatest risk for disease progression and clinical decompensation for diagnostic and therapeutic intervention.
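As a quick check on the reported figures, the mean PPV of 84% is consistent with an unweighted average of the three per-site PPVs from chart review (91% at CUIMC, 75% at UPHS, 85% at VUMC). The sketch below assumes that unweighted averaging; the abstract does not say how the mean was computed, and the helper function and its example counts are hypothetical.

```python
def positive_predictive_value(true_positives, false_positives):
    """PPV = TP / (TP + FP): the share of algorithm-flagged patients
    that chart review confirms as true cases."""
    return true_positives / (true_positives + false_positives)


# Per-site PPVs as reported in the abstract.
site_ppv = {"CUIMC": 0.91, "UPHS": 0.75, "VUMC": 0.85}

# Unweighted mean across sites (an assumption about the authors' method).
mean_ppv = sum(site_ppv.values()) / len(site_ppv)
print(f"Mean PPV: {mean_ppv:.1%}")  # prints "Mean PPV: 83.7%"

# Hypothetical chart-review counts: 91 confirmed cases out of 100 reviewed.
print(positive_predictive_value(91, 9))
```

A mean weighted by the number of charts reviewed at each site could differ from this value; the abstract does not report those counts.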