Ankle and Toe Brachial Index Extraction from Clinical Reports For Peripheral Artery Disease Identification: Unlocking Clinical Data through Novel Methods
Julia E Friberg,
Abdul H Qazi,
Dax M Westerman,
Olga V Patterson,
Sharidan K Parr,
Michael E Matheny,
Kim G Smolderen,
Brian C Lund,
Glenn T Gobbel,
Posted 10 May 2021
medRxiv DOI: 10.1101/2021.05.08.21256421
ABSTRACT
Importance: Despite its high prevalence and poor outcomes, research on peripheral artery disease (PAD) remains limited due to the poor accuracy of billing codes for identifying PAD in health systems.
Objective: To design a natural language processing (NLP) system that extracts ankle brachial index (ABI) and toe brachial index (TBI) values, and to evaluate how well the extracted ABI/TBI values identify patients with PAD in the Veterans Health Administration (VHA).
Design, Setting, Participants: From a corpus of 392,244 ABI test reports at 94 VHA facilities during 2015-2017, we selected a random sample of 800 documents for NLP development. Using machine learning, we designed the NLP system to extract ABI and TBI values and laterality (right or left). Performance was optimized through sequential iterations of 10-fold cross validation and error analysis on 3 sets of 200 documents each, and tested on a final, independent set of 200 documents. The performance of NLP-extracted ABI and TBI values in identifying PAD was then compared with structured chart review in a random sample of Veterans undergoing ABI testing.
Exposure: ABI <0.9 or TBI <0.7 in either the right or left limb, used to define PAD at the patient level.
Main Outcome: Precision (positive predictive value), recall (sensitivity), and F1-measure (an overall measure of accuracy, defined as the harmonic mean of precision and recall).
Results: The NLP system had an overall precision of 0.85, recall of 0.93, and F1-measure of 0.89. The F1-measure was similar for ABI and TBI (0.88 to 0.91). Recall was higher for ABI (0.95 to 0.97), while precision was higher for TBI (0.94 to 0.95). Among 261 patients with ABI testing (49% with PAD), the NLP system extracted ABI and TBI values in 238 (91.2%) patients. For identifying PAD, the NLP system had a positive predictive value of 92.3%, sensitivity of 89.3%, and specificity of 92.3%.
Conclusion: We have successfully developed and validated an NLP system to extract ABI and TBI values which can be used to accurately identify PAD within the VHA. Our findings have broad implications for PAD research and quality improvement efforts in large health systems.
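As an illustration only, the patient-level PAD definition and the F1-measure described in the abstract can be sketched in Python. This is not the authors' NLP system: the function names, the tuple layout for extracted values, and the choice to return None when no value was extracted are assumptions made for the example. Only the thresholds (ABI <0.9, TBI <0.7) and the harmonic-mean definition come from the abstract.

```python
from typing import List, Optional, Tuple

# Thresholds stated in the abstract.
ABI_THRESHOLD = 0.9
TBI_THRESHOLD = 0.7

# Hypothetical record layout: (index type, laterality, value),
# e.g. ("ABI", "right", 0.85). The real system's output format is not given.
Extracted = Tuple[str, str, float]


def has_pad(values: List[Extracted]) -> Optional[bool]:
    """Apply the abstract's rule: PAD if ABI < 0.9 or TBI < 0.7 in either limb.

    Returns None when nothing was extracted (an assumption of this sketch).
    """
    if not values:
        return None
    for index_type, _side, value in values:
        if index_type == "ABI" and value < ABI_THRESHOLD:
            return True
        if index_type == "TBI" and value < TBI_THRESHOLD:
            return True
    return False


def f1_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    patient = [("ABI", "right", 0.85), ("ABI", "left", 1.10)]
    print(has_pad(patient))                   # True: right ABI is below 0.9
    print(round(f1_measure(0.85, 0.93), 2))   # 0.89, matching the reported overall F1
```

The comparisons are strict, so a value exactly at a threshold (for example, ABI of 0.9) is not counted as PAD, following the "<" in the stated definition.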