Accuracy of Computable Phenotyping Approaches for SARS-CoV-2 Infection and COVID-19 Hospitalizations from the Electronic Health Record

By Rohan Khera, Bobak J Mortazavi, Veer Sangha, Frederick Warner, H. Patrick Young, Joseph S. Ross, Nilay D Shah, Elitza S. Theel, William G Jenkinson, Camille A Knepper, Karen Wang, David R Peaper, Richard A. Martinello, Cynthia A Brandt, Zhenqiu Lin, Albert Ko, Harlan M Krumholz, Benjamin D Pollock, Wade L Schulz

Posted 20 Mar 2021
medRxiv DOI: 10.1101/2021.03.16.21253770

Objective: Real-world data have been critical for rapid knowledge generation throughout the COVID-19 pandemic. To ensure that high-quality results are delivered to guide clinical decision making and the public health response, and to characterize the response to interventions, it is essential to establish the accuracy of COVID-19 case definitions derived from administrative data to identify infections and hospitalizations.

Methods: Electronic Health Record (EHR) data were obtained from the clinical data warehouse of the Yale-New Haven Health System (Yale, primary site) and 3 hospital systems of the Mayo Clinic (validation site). Detailed characteristics on demographics, diagnoses, and laboratory results were obtained for all patients with either a positive SARS-CoV-2 PCR or antigen test or an ICD-10 diagnosis of COVID-19 (U07.1) between April 1, 2020 and March 1, 2021. Various computable phenotype definitions were evaluated for their accuracy in identifying SARS-CoV-2 infection and COVID-19 hospitalizations.

Results: Of the 69,423 individuals with either a diagnosis code or a laboratory diagnosis of a SARS-CoV-2 infection at Yale, 61,023 had a principal or a secondary diagnosis code for COVID-19 and 50,355 had a positive SARS-CoV-2 test. Among those with a positive laboratory test, 38,506 (76.5%) and 3,449 (6.8%) had a principal and secondary diagnosis code of COVID-19, respectively, while 8,400 (16.7%) had no COVID-19 diagnosis. Moreover, of the 61,023 patients with a COVID-19 diagnosis code, 19,068 (31.2%) did not have a positive laboratory test for SARS-CoV-2 in the EHR. Of the 20 cases randomly sampled from this latter group for manual review, all had a COVID-19 diagnosis code related to asymptomatic testing with negative subsequent test results. The positive predictive value (precision) and sensitivity (recall) of a COVID-19 diagnosis in the medical record for a documented positive SARS-CoV-2 test were 68.8% and 83.3%, respectively.

Among 5,109 patients who were hospitalized with a principal diagnosis of COVID-19, 4,843 (94.8%) had a positive SARS-CoV-2 test within the 2 weeks preceding hospital admission or during hospitalization. In addition, 789 hospitalizations had a secondary diagnosis of COVID-19, of which 446 (56.5%) had a principal diagnosis consistent with severe clinical manifestation of COVID-19 (e.g., sepsis or respiratory failure). Compared with the cohort that had a principal diagnosis of COVID-19, those with a secondary diagnosis had a more than 2-fold higher in-hospital mortality rate (13.2% vs 28.0%, P<0.001). In the validation sample at Mayo Clinic, diagnosis codes more consistently identified SARS-CoV-2 infection (precision of 95%) but had lower recall (63.5%), with substantial variation across the 3 Mayo Clinic sites. Similar to Yale, diagnosis codes consistently identified COVID-19 hospitalizations at Mayo, and hospitalizations defined by a secondary diagnosis code had 2-fold higher in-hospital mortality than those with a primary diagnosis of COVID-19.

Conclusions: COVID-19 diagnosis codes misclassified the SARS-CoV-2 infection status of many people, with implications for clinical research and epidemiological surveillance. Moreover, the codes had different performance across two academic health systems and identified groups with different risks of mortality. Real-world data from the EHR can be used in conjunction with diagnosis codes to improve the identification of people infected with SARS-CoV-2.
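
The precision and recall reported for Yale can be reproduced from the counts given in the abstract. The sketch below is illustrative only and is not the authors' code: the function name `ppv_and_sensitivity` and the variable names are hypothetical, and a documented positive SARS-CoV-2 test is treated as the reference standard, as the abstract describes.

```python
def ppv_and_sensitivity(code_and_test: int, code_only: int, test_only: int) -> tuple[float, float]:
    """Return (PPV, sensitivity) of a diagnosis code against a positive lab test.

    Hypothetical helper for illustration; the positive test is the reference standard.
    """
    ppv = code_and_test / (code_and_test + code_only)          # precision
    sensitivity = code_and_test / (code_and_test + test_only)  # recall
    return ppv, sensitivity


# Yale counts taken from the abstract.
code_and_test = 38_506 + 3_449  # test-positive with a principal or secondary code
code_only = 19_068              # diagnosis code, no positive test in the EHR
test_only = 8_400               # positive test, no COVID-19 diagnosis code

ppv, sensitivity = ppv_and_sensitivity(code_and_test, code_only, test_only)
print(f"PPV (precision): {ppv:.1%}")
print(f"Sensitivity (recall): {sensitivity:.1%}")
```

The denominators recover the totals stated in the abstract: 61,023 patients with a diagnosis code and 50,355 with a positive test.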

Download data

  • Downloaded 425 times
  • Download rankings, all-time:
    • Site-wide: 73,280
    • In health informatics: 279
  • Year to date:
    • Site-wide: 11,285
  • Since beginning of last month:
    • Site-wide: 32,703
