Developing and Evaluating Mappings of ICD-10 and ICD-10-CM Codes to PheCodes
Joshua C Denny,
Posted 05 Nov 2018
bioRxiv DOI: 10.1101/462077 (published DOI: 10.2196/14325)
Background: The PheCode system was built upon the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) for phenome-wide association studies (PheWAS) in the electronic health record (EHR).

Objective: Here, we present our work on developing and evaluating maps from ICD-10 and ICD-10-CM codes to PheCodes.

Methods: We mapped ICD-10 and ICD-10-CM codes to PheCodes using several methods and resources, including concept relationships and explicit mappings from the Unified Medical Language System (UMLS), Observational Health Data Sciences and Informatics (OHDSI), Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT), and the National Library of Medicine (NLM). We assessed the coverage of the maps in two databases: Vanderbilt University Medical Center (VUMC), which uses ICD-10-CM, and the UK Biobank (UKBB), which uses ICD-10. We assessed the fidelity of the ICD-10-CM map against the gold-standard ICD-9-CM→PheCode map by investigating phenotype reproducibility and conducting a PheWAS.

Results: We mapped >75% of ICD-10-CM and ICD-10 codes to PheCodes. Of the unique codes observed in the VUMC (ICD-10-CM) and UKBB (ICD-10) cohorts, >90% were mapped to PheCodes. We observed 70-75% reproducibility for chronic diseases and <10% for an acute disease. A PheWAS with a lipoprotein(a) (LPA) genetic variant, rs10455872, using the ICD-9-CM and ICD-10-CM maps replicated two genotype-phenotype associations with similar effect sizes: coronary atherosclerosis (ICD-9-CM: P < .001, OR = 1.60 vs. ICD-10-CM: P < .001, OR = 1.60) and chronic ischemic heart disease (ICD-9-CM: P < .001, OR = 1.5 vs. ICD-10-CM: P < .001, OR = 1.47).

Conclusions: This study introduces the initial "beta" versions of the ICD-10 and ICD-10-CM to PheCode maps, which will enable researchers to leverage accumulated ICD-10 and ICD-10-CM data for high-throughput PheWAS in the EHR.
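The abstract reports two quantities computed from a code-to-PheCode map: coverage (the share of unique observed codes that map to at least one PheCode) and phenotype reproducibility. The Python sketch that follows illustrates both under stated assumptions. Each map is a plain dictionary from a diagnosis code to a set of PheCodes, so several codes can share one PheCode (many-to-one). The map entries and patient records are hypothetical toy data, not rows from the published maps. Reproducibility is assumed here to mean the share of patients assigned a PheCode through ICD-9-CM codes who are also assigned it through ICD-10-CM codes, which may differ from the authors' exact definition. The function names `coverage`, `patients_with_phecode`, and `reproducibility` are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A map from diagnosis code to the set of PheCodes it rolls up to.
CodeMap = Dict[str, Set[str]]
# Hypothetical (patient_id, code) records from an EHR extract.
Records = List[Tuple[str, str]]

# Hypothetical toy maps: illustrative entries only, not rows taken from
# the published ICD-9-CM or ICD-10-CM PheCode maps.
ICD9CM_TO_PHECODE: CodeMap = {"414.01": {"411.4"}, "272.4": {"272.1"}}
ICD10CM_TO_PHECODE: CodeMap = {"I25.10": {"411.4"}, "E78.5": {"272.1"}}


def coverage(observed_codes: Iterable[str], code_map: CodeMap) -> float:
    """Fraction of unique observed codes that map to at least one PheCode."""
    unique_codes = set(observed_codes)
    if not unique_codes:
        return 0.0
    mapped = sum(1 for code in unique_codes if code_map.get(code))
    return mapped / len(unique_codes)


def patients_with_phecode(records: Records, code_map: CodeMap, phecode: str) -> Set[str]:
    """Patients with at least one code that maps to the given PheCode."""
    return {pid for pid, code in records if phecode in code_map.get(code, set())}


def reproducibility(icd9_records: Records, icd9_map: CodeMap,
                    icd10_records: Records, icd10_map: CodeMap,
                    phecode: str) -> float:
    """Assumed definition: share of patients assigned the PheCode via
    ICD-9-CM codes who are also assigned it via ICD-10-CM codes."""
    before = patients_with_phecode(icd9_records, icd9_map, phecode)
    if not before:
        return 0.0
    after = patients_with_phecode(icd10_records, icd10_map, phecode)
    return len(before & after) / len(before)


if __name__ == "__main__":
    icd9_records = [("p1", "414.01"), ("p2", "414.01"), ("p3", "272.4")]
    # "R69" has no entry in the toy ICD-10-CM map, so it counts as unmapped.
    icd10_records = [("p1", "I25.10"), ("p3", "E78.5"), ("p4", "R69")]

    print(round(coverage([c for _, c in icd10_records], ICD10CM_TO_PHECODE), 3))  # 0.667
    print(reproducibility(icd9_records, ICD9CM_TO_PHECODE,
                          icd10_records, ICD10CM_TO_PHECODE, "411.4"))  # 0.5
```

Storing each code's targets as a set keeps lookup simple while still allowing one code to map to more than one PheCode. A real pipeline would load the released map files in place of the toy dictionaries.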
Abbreviations: EHR, electronic health record; ICD, International Classification of Diseases; AHRQ, Agency for Healthcare Research and Quality; CCS, Clinical Classifications Software; PheWAS, phenome-wide association studies; CM, Clinical Modification; WHO, World Health Organization; NCHS, National Center for Health Statistics; UMLS, Unified Medical Language System; GEM, General Equivalence Mapping; SNOMED CT, Systematized Nomenclature of Medicine – Clinical Terms; CUI, Concept Unique Identifier; OHDSI, Observational Health Data Sciences and Informatics; CDM, Common Data Model; NLM, National Library of Medicine; VUMC, Vanderbilt University Medical Center; UKBB, UK Biobank; OR, odds ratio; LPA, lipoprotein(a); SNP, single nucleotide polymorphism; M:1, many-to-one; SD, standard deviation.
- Downloaded 3,289 times