Genetic validation of bipolar disorder identified by automated phenotyping using electronic health records
Phil H. Lee,
Victor M Castro,
Alexander W. Charney,
Eli A Stahl,
Douglas M Ruderfer,
Shawn N Murphy,
Roy H. Perlis,
Jordan W. Smoller
Posted 23 Sep 2017
bioRxiv DOI: 10.1101/193011 (published DOI: 10.1038/s41398-018-0133-7)
Bipolar disorder (BD) is a heritable mood disorder characterized by episodes of mania and depression. Although genomewide association studies (GWAS) have successfully identified genetic loci contributing to BD risk, sample size has become a rate-limiting obstacle to genetic discovery. Electronic health records (EHRs) represent a vast but relatively untapped resource for high-throughput phenotyping. As part of the International Cohort Collection for Bipolar Disorder (ICCBD), we previously validated automated EHR-based phenotyping algorithms for BD against in-person diagnostic interviews (Castro et al. 2015). Here, we establish the genetic validity of these phenotypes by determining their genetic correlation with traditionally-ascertained samples. Case and control algorithms were derived from structured and narrative text in the Partners Healthcare system comprising more than 4.6 million patients over 20 years. Genomewide genotype data for 3,330 BD cases and 3,952 controls of European ancestry were used to estimate SNP-based heritability (h2g) and genetic correlation (rg) between EHR-based phenotype definitions and traditionally-ascertained BD cases in GWAS by the ICCBD and Psychiatric Genomics Consortium (PGC) using LD score regression. We evaluated BD cases identified using 4 EHR-based algorithms: an NLP-based algorithm (95-NLP) and 3 rule-based algorithms using codified EHR with decreasing levels of stringency: "coded-strict", "coded-broad", and "coded-broad based on a single clinical encounter" (coded-broad-SV). The analytic sample comprised 862 95-NLP, 1,968 coded-strict, 2,581 coded-broad, and 408 coded-broad-SV BD cases, and 3,952 controls. The estimated h2g were 0.24 (p=0.015), 0.09 (p=0.064), 0.13 (p=0.003), and 0.00 (p=0.591) for 95-NLP, coded-strict, coded-broad, and coded-broad-SV BD, respectively. The h2g for all EHR-based cases combined, except coded-broad-SV (excluded because its estimated h2g was 0), was 0.12 (p=0.004).
These h2g were lower than or similar to the h2g observed by the ICCBD+PGCBD (0.23, p=3.17×10⁻⁸⁰, total N=33,181). However, the rg between ICCBD+PGCBD and the EHR-based cases were high for 95-NLP (0.66, p=3.69×10⁻⁵), coded-strict (1.00, p=2.40×10⁻⁴), and coded-broad (0.74, p=8.11×10⁻⁷). The rg between EHR-based BDs ranged from 0.90 to 0.98. These results provide the first genetic validation of automated EHR-based phenotyping for BD and suggest that this approach identifies cases that are highly genetically correlated with those ascertained through conventional methods. High-throughput phenotyping using the large data resources available in EHRs represents a viable method for accelerating psychiatric genetic research.
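For readers unfamiliar with LD score regression, the following is a minimal sketch of the idea behind the h2g estimates, not the authors' pipeline. It assumes the standard expectation E[chi2_j] = 1 + N*a + N*h2*l_j/M, where l_j is the LD score of SNP j, N the sample size, and M the number of SNPs, and fits it by plain unweighted least squares on simulated data. The function names `ldsc_h2` and `simulate` are hypothetical, the simulated SNPs are treated as independent, and the regression weights, block-jackknife standard errors, and liability-scale conversion used in practice are all omitted.

```python
import random


def ldsc_h2(chi2, ld_scores, n_samples):
    """Unweighted LD score regression (illustrative only).

    Regresses per-SNP chi-square statistics on LD scores and converts the
    slope into a heritability estimate via h2 = slope * M / N.
    """
    m = len(chi2)
    mean_l = sum(ld_scores) / m
    mean_c = sum(chi2) / m
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    var = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = cov / var
    intercept = mean_c - slope * mean_l  # near 1 when there is no confounding
    return slope * m / n_samples, intercept


def simulate(m, n_samples, h2, seed=0):
    """Simulate chi-square statistics whose mean follows the LDSC expectation.

    Assumption: no confounding (intercept 1) and independent SNPs.
    """
    rng = random.Random(seed)
    ld_scores = [rng.uniform(1.0, 200.0) for _ in range(m)]
    chi2 = []
    for l in ld_scores:
        expected = 1.0 + n_samples * h2 * l / m
        z = rng.gauss(0.0, 1.0)
        chi2.append(expected * z * z)  # scaled 1-df chi-square, mean = expected
    return chi2, ld_scores


if __name__ == "__main__":
    # N = 3,330 cases + 3,952 controls from the abstract; h2 = 0.12 is the
    # combined EHR-based estimate, used here only as a simulation target.
    n = 3330 + 3952
    chi2, ld = simulate(m=20000, n_samples=n, h2=0.12, seed=1)
    h2_hat, intercept = ldsc_h2(chi2, ld, n)
    print(f"estimated h2 = {h2_hat:.3f}, intercept = {intercept:.3f}")
```

Genetic correlation follows the same logic with the product of two traits' z-scores in place of chi2, so that the slope estimates genetic covariance, which is then normalized by the two heritability estimates.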