Phenotypic signatures in clinical data enable systematic identification of patients for genetic testing

By Theodore J Morley, Lide Han, Jonathan Morra, Nancy J. Cox, Lisa A Bastarache, Douglas Ruderfer

Posted 24 Jul 2020
medRxiv DOI: 10.1101/2020.07.21.20159491

Around five percent of the population is affected by a rare disease, most often due to genetic variation. A genetic test is the quickest path to a diagnosis, yet most patients endure years of diagnostic odyssey before getting a test, if they receive one at all. Identifying patients who are likely to have a genetic disease and therefore need genetic testing is paramount to improving diagnosis and treatment. While there are thousands of previously described genetic diseases with specific phenotypic presentations, a common feature among them is the presence of multiple rare phenotypes, which often span organ systems. Here, we hypothesize that these patients can be identified from longitudinal clinical data in the electronic health record (EHR). We used diagnostic information from the EHRs of 2,286 patients who received a chromosomal microarray and 9,144 matched controls to train and test a prediction model. We observed high prediction accuracy (AUROC = 0.97, AUPR = 0.92) in a held-out test sample and in 172,265 hospital patients where cases were defined broadly as interacting with a genetics provider (AUROC = 0.9, AUPR = 0.63). High probabilities (median = 0.97) were associated with 46 patients carrying a known pathogenic copy number variant (CNV) among a subset of 6,445 genotyped patients. Our model identified many more patients needing a genetic test while increasing the proportion having a putative genetic disease compared to the current nonsystematic approach. Taken together, we demonstrate that phenotypic patterns representative of a genetic disease can be captured from EHR data and provide an opportunity to systematize decision-making on genetic testing to speed up diagnosis, improve care, and reduce costs.
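For readers less familiar with the two metrics quoted in the abstract, the following is a minimal sketch of how AUROC and AUPR can be computed from predicted probabilities. It is not the authors' code: the function names `auroc` and `average_precision`, the toy `labels` and `scores`, and the use of average precision as the AUPR estimate are all assumptions made for illustration. The only figures taken from the abstract are the case and control counts, which imply a case fraction of 0.2 in the training data and therefore an AUPR of about 0.2 for a classifier that ranks at random.

```python
def auroc(labels, scores):
    """Probability that a random case outscores a random control (ties count half)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def average_precision(labels, scores):
    """Mean precision at the rank of each case, scanning from the highest score down.

    Average precision is one common estimate of AUPR; the abstract does not
    say which estimator was used, so this choice is an assumption.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    running = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            running += hits / rank
    return running / hits


# Hypothetical toy data with the same 1:4 case-to-control ratio (2 cases, 8 controls).
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.05, 0.05, 0.02]

# Case fraction implied by the counts in the abstract (2,286 cases, 9,144 controls).
baseline_aupr = 2286 / (2286 + 9144)

print(f"AUROC = {auroc(labels, scores):.4f}")
print(f"AUPR (average precision) = {average_precision(labels, scores):.4f}")
print(f"Random-ranking AUPR baseline = {baseline_aupr:.2f}")
```

Unlike AUROC, AUPR depends on how common cases are in the evaluated population, so AUPR values from cohorts with different case fractions are not directly comparable.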

Download data

  • Downloaded 416 times
  • Download rankings, all-time:
    • Site-wide: 72,271
    • In genetic and genomic medicine: 291
  • Year to date:
    • Site-wide: 33,426
  • Since beginning of last month:
    • Site-wide: 13,316
