Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 67,594 bioRxiv papers from 298,144 authors.

Uncovering Medical Insights from Vast Amounts of Biomedical Data in Clinical Case Reports

By Yijiang Zhou, David A. Liem, Jessica M. Lee, Quan Cao, Brian Bleakley, J. Harry Caufield, Sanjana Murali, Wei Wang, Li Zhang, Alex Bui, Yizhou Sun, Karol E. Watson, Jiawei Han, Peipei Ping

Posted 04 Aug 2017
bioRxiv DOI: 10.1101/172460

Clinical case reports (CCRs) have long served as an important means of sharing clinical experiences on patients presenting with atypical disease phenotypes or receiving new therapies. However, the large body of accumulated case reports consists of isolated, unstructured, and heterogeneous clinical data, which makes it difficult for clinicians and researchers to mine relevant information with existing indexing tools. To render CCRs more findable, accessible, interoperable, and reusable (FAIR) for the biomedical community, we created a resource platform comprising a test dataset of 1000 CCRs spanning 14 disease phenotypes, a standardized metadata template and metrics, and a set of computational tools that automatically retrieve relevant medical information and analyze all published PubMed clinical case reports with respect to trends in publication journals, citation impact, MeSH terms, drug use, distributions of patient demographics, and relationships with other case reports and databases. Our standardized metadata template and CCR test dataset may be valuable resources for researchers who need a high-quality dataset to train and validate machine learning algorithms, and may thereby help advance medical science and improve patient care. In the future, our analytical tools may also be applied to other large clinical data sources.

Download data

  • Downloaded 268 times
  • Download rankings, all-time:
    • Site-wide: 40,143 out of 67,591
    • In bioinformatics: 4,774 out of 6,655
  • Year to date:
    • Site-wide: 60,346 out of 67,591
  • Since beginning of last month:
    • Site-wide: 58,993 out of 67,591
