Purpose: The explosion of molecular biomarker and treatment information in the precision medicine era has drastically exacerbated the difficulty of identifying patient-relevant knowledge for clinical researchers and practitioners. Curated knowledgebases, such as the JAX Clinical Knowledgebase (CKB), organize and display this knowledge in a readily accessible format; however, curators face the same challenge in comprehensively identifying clinically relevant information for curation. Natural language processing (NLP) has emerged as a promising direction for accelerating manual curation, but prior applications were often conceived as stand-alone efforts to automate curation, and their scope was often limited to simple entity and relation extraction. In this paper, we study the alternative paradigm of assisted curation and identify key desiderata for scaling up knowledge curation through human-computer symbiosis.

Methods: We chose precision oncology as a case study and introduced self-supervised machine reading, which can automatically generate noisy training examples from unlabeled text. We developed a curation user interface (UI) for precision oncology and, through iterative curathons (curation hackathons), conducted retrospective and prospective user studies for a head-to-head comparison between manual and machine-assisted curation.

Results: Contrary to the prevailing assumption, we showed that high recall is more important for end-to-end assisted curation. In extensive user studies, we showed that assisted curation can double the curation speed and increase the number of findings by an order of magnitude for drugs that were previously scarcely curated.

Conclusion: We demonstrated that an iterative and thoughtful collaboration between professional curators and NLP researchers can facilitate rapid advances in assisted curation for precision medicine. Human-machine reading symbiosis may be applicable to clinical care and research scenarios where curation is a major bottleneck.