Annotation of phenotypes using ontologies: a Gold Standard for the training and evaluation of natural language processing systems
James P Balhoff,
T. Alexander Dececchi,
Posted 15 May 2018
bioRxiv DOI: 10.1101/322156 (published DOI: 10.1093/database/bay110)
Natural language descriptions of organismal phenotypes, a principal object of study in biology, are abundant in biological literature. Expressing these phenotypes as logical statements using formal ontologies would enable large-scale analysis of phenotypic information from diverse systems. However, considerable human effort is required to make the semantics of phenotype descriptions amenable to machine reasoning by (a) recognizing appropriate ontological terms for entities in text and (b) stringing these terms into logical statements. Most existing Natural Language Processing tools stop at entity recognition, leaving a need for tools that can assist with both aspects of the task. The recently described Semantic CharaParser aims to meet this need. We describe the first expert-curated Gold Standard corpus for ontology-based annotation of phenotypes from the systematics literature. We use it to evaluate Semantic CharaParser's annotations and explore differences in performance between humans and machine. We use four annotation accuracy metrics that can account for both semantically identical and similar matches. We found that machine-human consistency was significantly lower than inter-curator (human-human) consistency. Surprisingly, allowing curators access to external information that was not available to Semantic CharaParser did not significantly increase the similarity of their annotations to the Gold Standard, nor did it significantly affect inter-curator consistency. We found that the similarity of machine annotations to the Gold Standard increased after new ontology terms relevant to the input text had been added. Evaluation by the original authors of the character descriptions indicated that the Gold Standard annotations came closer to representing their intended meaning than did either the curator or machine annotations.
These findings point toward ways to better design software that augments human curators, and the Gold Standard corpus will allow training and assessment of new tools to improve phenotype annotation accuracy at scale.
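The abstract does not spell out its four accuracy metrics, so the following is only an illustrative sketch of how a metric can give credit for semantically similar as well as identical matches. It assumes a Jaccard similarity over ontology ancestor sets, a common choice for this kind of comparison but not necessarily one of the metrics used in the paper. The toy ontology and the names PARENTS, ancestors, and jaccard_similarity are hypothetical.

```python
from typing import Dict, Set

# Hypothetical toy ontology: each term maps to its direct is_a parents.
# These terms are illustrative and not taken from the Gold Standard corpus.
PARENTS: Dict[str, Set[str]] = {
    "pectoral fin": {"fin"},
    "pelvic fin": {"fin"},
    "fin": {"appendage"},
    "appendage": {"anatomical structure"},
    "anatomical structure": set(),
}


def ancestors(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the term itself plus all of its transitive superclasses."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def jaccard_similarity(a: str, b: str, parents: Dict[str, Set[str]]) -> float:
    """Shared ancestors divided by all ancestors of either term.

    Identical terms score 1.0; related terms score between 0 and 1.
    """
    anc_a = ancestors(a, parents)
    anc_b = ancestors(b, parents)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


if __name__ == "__main__":
    # An exact match between curator and machine annotation.
    print(jaccard_similarity("pectoral fin", "pectoral fin", PARENTS))
    # A near miss: sibling terms that share most of their ancestry.
    print(jaccard_similarity("pectoral fin", "pelvic fin", PARENTS))
```

Under this sketch, an annotation that picks a sibling or parent term is scored as partially correct rather than simply wrong, which is the general behavior the abstract attributes to its metrics.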