
Species abundance information improves sequence taxonomy classification accuracy

By Benjamin D. Kaehler, Nicholas A. Bokulich, Daniel McDonald, Rob Knight, J Gregory Caporaso, Gavin Huttley

Posted 03 Sep 2018
bioRxiv DOI: 10.1101/406611 (published DOI: 10.1038/s41467-019-12669-6)

Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and that in practice it is always violated. We further show that incorporating environment-specific taxonomic abundance information makes species-level resolution attainable.
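To make the role of the equal-likelihood assumption concrete, here is a toy sketch of a k-mer naive Bayes classifier in which the class prior can be either uniform or taken from abundance estimates. It is an illustration only, not the authors' implementation: the function names, the k-mer length, the smoothing constant, the reference sequences, and the abundance values are all assumptions made up for this example.

```python
import math
from collections import Counter


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def train(references, k=3, alpha=1.0):
    """Fit per-taxon k-mer log-likelihoods with additive smoothing.

    references: dict mapping taxon name -> list of reference sequences.
    alpha is a hypothetical smoothing constant chosen for this toy example.
    """
    vocab = {km for seqs in references.values() for s in seqs for km in kmers(s, k)}
    model = {}
    for taxon, seqs in references.items():
        counts = Counter(km for s in seqs for km in kmers(s, k))
        # One extra smoothing slot covers k-mers never seen in any reference.
        total = sum(counts.values()) + alpha * (len(vocab) + 1)
        model[taxon] = {
            "loglik": {km: math.log((counts[km] + alpha) / total) for km in vocab},
            "unseen": math.log(alpha / total),
        }
    return model


def classify(model, query, priors=None, k=3):
    """Return the taxon with the highest log posterior score.

    priors: dict taxon -> prior probability. None means uniform priors,
    i.e. every taxon is assumed equally likely to be observed.
    """
    taxa = list(model)
    if priors is None:
        priors = {t: 1.0 / len(taxa) for t in taxa}
    scores = {}
    for t in taxa:
        score = math.log(priors[t])
        for km in kmers(query, k):
            score += model[t]["loglik"].get(km, model[t]["unseen"])
        scores[t] = score
    return max(scores, key=scores.get)


# Made-up reference sequences that differ only in their final base.
references = {
    "species_a": ["ACGTACGTAC"],
    "species_b": ["ACGTACGTAG"],
}
model = train(references)
query = "ACGTACGT"

# Uniform priors: the likelihood alone decides.
print(classify(model, query))  # species_a

# Hypothetical environment-specific abundances: species_b dominates.
abundance_priors = {"species_a": 0.1, "species_b": 0.9}
print(classify(model, query, priors=abundance_priors))  # species_b
```

In this toy case the query is only weakly more similar to species_a, so the uniform prior settles the call on thin evidence. Once the prior reflects that species_b is far more abundant in the assumed environment, the same likelihoods lead to the opposite classification.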

Download data

  • Downloaded 985 times
  • Download rankings, all-time:
    • Site-wide: 12,573 out of 94,912
    • In bioinformatics: 1,981 out of 8,837
  • Year to date:
    • Site-wide: 20,572 out of 94,912
  • Since beginning of last month:
    • Site-wide: 23,524 out of 94,912
