Producing Polished Prokaryotic Pangenomes with the Panaroo Pipeline

By Gerry Tonkin-Hill, Neil MacAlasdair, Christopher Ruis, Aaron Weimann, Gal Horesh, John Lees, Rebecca A. Gladstone, Stephanie Lo, Christopher Beaudoin, R Andrés Floto, Simon D.W. Frost, Jukka Corander, Stephen D. Bentley, Julian Parkhill

Posted 28 Jan 2020
bioRxiv DOI: 10.1101/2020.01.28.922989 (published DOI: 10.1186/s13059-020-02090-4)

Population-level comparisons of prokaryotic genomes must take into account the substantial differences in gene content that result from frequent horizontal gene transfer, gene duplication and gene loss. However, the automated annotation of prokaryotic genomes is imperfect, and errors due to fragmented assemblies, contamination, diverse gene families and mis-assemblies accumulate over the population, with profound consequences for analyses of the set of all genes found in a species. Here we introduce Panaroo, a graph-based pangenome clustering tool that is able to account for many of the sources of error introduced during the annotation of prokaryotic genome assemblies. We verified our approach through extensive simulations of de novo assemblies using the infinitely many genes model and by analysing a number of publicly available large bacterial genome datasets. Using a highly clonal Mycobacterium tuberculosis dataset as a negative control case, we show that failing to account for annotation errors can lead to pangenome estimates that are dominated by error. We additionally demonstrate the utility of the improved graphical output provided by Panaroo by performing a pangenome-wide association study in Neisseria gonorrhoeae and by analysing gene gain and loss rates across 51 of the major global pneumococcal sequence clusters. Panaroo is freely available under an open-source MIT licence at https://github.com/gtonkinhill/panaroo.
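
To make the point about annotation errors concrete, the following is a minimal sketch, not Panaroo's algorithm, of how a naive presence/absence count can inflate a pangenome. The isolate names, gene names and the `summarise_pangenome` helper are hypothetical and chosen only for illustration; the example assumes a clonal population in which every isolate truly carries the same three genes, and a single fragmented assembly that causes one gene to be annotated as two pieces.

```python
from typing import Dict, Set, Tuple


def summarise_pangenome(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Naively split gene families into core (in every genome) and accessory.

    Assumes `genomes` is non-empty; no error correction is attempted.
    """
    all_genes = set().union(*genomes.values())
    core = set.intersection(*genomes.values())
    return core, all_genes - core


# Hypothetical clonal population: each isolate truly carries geneA, geneB, geneC.
true_content = {"geneA", "geneB", "geneC"}
annotated = {
    "isolate1": set(true_content),
    "isolate2": set(true_content),
    # Assumed error: a fragmented assembly splits geneB into two annotated pieces.
    "isolate3": {"geneA", "geneB_frag1", "geneB_frag2", "geneC"},
}

core, accessory = summarise_pangenome(annotated)
print(len(core), len(accessory))  # 2 core families, 3 apparent accessory families
```

Although the true pangenome here has three core genes and no accessory genes, the naive count reports five gene families, three of them accessory, from one annotation error in one isolate. This is the kind of error accumulation across a population that the abstract describes.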

Download data

  • Downloaded 1,494 times
  • Download rankings, all-time:
    • Site-wide: 16,192
    • In genomics: 1,549
  • Year to date:
    • Site-wide: 113,883
  • Since beginning of last month:
    • Site-wide: 50,438
