Mandrake: visualising microbial population structure by embedding millions of genomes into a low-dimensional representation

By John A. Lees, Gerry Tonkin-Hill, Zhirong Yang, Jukka Corander

Posted 29 Oct 2021
bioRxiv DOI: 10.1101/2021.10.28.466232

In less than a decade, population genomics of microbes has progressed from sequencing dozens of strains to sequencing thousands, or even tens of thousands, of strains in a single study. Hundreds of thousands of genomes are now available even for a single bacterial species, and this number is expected to keep growing at an accelerating pace given advances in sequencing technology and widespread genomic surveillance initiatives. This explosion of data calls for innovative methods that enable rapid exploration of population structure from different data modalities, such as multiple sequence alignments, assemblies and estimates of gene content across genomes. Here we present Mandrake, an efficient implementation of a dimensionality reduction method tailored to the needs of large-scale population genomics. Mandrake is capable of visualising population structure from millions of whole genomes, and we illustrate its usefulness with several data sets representing major pathogens. Our method is freely available both as an analysis pipeline (https://github.com/johnlees/mandrake) and as a browser-based interactive application (https://gtonkinhill.github.io/mandrake-web/).
