An integrated metagenomics pipeline for strain profiling reveals novel patterns of transmission and global biogeography of bacteria

By Stephen Nayfach, Beltran Rodriguez-Mueller, Nandita Garud, Katherine Pollard

Posted 14 Nov 2015
bioRxiv DOI: 10.1101/031757 (published DOI: 10.1101/gr.201863.115)

We present the Metagenomic Intra-species Diversity Analysis System (MIDAS), an integrated computational pipeline for quantifying bacterial species abundance and strain-level genomic variation, including gene content and single nucleotide polymorphisms, from shotgun metagenomes. Our method leverages a database of >30,000 bacterial reference genomes, which we clustered into species groups. These cover the majority of abundant species in the human microbiome but only a small proportion of microbes in other environments, including soil and seawater. We applied MIDAS to stool metagenomes from 98 Swedish mothers and their infants over one year and used rare single nucleotide variants to reveal extensive vertical transmission of strains at birth, but colonization with strains unlikely to derive from the mother at later time points. This pattern was missed by species-level analysis because the infant gut microbiome composition converges towards that of an adult over time. We also applied MIDAS to 198 globally distributed marine metagenomes and used gene content to show that many prevalent bacterial species have population structure that correlates with geographic location. Strain-level genetic variants present in metagenomes clearly reveal extensive structure and dynamics that are obscured when data are analyzed at a coarser taxonomic level, such as species.
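The transmission result rests on a simple idea: an allele that is rare in the population but present in both a mother and her infant is unlikely to be shared by chance. A minimal sketch of such a sharing score follows, assuming per-sample allele calls keyed by (contig, position) and a precomputed population allele frequency table. The function name `rare_allele_sharing`, the data layout, and the 0.05 rarity cutoff are hypothetical illustrations, not the MIDAS implementation.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a site is (contig, position).
Site = Tuple[str, int]


def rare_allele_sharing(
    mother: Dict[Site, Set[str]],
    infant: Dict[Site, Set[str]],
    pop_freq: Dict[Tuple[Site, str], float],
    max_freq: float = 0.05,  # hypothetical rarity cutoff
) -> float:
    """Fraction of the mother's rare alleles that also appear in the infant.

    Only sites observed in both samples are counted; an allele is "rare" if
    its population frequency is <= max_freq (alleles missing from pop_freq
    are treated as common and ignored).
    """
    shared = 0
    total = 0
    for site, alleles in mother.items():
        if site not in infant:
            continue  # no coverage in the infant, so sharing is unknown
        for allele in alleles:
            if pop_freq.get((site, allele), 1.0) <= max_freq:
                total += 1
                if allele in infant[site]:
                    shared += 1
    return shared / total if total else 0.0


if __name__ == "__main__":
    mother = {("c1", 10): {"A"}, ("c1", 20): {"T"}, ("c1", 30): {"G"}}
    infant = {("c1", 10): {"A"}, ("c1", 20): {"C"}}
    pop_freq = {
        (("c1", 10), "A"): 0.01,
        (("c1", 20), "T"): 0.02,
        (("c1", 30), "G"): 0.01,
    }
    print(rare_allele_sharing(mother, infant, pop_freq))  # 0.5
```

In this toy example, two rare maternal alleles fall at sites covered in the infant and one of them is shared, giving 0.5; a high score would be consistent with a maternally derived strain, while a score near zero would suggest a strain from another source.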

  • Downloaded 3,320 times
  • Download rankings, all-time:
    • Site-wide: 5,618
    • In genomics: 507
  • Download rankings, year to date:
    • Site-wide: 45,212
  • Download rankings, since beginning of last month:
    • Site-wide: 49,982
