Scalable microbial strain inference in metagenomic data using StrainFacts

By Byron J. Smith, Xiangpeng Li, Zhou Jason Shi, Adam Abate, Katherine Pollard

Posted 04 Feb 2022
bioRxiv DOI: 10.1101/2022.02.01.478746

While genome databases are nearing a complete catalog of species commonly inhabiting the human gut, their representation of intraspecific diversity is lacking for all but the most abundant and frequently studied taxa. Statistical deconvolution of allele frequencies from shotgun metagenomic data into strain genotypes and relative abundances is a promising approach, but existing methods are limited by computational scalability. Here we introduce StrainFacts, a method for strain deconvolution that enables inference across tens of thousands of metagenomes. We harness a "fuzzy" genotype approximation that, unlike existing methods, makes the underlying graphical model fully differentiable. This allows parameter estimates to be optimized with gradient-based methods, speeding up model fitting by two orders of magnitude. A GPU implementation provides additional scalability. Extensive simulations show that StrainFacts can perform strain inference on thousands of metagenomes and has comparable accuracy to more computationally intensive tools. We further validate our strain inferences using single-cell genomic sequencing from a human stool sample. Applying StrainFacts to a collection of more than 10,000 publicly available human stool metagenomes, we quantify patterns of strain diversity, biogeography, and linkage disequilibrium that agree with and expand on what is known based on existing reference genomes. StrainFacts paves the way for large-scale biogeography and population genetic studies of microbiomes using metagenomic data.
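To illustrate the idea of a "fuzzy" genotype, the following is a minimal sketch and not the StrainFacts implementation or API. It assumes that each sample's allele frequency at a site is the abundance-weighted mix of strain genotypes, relaxes each genotype from {0, 1} to a continuous value in (0, 1) via a sigmoid, keeps abundances on the simplex via a softmax, and fits both by plain gradient descent. The squared-error loss, the toy data, and every function and variable name here are assumptions chosen for brevity; the abstract does not specify the model's likelihood or optimizer.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def predict(abund_logits, geno_logits):
    """Return abundances, fuzzy genotypes, and predicted allele frequencies."""
    pi = [softmax(r) for r in abund_logits]                  # samples x strains
    gamma = [[sigmoid(v) for v in r] for r in geno_logits]   # strains x sites
    n_strains, n_sites = len(gamma), len(gamma[0])
    pred = [[sum(p[s] * gamma[s][g] for s in range(n_strains))
             for g in range(n_sites)] for p in pi]
    return pi, gamma, pred


def fit(obs, n_strains, steps=2000, lr=0.05, seed=0):
    """Gradient descent on squared error (an assumed, simplified loss)."""
    rng = random.Random(seed)
    n_samples, n_sites = len(obs), len(obs[0])
    a = [[rng.gauss(0, 1) for _ in range(n_strains)] for _ in range(n_samples)]
    b = [[rng.gauss(0, 1) for _ in range(n_sites)] for _ in range(n_strains)]
    history = []
    for _ in range(steps):
        pi, gamma, pred = predict(a, b)
        err = [[pred[i][g] - obs[i][g] for g in range(n_sites)]
               for i in range(n_samples)]
        history.append(sum(e * e for row in err for e in row))
        # Chain rule through the sigmoid: d gamma / d logit = gamma * (1 - gamma).
        grad_b = [[sum(2 * err[i][g] * pi[i][s] for i in range(n_samples))
                   * gamma[s][g] * (1 - gamma[s][g])
                   for g in range(n_sites)] for s in range(n_strains)]
        # Chain rule through the softmax: pi_s * (d_s - sum_t pi_t * d_t).
        grad_a = []
        for i in range(n_samples):
            d = [sum(2 * err[i][g] * gamma[s][g] for g in range(n_sites))
                 for s in range(n_strains)]
            mean_d = sum(pi[i][t] * d[t] for t in range(n_strains))
            grad_a.append([pi[i][s] * (d[s] - mean_d) for s in range(n_strains)])
        for i in range(n_samples):
            for s in range(n_strains):
                a[i][s] -= lr * grad_a[i][s]
        for s in range(n_strains):
            for g in range(n_sites):
                b[s][g] -= lr * grad_b[s][g]
    pi, gamma, pred = predict(a, b)
    history.append(sum((pred[i][g] - obs[i][g]) ** 2
                       for i in range(n_samples) for g in range(n_sites)))
    return pi, gamma, history


# Toy, noise-free data: 2 hypothetical strains, 6 sites, 4 samples.
true_geno = [[1, 0, 1, 0, 1, 1], [0, 0, 1, 1, 0, 1]]
true_abund = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8], [0.7, 0.3]]
obs = [[sum(p[s] * true_geno[s][g] for s in range(2)) for g in range(6)]
       for p in true_abund]

pi, gamma, history = fit(obs, n_strains=2)
```

Because every parameter is continuous, the loss has a gradient everywhere, which is what lets a generic optimizer replace a search over discrete genotypes. In this sketch, `history` records the loss per step, `pi` holds the estimated relative abundances, and `gamma` holds the fuzzy genotypes, which could be rounded to 0 or 1 afterwards if discrete calls are wanted.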

Download data

  • Downloaded 456 times
  • Download rankings, all-time:
    • Site-wide: 96,891
    • In bioinformatics: 8,249
  • Year to date:
    • Site-wide: 5,664
  • Since beginning of last month:
    • Site-wide: 24,491
