
Metalign: Efficient alignment-based metagenomic profiling via containment min hash

By Nathan LaPierre, Mohammed Alser, Eleazar Eskin, David J Koslicki, Serghei Mangul

Posted 18 Jan 2020
bioRxiv DOI: 10.1101/2020.01.17.910521 (published DOI: 10.1186/s13059-020-02159-0)

Whole-genome shotgun sequencing enables the analysis of microbial communities in unprecedented detail, with major implications in medicine and ecology. Predicting the presence and relative abundances of microbes in a sample, known as "metagenomic profiling", is a critical first step in microbiome analysis. Existing profiling methods have been shown to suffer from high false positive or false negative rates, while alignment-based approaches are often considered accurate but computationally infeasible. Here we present a novel method, Metalign, that addresses these concerns by performing efficient alignment-based metagenomic profiling. We use a containment min hash approach to dramatically reduce the reference database size before alignment, and we estimate organism relative abundances in the sample by resolving reads aligned to multiple genomes. We show that Metalign achieves significantly improved results over existing methods on simulated datasets from a large benchmarking study, CAMI, and performs well on in vitro mock community data and environmental data from the Tara Oceans project. Metalign is freely available at https://github.com/nlapier2/Metalign, along with the results and plots used in this paper, and a docker image is also available at https://hub.docker.com/repository/docker/nlapier2/metalign.
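The pre-filtering idea in the abstract can be illustrated with a minimal sketch. This is not Metalign's implementation: it is a simplified containment estimate that assumes a plain Python set of sample k-mer hashes in place of a more compact membership structure, and every function name, the k-mer length, the sketch size, and the cutoff below are hypothetical choices made for illustration only.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer using the first 8 bytes of SHA-1."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def bottom_sketch(seq, k, size):
    """Keep the `size` smallest distinct k-mer hashes of seq (a bottom-s sketch)."""
    return sorted({hash_kmer(km) for km in kmers(seq, k)})[:size]


def containment_estimate(reference, sample_hashes, k, size):
    """Estimate the fraction of the reference's k-mers that occur in the sample.

    Only the reference is sketched; each sketched hash is looked up in the
    full set of sample k-mer hashes.
    """
    sketch = bottom_sketch(reference, k, size)
    if not sketch:
        return 0.0
    hits = sum(1 for h in sketch if h in sample_hashes)
    return hits / len(sketch)


def prefilter(references, reads, k=5, size=50, cutoff=0.5):
    """Return names of references whose estimated containment meets the cutoff.

    `references` maps a name to a genome string; `reads` is a list of strings.
    The defaults are illustrative, not values taken from the paper.
    """
    sample_hashes = {hash_kmer(km) for read in reads for km in kmers(read, k)}
    return [
        name
        for name, seq in references.items()
        if containment_estimate(seq, sample_hashes, k, size) >= cutoff
    ]
```

Under these assumptions, only the references returned by `prefilter` would be passed on to the aligner, which is how a step of this kind can shrink the database before the expensive alignment stage.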

Download data

  • Downloaded 866 times
  • Download rankings, all-time:
    • Site-wide: 40,511
    • In bioinformatics: 4,080
  • Year to date:
    • Site-wide: 133,612
  • Since beginning of last month:
    • Site-wide: 141,232
