
GATTACA: Lightweight Metagenomic Binning With Compact Indexing Of Kmer Counts And MinHash-based Panel Selection

By Victoria Popic, Volodymyr Kuleshov, Michael Snyder, Serafim Batzoglou

Posted 26 Apr 2017
bioRxiv DOI: 10.1101/130997 (published DOI: 10.1089/cmb.2017.0250)

We introduce GATTACA, a framework for rapid and accurate binning of metagenomic contigs from one or more metagenomic samples into clusters associated with individual species. The clusters are computed using co-abundance profiles within a set of reference metagenomes; unlike previous methods, GATTACA estimates these profiles from k-mer counts stored in a highly compact index. On multiple synthetic and real benchmark datasets, GATTACA produces clusters that correspond to distinct bacterial species with an accuracy that matches earlier methods, while being up to 20x faster when the reference panel index can be computed offline and 6x faster when co-abundance is estimated online. Leveraging the MinHash technique to quickly compare metagenomic samples, GATTACA also provides an efficient way to identify publicly available metagenomic data that can be incorporated into the set of reference metagenomes to further improve binning accuracy. By enabling easy indexing and reuse of publicly available metagenomic datasets, GATTACA makes accurate metagenomic analyses accessible to a much wider range of researchers.
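
The abstract does not describe how GATTACA implements its MinHash comparison, so the following is only a minimal sketch of the general technique under stated assumptions: each sample is reduced to a bottom-s MinHash sketch (the s smallest distinct hashes of its canonical k-mers), and the Jaccard similarity of two samples' k-mer sets is estimated from their sketches. The function names, the k=21 and s=1000 defaults, canonical k-mers, and BLAKE2b hashing are all illustrative assumptions, not GATTACA's code.

```python
import hashlib

# Assumption: DNA alphabet only; windows containing other characters (e.g. N) are skipped.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    rc = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, rc)


def kmers(seq, k):
    """Yield canonical k-mers of seq, skipping any window with a non-ACGT base."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= _VALID:
            yield canonical(window)


def hash64(kmer):
    """Deterministic 64-bit hash: BLAKE2b truncated to 8 bytes (illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(reads, k=21, s=1000):
    """Bottom-s MinHash sketch: the s smallest distinct k-mer hashes across all reads."""
    hashes = {hash64(km) for read in reads for km in kmers(read, k)}
    return sorted(hashes)[:s]


def jaccard_estimate(sketch_a, sketch_b, s):
    """Estimate the Jaccard similarity of two samples' k-mer sets from their sketches."""
    a, b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(a | b)[:s]  # equals the bottom-s sketch of the union
    if not union_bottom:
        return 0.0
    # A hash in the union's bottom-s that belongs to both sets is in both sketches.
    shared = sum(1 for h in union_bottom if h in a and h in b)
    return shared / len(union_bottom)
```

For example, `jaccard_estimate(bottom_sketch(sample_a_reads), bottom_sketch(sample_b_reads), 1000)` (with hypothetical read lists) ranks candidate public samples by similarity to a query sample without comparing full k-mer sets. This sketch materializes every hash in a set for clarity; a streaming version would keep only a bounded heap of the s smallest hashes, so memory stays fixed regardless of sample size.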

Download data

  • Downloaded 2,019 times
  • Download rankings, all-time:
    • Site-wide: 10,314
    • In bioinformatics: 1,144
  • Download rankings, year to date:
    • Site-wide: 114,649
  • Download rankings, since beginning of last month:
    • Site-wide: 68,050
