Mash Screen: High-throughput sequence containment estimation for genome discovery

By Brian D Ondov, Gabriel J Starrett, Anna Sappington, Aleksandra Kostic, Sergey Koren, Christopher B. Buck, Adam M. Phillippy

Posted 01 Mar 2019
bioRxiv DOI: 10.1101/557314 (published DOI: 10.1186/s13059-019-1841-x)

The MinHash algorithm has proven effective for rapidly estimating the resemblance of two genomes or metagenomes. However, this method cannot reliably estimate the containment of a genome within a metagenome. Here we describe an online algorithm capable of measuring the containment of genomes and proteomes within either assembled or unassembled sequencing read sets. We describe several use cases, including contamination screening and retrospective analysis of metagenomes for novel genome discovery. Using this tool, we provide containment estimates for every NCBI RefSeq genome within every SRA metagenome, and demonstrate the identification of a novel polyomavirus species from a public metagenome.
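The difference between resemblance and containment can be made concrete with a small sketch. The code below is only an illustration under stated assumptions, not the Mash Screen implementation: it builds a bottom-s MinHash sketch of a genome, streams over reads one at a time, and reports the fraction of the genome's sketch hashes seen in the stream. The function names (`kmers`, `hash_kmer`, `bottom_sketch`, `containment`), the choice of BLAKE2b as the hash, and the default `k` and `s` values are all hypothetical choices made for this example. Reverse-complement handling is omitted for brevity.

```python
import hashlib


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def hash_kmer(kmer):
    """Map a k-mer to a 64-bit integer (BLAKE2b is an arbitrary choice here)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_sketch(seq, k, s):
    """Return the s smallest distinct k-mer hashes of seq (a bottom-s sketch)."""
    hashes = {hash_kmer(km) for km in kmers(seq, k)}
    return set(sorted(hashes)[:s])


def containment(genome, reads, k=5, s=50):
    """Estimate the fraction of the genome's sketch found in a stream of reads.

    The reads are consumed one at a time, so the read set never needs to be
    held in memory or sketched itself.
    """
    sketch = bottom_sketch(genome, k, s)
    if not sketch:
        raise ValueError("genome is shorter than k")
    seen = set()
    for read in reads:
        for km in kmers(read, k):
            h = hash_kmer(km)
            if h in sketch:
                seen.add(h)
    return len(seen) / len(sketch)
```

Because only the genome is sketched, the estimate is not diluted by the size of the read set. A resemblance (Jaccard) estimate, by contrast, shrinks toward zero as the metagenome grows, even when the genome is fully present.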

Download data

  • Downloaded 1,608 times
  • Download rankings, all-time:
    • Site-wide: 5,241 out of 89,715
    • In bioinformatics: 951 out of 8,461
  • Year to date:
    • Site-wide: 35,972 out of 89,715
  • Since beginning of last month:
    • Site-wide: 43,757 out of 89,715
