
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 73,710 bioRxiv papers from 320,843 authors.

Real-time search of all bacterial and viral genomic data

By Phelim Bradley, Henk C Den Bakker, Eduardo PC Rocha, Gilean McVean, Zamin Iqbal

Posted 15 Dec 2017
bioRxiv DOI: 10.1101/234955 (published DOI: 10.1038/s41587-018-0010-1)

Genome sequencing of pathogens is now ubiquitous in microbiology, and the sequence archives are effectively no longer searchable for arbitrary sequences. Furthermore, the exponential increase of these archives is likely to be further spurred by automated diagnostics. To unlock their use for scientific research and real-time surveillance we have combined knowledge about bacterial genetic variation with ideas used in web-search, to build a DNA search engine for microbial data that can grow incrementally. We indexed the complete global corpus of bacterial and viral whole genome sequence data (447,833 genomes), using four orders of magnitude less storage than previous methods. The method allows future scaling to millions of genomes. This renders the global archive accessible to sequence search, which we demonstrate with three applications: ultra-fast search for resistance genes MCR1-3, analysis of host-range for 2827 plasmids, and quantification of the rise of antibiotic resistance prevalence in the sequence archives.
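The abstract does not describe the index's data structure, so the following is only a minimal sketch of one common way to build a sequence search that can grow incrementally: each genome gets its own small Bloom filter of k-mers, and a query matches a genome when all of the query's k-mer bits are set in that genome's filter. The names `ToyIndex`, `K`, `M` and `H`, and all parameter values, are illustrative assumptions and not the authors' implementation.

```python
import hashlib

K = 5     # toy k-mer length (assumption; real indexes use a larger k)
M = 1024  # bits per Bloom filter (assumption)
H = 3     # hash functions per k-mer (assumption)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bit_positions(kmer, m=M, h=H):
    """Map one k-mer to h bit positions using seeded SHA-256 hashes."""
    positions = []
    for seed in range(h):
        digest = hashlib.sha256(f"{seed}:{kmer}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def bloom_bits(seq):
    """Build a Bloom filter for seq, stored as a Python int bit array."""
    bits = 0
    for kmer in kmers(seq):
        for pos in bit_positions(kmer):
            bits |= 1 << pos
    return bits


class ToyIndex:
    """One Bloom filter per genome; adding a genome touches nothing else."""

    def __init__(self):
        self.filters = {}  # genome name -> Bloom filter bits

    def add(self, name, seq):
        # Incremental growth: existing filters are left unchanged.
        self.filters[name] = bloom_bits(seq)

    def search(self, query):
        # Queries shorter than K contain no k-mers, so report no hits.
        if len(query) < K:
            return []
        needed = bloom_bits(query)
        # A genome matches if every bit required by the query is set.
        # False positives are possible; false negatives are not.
        return sorted(
            name for name, bits in self.filters.items()
            if (bits & needed) == needed
        )


if __name__ == "__main__":
    index = ToyIndex()
    index.add("genome_a", "ACGTACGTTGCA")
    index.add("genome_b", "GGACGTTCC")
    print(index.search("ACGTT"))
```

Because a Bloom filter can report false positives but never false negatives, a search like this returns every genome that truly contains the query's k-mers, plus an occasional spurious hit that a follow-up exact comparison would need to filter out.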

Download data

  • Downloaded 5,687 times
  • Download rankings, all-time:
    • Site-wide: 398 out of 73,741
    • In bioinformatics: 73 out of 7,172
  • Year to date:
    • Site-wide: 16,814 out of 73,741
  • Since beginning of last month:
    • Site-wide: 16,814 out of 73,741

