Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 60,222 bioRxiv papers from 267,718 authors.

Real-time search of all bacterial and viral genomic data

By Phelim Bradley, Henk C Den Bakker, Eduardo P. C. Rocha, Gil McVean, Zamin Iqbal

Posted 15 Dec 2017
bioRxiv DOI: 10.1101/234955 (published DOI: 10.1038/s41587-018-0010-1)

Genome sequencing of pathogens is now ubiquitous in microbiology, and the sequence archives are effectively no longer searchable for arbitrary sequences. Furthermore, the exponential growth of these archives is likely to be further spurred by automated diagnostics. To unlock their use for scientific research and real-time surveillance, we have combined knowledge about bacterial genetic variation with ideas used in web search to build a DNA search engine for microbial data that can grow incrementally. We indexed the complete global corpus of bacterial and viral whole genome sequence data (447,833 genomes), using four orders of magnitude less storage than previous methods. The method allows future scaling to millions of genomes. This renders the global archive accessible to sequence search, which we demonstrate with three applications: ultra-fast search for resistance genes MCR1-3, analysis of host-range for 2,827 plasmids, and quantification of the rise of antibiotic resistance prevalence in the sequence archives.
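To make the idea of a sequence search engine that grows incrementally more concrete, here is a minimal toy sketch in Python. It is not the authors' data structure: the abstract does not describe the index itself, so this uses a plain inverted index from k-mers to genome ids, one of the basic ideas behind web search. The class name `ToyKmerIndex`, the k-mer length of 5, and the example sequences are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

K = 5  # toy k-mer length (assumption; real tools use much longer k-mers)


def kmers(seq, k=K):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


class ToyKmerIndex:
    """Hypothetical inverted index mapping each k-mer to a set of genome ids."""

    def __init__(self, k=K):
        self.k = k
        self.postings = defaultdict(set)

    def add_genome(self, genome_id, seq):
        # Incremental growth: adding a genome only touches its own k-mers,
        # so previously indexed genomes never need to be reprocessed.
        for km in kmers(seq, self.k):
            self.postings[km].add(genome_id)

    def search(self, query):
        """Return ids of genomes that contain every k-mer of the query.

        Sharing all k-mers does not strictly prove that the full query
        occurs contiguously, so hits are candidates rather than proof.
        """
        query_kmers = kmers(query, self.k)
        if not query_kmers:
            return set()
        hits = None
        for km in query_kmers:
            ids = self.postings.get(km, set())
            hits = set(ids) if hits is None else hits & ids
            if not hits:
                break  # no genome can match once the intersection is empty
        return hits


# Made-up sequences, for illustration only.
index = ToyKmerIndex()
index.add_genome("genome_a", "ATGCGTACGTTAGC")
index.add_genome("genome_b", "TTGCGTACGAAAGC")
shared = index.search("GCGTACG")
only_a = index.search("CGTTA")
```

A plain dictionary of sets like this would not scale to hundreds of thousands of genomes. It is meant only to show the query pattern (split the query into k-mers, intersect the genomes that contain each one) and why new genomes can be added without rebuilding the index.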

Download data

  • Downloaded 5,434 times
  • Download rankings, all-time:
    • Site-wide: 337 out of 60,222
    • In bioinformatics: 64 out of 6,078
  • Year to date:
    • Site-wide: 1,487 out of 60,222
  • Since beginning of last month:
    • Site-wide: 8,462 out of 60,222


