Dashing: Fast and Accurate Genomic Distances with HyperLogLog

By Daniel N Baker, Ben Langmead

Posted 20 Dec 2018
bioRxiv DOI: 10.1101/501726 (published DOI: 10.1186/s13059-019-1875-0)

Dashing is a fast and accurate software tool for estimating similarities of genomes or sequencing datasets. It uses the HyperLogLog sketch together with cardinality estimation methods that are specialized for set unions and intersections. Dashing summarizes genomes more rapidly than previous MinHash-based methods while providing greater accuracy across a wide range of input sizes and sketch sizes. It can sketch and calculate pairwise distances for over 87K genomes in 6 minutes. Dashing is open source and available at https://github.com/dnbaker/dashing.
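The idea in the abstract, sketching each genome with HyperLogLog and then estimating union and intersection sizes from the sketches, can be illustrated with a small sketch in Python. This is not Dashing's implementation: it assumes the classic HyperLogLog estimator with a linear-counting correction for small counts, whereas the abstract refers to estimators specialized for unions and intersections. The names `HLL`, `kmers`, and `jaccard`, the choice of BLAKE2b as the hash, and the default of 2^14 registers are all hypothetical choices made for illustration.

```python
import hashlib
import math


class HLL:
    """Toy HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=14):
        self.p = p
        self.m = 1 << p
        self.reg = [0] * self.m

    def add(self, item):
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        digest = hashlib.blake2b(item.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        width = 64 - self.p
        idx = h >> width
        rest = h & ((1 << width) - 1)
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits
        # (width + 1 if they are all zero).
        rank = width - rest.bit_length() + 1
        if rank > self.reg[idx]:
            self.reg[idx] = rank

    def cardinality(self):
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)  # bias constant, valid for m >= 128
        est = alpha * m * m / sum(2.0 ** -r for r in self.reg)
        zeros = self.reg.count(0)
        if est <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            est = m * math.log(m / zeros)
        return est

    def union(self, other):
        # The sketch of a set union is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must use the same p")
        out = HLL(self.p)
        out.reg = [max(a, b) for a, b in zip(self.reg, other.reg)]
        return out


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Estimate Jaccard similarity by inclusion-exclusion on sketch sizes."""
    union_size = a.union(b).cardinality()
    if union_size == 0:
        return 0.0
    inter = max(0.0, a.cardinality() + b.cardinality() - union_size)
    return inter / union_size
```

To compare two sequences under these assumptions, add every element of `kmers(seq, k)` to one `HLL` per sequence and call `jaccard` on the two sketches. Because the union is just a register-wise maximum, pairwise comparison costs time proportional to the sketch size rather than to the genome size.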

Download data

  • Downloaded 2,517 times
  • Download rankings, all-time:
    • Site-wide: 2,478 out of 88,857
    • In bioinformatics: 457 out of 8,400
  • Year to date:
    • Site-wide: 11,929 out of 88,857
  • Since beginning of last month:
    • Site-wide: 17,195 out of 88,857
