Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 64,995 bioRxiv papers from 288,045 authors.

BamHash: a checksum program for verifying the integrity of sequence data

By Arna Óskarsdóttir, Gísli Másson, Páll Melsted

Posted 02 Mar 2015
bioRxiv DOI: 10.1101/015867 (published DOI: 10.1093/bioinformatics/btv539)

Summary: Large resequencing projects require a significant amount of storage for raw sequences, as well as alignment files. Since the raw sequences are redundant once the alignment has been generated, it is possible to keep only the alignment files. We present BamHash, a checksum-based method to ensure that the read pairs in FASTQ files match exactly the read pairs stored in BAM files, regardless of the ordering of reads. BamHash can be used to verify the integrity of the files stored and discover any discrepancies. Thus, BamHash can be used to determine if it is safe to delete the FASTQ files storing raw sequencing reads after alignment, without the loss of data.

Availability and Implementation: The software is implemented in C++, GPL licensed and available at https://github.com/DecodeGenetics/BamHash

Contact: pmelsted@hi.is
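The summary's key property, a checksum that does not depend on the order of reads, can be illustrated with a small sketch. This is not BamHash's code: it assumes each read is available as a (name, sequence, quality) tuple, hashes each record with BLAKE2b from Python's standard library (an arbitrary choice made for this illustration), and sums the 64-bit hashes modulo 2^64. The function names `record_hash` and `order_independent_checksum` and the sample reads are hypothetical. Because addition is commutative, any permutation of the same reads gives the same total.

```python
import hashlib
from typing import Iterable, Tuple

MASK64 = (1 << 64) - 1  # keep the running sum within 64 bits

Read = Tuple[str, str, str]  # (name, sequence, quality); an assumed record layout


def record_hash(name: str, seq: str, qual: str) -> int:
    """Hash one read to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    payload = "\0".join((name, seq, qual)).encode("ascii")
    digest = hashlib.blake2b(payload, digest_size=8).digest()
    return int.from_bytes(digest, "little")


def order_independent_checksum(reads: Iterable[Read]) -> Tuple[int, int]:
    """Return (sum of record hashes mod 2**64, number of reads)."""
    total = 0
    count = 0
    for name, seq, qual in reads:
        total = (total + record_hash(name, seq, qual)) & MASK64
        count += 1
    return total, count


# Hypothetical reads as they might appear in a FASTQ file.
fastq_reads = [
    ("read1/1", "ACGTACGT", "IIIIIIII"),
    ("read1/2", "TTGCAAGC", "IIIIHHHH"),
    ("read2/1", "GGGTTTAA", "HHHHIIII"),
]

# The same reads in a different order, as after alignment and sorting.
bam_reads = list(reversed(fastq_reads))

# One base changed in the first read, simulating a discrepancy.
corrupted = [("read1/1", "ACGTACGA", "IIIIIIII")] + fastq_reads[1:]

same = order_independent_checksum(fastq_reads) == order_independent_checksum(bam_reads)
differs = order_independent_checksum(fastq_reads) != order_independent_checksum(corrupted)
print(same, differs)
```

Returning the read count alongside the sum is a design choice of this sketch: it gives a second, cheap quantity to compare when two files are expected to hold the same reads.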

Download data

  • Downloaded 680 times
  • Download rankings, all-time:
    • Site-wide: 13,355 out of 64,995
    • In bioinformatics: 2,177 out of 6,434
  • Year to date:
    • Site-wide: 56,842 out of 64,995
  • Since beginning of last month:
    • Site-wide: 59,391 out of 64,995
