
SAM/BAM format v1.5 extensions for de novo assemblies

By Peter J. A. Cock, James Bonfield, Bastien Chevreux, Heng Li

Posted 29 May 2015
bioRxiv DOI: 10.1101/020024

Summary: The plain-text Sequence Alignment/Map (SAM) file format and its companion binary form (BAM) are a generic alignment format for storing read alignments against reference sequences (and unmapped reads) together with structured metadata (Li et al., 2009). Driven by the needs of the 1000 Genomes Project, which sequenced many individual human genomes, early SAM/BAM usage focused on pairwise alignments of reads to a reference. However, multiple sequence alignments can also be preserved through the CIGAR P operator. Herein we describe clarifications and additions in version 1.5 of the specification to facilitate storing de novo sequence alignments: padded reference sequences (with gap characters), annotation of reads or regions of the reference, and the option of embedding the reference sequence within the file.

Availability: The latest public release of the specification is at http://samtools.sourceforge.net/SAM1.pdf, with in-development drafts under version control at https://github.com/samtools/hts-specs/.
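As a rough illustration of how the CIGAR P (padding) operator lets a read keep its place in a multiple sequence alignment, the following minimal Python sketch expands a read and its CIGAR string into one row of a padded alignment. It is not taken from the paper or from any SAM library: the function names `parse_cigar` and `padded_row`, the display characters (`*` for a pad, `-` for a deletion, `.` for a skipped region), and the example reads are all assumptions made for this illustration.

```python
import re

# One CIGAR element: a length followed by a single operator character.
CIGAR_ELEMENT = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '3S6M1P1I4M' into (length, op) pairs."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR string: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_ELEMENT.findall(cigar)]


def padded_row(seq, cigar):
    """Render one read as a row of a padded alignment (hypothetical helper).

    The display characters are choices made for this sketch only:
    '*' marks a pad (P), '-' a deletion (D), '.' a skipped region (N).
    Soft-clipped bases (S) are consumed from the read but not shown;
    hard clips (H) are absent from the read and produce nothing.
    """
    out = []
    pos = 0  # index of the next unused base in seq
    for length, op in parse_cigar(cigar):
        if op in "M=XI":        # read bases that occupy alignment columns
            out.append(seq[pos:pos + length])
            pos += length
        elif op == "S":         # clipped bases: skip without displaying
            pos += length
        elif op == "D":
            out.append("-" * length)
        elif op == "N":
            out.append("." * length)
        elif op == "P":         # silent pad: consumes neither read nor reference
            out.append("*" * length)
    return "".join(out)


if __name__ == "__main__":
    # A read with a two-base insertion, and a second read that has only
    # one base there, so it is padded to stay in the same columns.
    print(padded_row("TTAGATAAAGGATACTG", "8M2I4M1D3M"))
    print(padded_row("AAAAGATAAGGATA", "3S6M1P1I4M"))
```

In this sketch the second read carries a `1P` where the first read has the first base of its insertion, so its single inserted base lines up with the second inserted base of the first read. That is the sense in which padding preserves the multiple alignment between reads, rather than only each read's pairwise alignment to the reference.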

Download data

  • Downloaded 3,795 times
  • Download rankings, all-time:
    • Site-wide: 1,291 out of 92,758
    • In bioinformatics: 215 out of 8,685
  • Year to date:
    • Site-wide: 19,210 out of 92,758
  • Since beginning of last month:
    • Site-wide: 14,841 out of 92,758

