Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 59,758 bioRxiv papers from 265,632 authors.

A natural encoding of genetic variation in a Burrows-Wheeler Transform to enable mapping and genome inference

By Sorina Maciuca, Carlos del Ojo Elias, Gil McVean, Zamin Iqbal

Posted 15 Jun 2016
bioRxiv DOI: 10.1101/059170

We show how positional markers can be used to encode genetic variation within a Burrows-Wheeler Transform (BWT), and use this to construct a generalisation of the traditional 'reference genome' that incorporates known variation within a species. Our goal is to support inference of the closest mosaic of previously known sequences to the genome(s) under analysis. Our scheme increases the alphabet size, and we reduce the resulting performance impact on rank operations by using a wavelet tree encoding of the BWT. We give a specialised form of the backward search that allows variation-aware exact matching. We implement this, and demonstrate that constructing an index of the whole human genome with 8 million genetic variants requires 25 GB of RAM. We also show that inferring a closer reference can close large kilobase-scale coverage gaps in P. falciparum.
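
For readers unfamiliar with the backward search the abstract builds on, here is a minimal sketch of the standard (variation-unaware) version over a plain BWT. It is not the authors' specialised algorithm: it has no positional markers and no wavelet tree, and all function names (`build_bwt`, `build_fm_index`, `backward_search`, `find_all`) are hypothetical, chosen only for this illustration. The suffix array is built by naive sorting, which is only reasonable for small inputs.

```python
from collections import Counter


def build_bwt(text):
    """Return (bwt, suffix_array) for text + '$' using a naive suffix sort."""
    text = text + "$"  # '$' is the sentinel; assumed absent from text and smallest
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # i == 0 wraps around to the sentinel
    return bwt, sa


def build_fm_index(bwt):
    """Build the C table and a dense rank (occ) table for the BWT."""
    counts = Counter(bwt)
    # c_table[ch] = number of characters in the text strictly smaller than ch
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # occ[ch][i] = number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, ch in enumerate(bwt):
        for sym in occ:
            occ[sym][i + 1] = occ[sym][i] + (1 if sym == ch else 0)
    return c_table, occ


def backward_search(pattern, c_table, occ, n):
    """Return the half-open suffix-array interval [lo, hi) matching pattern."""
    lo, hi = 0, n
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return 0, 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


def find_all(text, pattern):
    """Return sorted start positions of exact matches of pattern in text."""
    bwt, sa = build_bwt(text)
    c_table, occ = build_fm_index(bwt)
    lo, hi = backward_search(pattern, c_table, occ, len(bwt))
    return sorted(sa[lo:hi])
```

The dense `occ` table stands in for the rank structure and costs memory proportional to alphabet size times text length. The abstract notes that the authors' encoding enlarges the alphabet, which is why they use a wavelet tree for rank operations instead.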

Download data

  • Downloaded 2,040 times
  • Download rankings, all-time:
    • Site-wide: 2,045 out of 59,758
    • In bioinformatics: 425 out of 6,035
  • Year to date:
    • Site-wide: 23,642 out of 59,758
  • Since beginning of last month:
    • Site-wide: 26,627 out of 59,758
