Reducing reference bias using multiple population reference genomes

By Nae-Chyun Chen, Brad Solomon, Taher Mun, Sheila Iyer, Ben Langmead

Posted 04 Mar 2020
bioRxiv DOI: 10.1101/2020.03.03.975219

Most sequencing data analyses start by aligning sequencing reads to a linear reference genome. But failure to account for genetic variation causes reference bias and confounding of results downstream. Other approaches replace the linear reference with structures like graphs that can include genetic variation, incurring major computational overhead. We propose the “reference flow” alignment method that uses information from multiple population reference genomes to improve alignment accuracy and reduce reference bias. Compared to the graph aligner vg, reference flow exhibits a similar level of accuracy and bias avoidance, but with 13% of the memory footprint and 6 times the speed.

Download data

  • Downloaded 954 times
  • Download rankings, all-time:
    • Site-wide: 12,248 out of 88,857
    • In bioinformatics: 1,973 out of 8,400
  • Year to date:
    • Site-wide: 1,490 out of 88,857
  • Since beginning of last month:
    • Site-wide: 5,127 out of 88,857


