
Alignment-free methods for polyploid genomes: quick and reliable genetic distance estimation

By Acer VanWallendael, Mariano F. Alvarez

Posted 25 Oct 2020
bioRxiv DOI: 10.1101/2020.10.23.352963

Polyploid genomes pose several inherent challenges to population genetic analyses. While alignment-based methods are fundamentally limited in their applicability to polyploids, alignment-free methods bypass most of these limits. We investigated the use of Mash, a k-mer analysis tool that uses the MinHash method to reduce complexity in large genomic datasets, for basic population genetic analyses of polyploid sequences. We measured the degree to which Mash correctly estimated pairwise genetic distance in simulated diploid and polyploid short-read sequences with various levels of missing data. Mash-based estimates of genetic distance were comparable to alignment-based estimates and were less affected by missing data. We also used Mash to analyze publicly available short-read data for three polyploid species and one diploid species, then compared the Mash results to published results. For both simulated and real data, Mash accurately estimated pairwise genetic differences for polyploids as well as diploids, running as much as 476 times faster than alignment-based methods, though we found that Mash genetic distance estimates could be biased by per-sample read depth. Mash may be a particularly useful addition to the toolkit of polyploid geneticists, both for rapid confirmation of alignment-based results and for basic population genetics in reference-free systems with poor-quality DNA.

Competing Interest Statement

The authors have declared no competing interest.
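The abstract's core idea, estimating pairwise genetic distance from shared k-mers via MinHash, can be illustrated with a short sketch. The Python below is a minimal illustration, not Mash's own code: it builds a "bottom-s" MinHash sketch of canonical k-mers, estimates the Jaccard index j between two sketches, and converts j to the Mash distance D = -(1/k) ln(2j/(1+j)). The hash function (BLAKE2b rather than the MurmurHash3 that Mash uses), all function names, and the defaults k = 21 and s = 1,000 are assumptions chosen for illustration. It also does no filtering of low-abundance k-mers, which sketching raw short reads would need in order to suppress sequencing errors.

```python
import hashlib
import math
import random
from typing import Iterator, List

# Translation table for building reverse complements (A/C/G/T only).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = set("ACGT")


def canonical_kmers(seq: str, k: int) -> Iterator[str]:
    """Yield canonical k-mers: the lexicographically smaller of each k-mer and
    its reverse complement, so both DNA strands map to the same key.
    k-mers containing anything other than A/C/G/T (e.g. N) are skipped."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _VALID:
            rc = kmer.translate(_COMPLEMENT)[::-1]
            yield min(kmer, rc)


def hash64(kmer: str) -> int:
    """64-bit hash of a k-mer (BLAKE2b here is an assumption, not Mash's hash)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_s_sketch(seq: str, k: int = 21, s: int = 1000) -> List[int]:
    """MinHash 'bottom-s' sketch: the s smallest distinct k-mer hash values."""
    return sorted({hash64(km) for km in canonical_kmers(seq, k)})[:s]


def jaccard_estimate(sketch_a: List[int], sketch_b: List[int], s: int = 1000) -> float:
    """Estimate the Jaccard index of the underlying k-mer sets: take the s
    smallest hashes of the merged sketches and count how many occur in both."""
    set_a, set_b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(set_a | set_b)[:s]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in set_a and h in set_b)
    return shared / len(union_bottom)


def mash_distance(j: float, k: int = 21) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)); j = 0 is capped at 1.0."""
    if j <= 0.0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Hypothetical demo helper: substitute each base with probability `rate`.
    A crude stand-in for divergence, not the authors' simulation pipeline."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)
```

As a usage example, sketching a random 5,000 bp sequence and a copy carrying roughly 10% substitutions should give a distance in the neighborhood of 0.1, because the Mash distance is constructed to approximate the per-base substitution rate under a simple mutation model. Larger sketch sizes s tighten the estimate at the cost of memory and comparison time.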

Download data

  • Downloaded 284 times
  • Download rankings, all-time:
    • Site-wide: 101,316
    • In genetics: 4,317
  • Year to date:
    • Site-wide: 55,667
  • Since beginning of last month:
    • Site-wide: 31,835
