
Ancestry Inference Using Reference Labeled Clusters of Haplotypes

By Yong Wang, Shiya Song, Joshua G. Schraiber, Alisa Sedghifar, Jake K. Byrnes, David A. Turissini, Eurie L. Hong, Catherine A. Ball, Keith Noto

Posted 24 Sep 2020
bioRxiv DOI: 10.1101/2020.09.23.310698

We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual's ancestry composition. Our approach models haplotype diversity from a large, admixed cohort of hundreds of thousands of individuals, then annotates those models with population information from reference panels of known ancestry. Because training and testing are separate processes, the running time of ARCHes at test time does not depend on the size of the reference panel: the inferred population-annotated haplotype models can be written to disk and used to label large test sets in parallel. In our experiments, assigning ancestry from 32 populations to 1,001 sections of a genotype takes less than one minute on average using 10 CPUs. We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP), as well as on simulated examples of known admixture. Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at regional levels, regardless of the amount of population admixture.
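
The separation of training and testing described in the abstract can be illustrated with a toy sketch. This is not the ARCHes implementation: it assumes each haplotype has already been reduced to one hypothetical cluster ID per genomic window, and it swaps the paper's haplotype models for a simple majority vote over reference labels. All function names and the JSON model format are assumptions made for illustration.

```python
import json
from collections import Counter, defaultdict
from concurrent.futures import ThreadPoolExecutor


def annotate_clusters(reference):
    """Training step (toy): for each (window, cluster) pair, count how many
    reference haplotypes from each population fall into that cluster.

    `reference` is a list of (population, [cluster_id per window]) pairs.
    """
    counts = defaultdict(Counter)
    for population, clusters in reference:
        for window, cluster in enumerate(clusters):
            counts[f"{window}:{cluster}"][population] += 1
    return {key: dict(c) for key, c in counts.items()}


def save_model(model, path):
    """Write the annotated model to disk so testing never reloads the panel."""
    with open(path, "w") as fh:
        json.dump(model, fh)


def load_model(path):
    with open(path) as fh:
        return json.load(fh)


def label_haplotype(model, clusters):
    """Testing step (toy): majority-vote local ancestry per window, plus
    global proportions over the windows that could be assigned."""
    local = []
    for window, cluster in enumerate(clusters):
        annotation = model.get(f"{window}:{cluster}")
        # Clusters never seen in the reference panel stay unassigned (None).
        local.append(max(annotation, key=annotation.get) if annotation else None)
    assigned = [p for p in local if p is not None]
    total = len(assigned)
    proportions = {p: n / total for p, n in Counter(assigned).items()} if total else {}
    return local, proportions


def label_many(model, haplotypes, workers=4):
    """Label test haplotypes independently; each lookup touches only the
    saved model, so the work parallelizes trivially."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda h: label_haplotype(model, h), haplotypes))
```

In this sketch the cost of labeling a test haplotype depends only on the number of windows and the size of the saved model, not on how many reference samples were used to annotate it, which mirrors the running-time argument made in the abstract.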

Download data

  • Downloaded 1,111 times
  • Download rankings, all-time:
    • Site-wide: 22,795
    • In genomics: 2,141
  • Year to date:
    • Site-wide: 11,754
  • Since beginning of last month:
    • Site-wide: 17,761
