Chromosome-scale inference of hybrid speciation and admixture with convolutional neural networks

By Paul D. Blischak, Michael S. Barker, Ryan N. Gutenkunst

Posted 29 Jun 2020
bioRxiv DOI: 10.1101/2020.06.29.159673

Inferring the frequency and mode of hybridization among closely related organisms is an important step for understanding the process of speciation and can help to uncover reticulated patterns of phylogeny more generally. Phylogenomic methods to test for the presence of hybridization come in many varieties and typically operate by leveraging expected patterns of genealogical discordance in the absence of hybridization. An important assumption made by these tests is that the data (genes or SNPs) are independent given the species tree. However, when the data are closely linked, it is especially important to consider their non-independence. Recently, deep learning techniques such as convolutional neural networks (CNNs) have been used to perform population genetic inferences with linked SNPs coded as binary images. Here we use CNNs to select among candidate hybridization scenarios using the tree topology (((P1, P2), P3), Out) and a matrix of pairwise nucleotide divergence (dXY) calculated in windows across the genome. Training and independently testing a neural network on coalescent simulations showed that our method, HyDe-CNN, accurately performs model selection for hybridization scenarios across a wide breadth of parameter space. We then used HyDe-CNN to test models of admixture in Heliconius butterflies and compared it with a random forest classifier trained on introgression-based statistics. Given the flexibility of our approach, the falling cost of long-read sequencing, and the continued improvement of CNN architectures, we anticipate that inferences of hybridization using deep learning methods like ours will help researchers to better understand patterns of admixture in their study organisms.

### Competing Interest Statement

The authors have declared no competing interest.
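
The windowed dXY input mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' HyDe-CNN code: the function name `windowed_dxy`, its parameters, the 0/1 haplotype encoding, and the equal-width windows are all assumptions made for illustration. It computes the mean per-site divergence between each pair of populations in each window and leaves the within-population diagonal at zero.

```python
import numpy as np


def windowed_dxy(haps, positions, pop_index, seq_length, n_windows):
    """Hypothetical helper: mean pairwise divergence (dXY) per window.

    haps       : (n_haplotypes, n_sites) array of 0/1 alleles at variable sites
    positions  : site coordinates in [0, seq_length)
    pop_index  : dict mapping population name -> list of haplotype row indices
    Returns an array of shape (n_windows, n_pops, n_pops).
    """
    haps = np.asarray(haps)
    positions = np.asarray(positions)
    names = list(pop_index)

    # Assign each site to one of n_windows equal-width windows.
    edges = np.linspace(0, seq_length, n_windows + 1)
    win = np.searchsorted(edges, positions, side="right") - 1
    win = np.clip(win, 0, n_windows - 1)
    win_len = seq_length / n_windows

    # Allele frequency of the "1" allele per population, per site.
    freqs = np.array([haps[pop_index[name]].mean(axis=0) for name in names])

    out = np.zeros((n_windows, len(names), len(names)))
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            # Probability that a haplotype drawn from population i differs
            # from one drawn from population j at each site.
            per_site = freqs[i] * (1 - freqs[j]) + (1 - freqs[i]) * freqs[j]
            sums = np.bincount(win, weights=per_site, minlength=n_windows)
            out[:, i, j] = out[:, j, i] = sums / win_len
    return out


# Toy usage: two populations with one haplotype each, three sites, two windows.
demo = windowed_dxy(
    haps=[[0, 0, 1], [1, 0, 1]],
    positions=[10, 60, 90],
    pop_index={"P1": [0], "P2": [1]},
    seq_length=100,
    n_windows=2,
)
```

Stacking the per-window matrices in this way gives an image-like array, which is the kind of input a CNN consumes. How HyDe-CNN itself arranges, scales, or summarizes these values is not specified in the abstract.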

Download data

  • Downloaded 723 times
  • Download rankings, all-time:
    • Site-wide: 45,754
    • In evolutionary biology: 2,123
  • Year to date:
    • Site-wide: 53,308
  • Since beginning of last month:
    • Site-wide: 48,845
