
The Perfect Storm: Gene Tree Estimation Error, Incomplete Lineage Sorting, and Ancient Gene Flow Explain the Most Recalcitrant Ancient Angiosperm Clade, Malpighiales

By Liming Cai, Zhenxiang Xi, Emily Moriarty Lemmon, Alan R. Lemmon, Austin Mast, Christopher E. Buddenhagen, Liang Liu, Charles C. Davis

Posted 27 May 2020
bioRxiv DOI: 10.1101/2020.05.26.112318

The genomic revolution offers renewed hope of resolving rapid radiations in the Tree of Life. The development of the multispecies coalescent (MSC) model and improved gene tree estimation methods can better accommodate gene tree heterogeneity caused by incomplete lineage sorting (ILS) and gene tree estimation error stemming from short internal branches. However, the relative influence of these factors on species tree inference is not well understood. Using anchored hybrid enrichment, we generated a data set including 423 single-copy loci from 64 taxa representing 39 families to infer the species tree of the flowering plant order Malpighiales. This order alone includes nine of the top ten most unstable nodes in angiosperms, and the recalcitrant relationships along the backbone of the order have been hypothesized to arise from a rapid radiation during the Cretaceous. Here, we show that coalescent-based methods do not resolve the backbone of Malpighiales and concatenation methods yield inconsistent estimations, providing evidence that gene tree heterogeneity is high in this clade. Despite high levels of ILS and gene tree estimation error, our simulations demonstrate that these two factors alone are insufficient to explain the lack of resolution in this order. To explore this further, we examined triplet frequencies among empirical gene trees and found that some deviated significantly from those expected under ILS and estimation error alone, suggesting gene flow as an additional and previously unappreciated phenomenon promoting gene tree variation in Malpighiales. Finally, we applied a novel method to quantify the relative contribution of these three primary sources of gene tree heterogeneity and demonstrated that ILS, gene tree estimation error, and gene flow account for 15%, 52%, and 32% of the variation, respectively.
Together, our results suggest that a perfect storm of factors likely influences this lack of resolution, and further indicate that recalcitrant phylogenetic relationships like the backbone of Malpighiales may be better represented as phylogenetic networks. Thus, reducing such groups solely to existing models that adhere strictly to bifurcating trees greatly oversimplifies reality and obscures our ability to discern the process of evolution more clearly.

Competing Interest Statement: The authors have declared no competing interest.
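
The triplet-frequency reasoning in the abstract can be illustrated with a small sketch. Under the standard MSC with ILS alone, a rooted three-taxon gene tree matches the species tree with probability 1 - (2/3)e^(-t), where t is the internal branch length in coalescent units, and the two discordant topologies are equally likely at (1/3)e^(-t) each. A significant imbalance between the two minor topologies is therefore one commonly used signal of gene flow. The code below is an illustration under those assumptions only and is not the authors' method; the function names and the example counts are hypothetical.

```python
import math


def msc_triplet_probs(t):
    """Expected rooted-triplet topology probabilities under the MSC (ILS only).

    t is the internal branch length in coalescent units.
    Returns (concordant, discordant_a, discordant_b).
    """
    discordant = math.exp(-t) / 3.0
    return (1.0 - 2.0 * discordant, discordant, discordant)


def minor_asymmetry_pvalue(n_minor_a, n_minor_b):
    """Exact two-sided binomial test that the two minor topologies are equally frequent.

    Under ILS alone each minor topology has probability 0.5 given that the
    gene tree is discordant. Small p-values indicate an asymmetry that ILS
    alone does not predict.
    """
    n = n_minor_a + n_minor_b
    if n == 0:
        return 1.0
    observed = math.comb(n, n_minor_a)
    # With p = 0.5 every outcome k has probability comb(n, k) / 2**n, so we sum
    # the outcomes that are no more likely than the observed one.
    tail = sum(math.comb(n, k) for k in range(n + 1) if math.comb(n, k) <= observed)
    return tail / 2 ** n


if __name__ == "__main__":
    # Hypothetical counts for one triplet across 100 gene trees:
    # 60 concordant, 30 for one discordant topology, 10 for the other.
    print(msc_triplet_probs(0.5))
    print(minor_asymmetry_pvalue(30, 10))
```

This check ignores gene tree estimation error, which the abstract identifies as the largest source of heterogeneity, so in practice the null distribution would need to account for it, for example through simulation.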
