Shedding light on the underlying characteristics of genomes using Kronecker model families of codon evolution

By Maryam Zaheri, Nicolas Salamin

Posted 13 Aug 2020
bioRxiv DOI: 10.1101/2020.08.12.247890

Mechanistic models of codon evolution rely on simplistic assumptions to reduce the computational cost of estimating their large number of parameters. This paper investigates how misleading these assumptions can be when they do not hold for the biological dataset at hand. We focus on three assumptions made by most current mechanistic codon models: 1) the codon transition rate matrix allows only single-nucleotide substitutions within a codon; 2) mutation is homogeneous across the nucleotide positions of a codon; and 3) the HKY model is adequate at the nucleotide level. To this end, we developed a framework of mechanistic codon models in which each model either holds or relaxes each of these assumptions; holding or relaxing the three assumptions yields eight models in total. Through several experiments on biological and simulated datasets, we show that the three assumptions are unrealistic for most biological datasets and that relaxing them leads to accurate estimation of evolutionary parameters such as selection pressure.

Competing Interest Statement: The authors have declared no competing interest.
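Assumptions 1 and 2 can be made concrete with a small sketch. Assuming an HKY85 rate matrix at each codon position (nucleotides ordered T, C, A, G), a codon-level mutation rate matrix can be built as the Kronecker sum Q1 ⊕ Q2 ⊕ Q3 = Q1⊗I⊗I + I⊗Q2⊗I + I⊗I⊗Q3. By construction, such a matrix has nonzero off-diagonal entries only for codon pairs that differ at exactly one position (assumption 1), and choosing Q1 = Q2 = Q3 encodes mutational homogeneity across positions (assumption 2). The function names are hypothetical, the sketch keeps all 64 codons (stop codons included), and it omits selection entirely. It illustrates the assumptions and is not the authors' Kronecker model implementation.

```python
from itertools import product

BASES = "TCAG"  # assumed nucleotide order; codon index = 16*i1 + 4*i2 + i3


def hky_rate_matrix(kappa, pi):
    """HKY85 rate matrix (hypothetical helper), scaled to one expected substitution per unit time."""
    transitions = {("T", "C"), ("C", "T"), ("A", "G"), ("G", "A")}
    q = [[0.0] * 4 for _ in range(4)]
    for i, a in enumerate(BASES):
        for j, b in enumerate(BASES):
            if i != j:
                # Transitions are scaled by kappa; transversions are not.
                q[i][j] = pi[j] * (kappa if (a, b) in transitions else 1.0)
        q[i][i] = -sum(q[i])  # diagonal makes the row sum to zero
    rate = -sum(pi[i] * q[i][i] for i in range(4))
    return [[x / rate for x in row] for row in q]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product of two square matrices stored as nested lists."""
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def codon_mutation_matrix(q1, q2, q3):
    """Kronecker sum q1 (+) q2 (+) q3: each term changes exactly one codon position."""
    i4 = identity(4)
    first = kron(kron(q1, i4), i4)
    second = kron(kron(i4, q2), i4)
    third = kron(kron(i4, i4), q3)
    return add(add(first, second), third)


if __name__ == "__main__":
    pi = [0.25, 0.25, 0.25, 0.25]
    q = hky_rate_matrix(kappa=2.0, pi=pi)
    Q = codon_mutation_matrix(q, q, q)  # homogeneous across positions (assumption 2)
    nonzero = sum(1 for j, x in enumerate(Q[0]) if j != 0 and x != 0.0)
    print(nonzero)  # 9: three alternative bases at each of three positions
```

Passing three different HKY matrices (for example, a different kappa per position) relaxes assumption 2 while keeping assumption 1. Relaxing assumption 1 would require extra rate terms for double and triple changes, which this sketch does not attempt to model.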

