Bacteria can exchange and acquire new genetic material from other organisms directly and via the environment. This process, known as bacterial recombination, has a strong impact on the evolution of bacteria, for example leading to the spread of antibiotic resistance across clades and species, and to the avoidance of clonal interference. Recombination hinders phylogenetic and transmission inference because it creates patterns of substitutions (homoplasies) that are not consistent with the hypothesis of a single evolutionary tree. Bacterial recombination is typically modelled as statistically akin to the gene conversion process of eukaryotes, i.e., using the coalescent with gene conversion (CGC). However, this model can be very computationally demanding, as it requires accounting for the correlations between the evolutionary histories of even distant loci. With the increasing popularity of whole-genome sequencing, the need has therefore emerged for a new and faster approach to model and simulate bacterial evolution at genomic scales. We present a new model that approximates the coalescent with gene conversion: the bacterial sequential Markov coalescent (BSMC). Our approach is based on an idea similar to the sequential Markov coalescent (SMC), an approximation of the coalescent with recombination. However, bacterial recombination poses hurdles to a sequential Markov approximation, as it leads to strong correlations and linkage disequilibrium across very distant sites in the genome. Our BSMC overcomes these difficulties and shows both a considerable reduction in computational demand compared to the exact CGC and very similar patterns in the simulated data. We use the BSMC within an Approximate Bayesian Computation (ABC) inference scheme and show that we can correctly recover parameters simulated under the exact CGC, which further showcases the accuracy of our approximation.
We also use this ABC approach to infer the recombination rate, mutation rate, and recombination tract length from a whole-genome alignment of Bacillus cereus. Lastly, we implemented our BSMC model within a new simulation software, FastSimBac. In addition to a lower computational demand than previous bacterial genome evolution simulators, FastSimBac provides a much more general set of options for evolutionary scenarios, allowing population structure with migration, speciations, population size changes, and recombination hotspots. FastSimBac is available from https://bitbucket.org/nicofmay/fastsimbac and is distributed as open source under the terms of the GNU General Public Licence.
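To make the ABC step mentioned above concrete, the following is a minimal sketch of ABC by rejection sampling in Python. It is not the authors' inference pipeline and does not call FastSimBac or implement the BSMC: the simulator is a deliberately trivial stand-in (each site is independently hit by an event with a given rate), and the function names, the uniform prior, the single summary statistic, and the tolerance are all assumptions chosen for illustration only.

```python
import random


def simulate_summary(rate, n_sites, rng):
    """Toy stand-in for a coalescent simulator (assumption, not the BSMC).

    Each site is independently affected with probability `rate`;
    the summary statistic is the fraction of affected sites.
    """
    hits = sum(1 for _ in range(n_sites) if rng.random() < rate)
    return hits / n_sites


def abc_rejection(observed, n_sites, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary lies close to the observed one."""
    accepted = []
    for _ in range(n_draws):
        rate = rng.uniform(0.0, 0.5)  # hypothetical uniform prior on the rate
        simulated = simulate_summary(rate, n_sites, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(rate)
    return accepted


rng = random.Random(1)  # fixed seed so the sketch is reproducible
true_rate = 0.1

# "Observed" data are generated with a known rate, mirroring the idea of
# checking whether parameters used for simulation can be recovered.
observed = simulate_summary(true_rate, 2000, rng)

posterior = abc_rejection(observed, n_sites=2000, n_draws=5000,
                          tolerance=0.01, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} draws, posterior mean {posterior_mean:.3f}")
```

In a realistic setting the toy simulator would be replaced by runs of a genome-scale simulator, and several summary statistics would be compared jointly. The accepted draws approximate the posterior distribution of the parameter, so a posterior mean close to the value used to generate the data indicates that the parameter is recoverable under the chosen summaries.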