Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 84,323 bioRxiv papers from 363,094 authors.
Most downloaded bioRxiv papers, since the beginning of last month
in category evolutionary biology
5,164 results found.
184,499 downloads evolutionary biology
Bette T. Korber, WM Fischer, S Gnanakaran, H Yoon, J Theiler, W Abfalterer, B Foley, EE Giorgi, Tanmoy Bhattacharya, MD Parker, DG Partridge, CM Evans, TM Freeman, Thushan I. de Silva, on behalf of the Sheffield COVID-19 Genomics Group, CC LaBranche, David Montefiori
We have developed an analysis pipeline to facilitate real-time mutation tracking in SARS-CoV-2, focusing initially on the Spike (S) protein because it mediates infection of human cells and is the target of most vaccine strategies and antibody-based therapeutics. To date we have identified fourteen mutations in Spike that are accumulating. Mutations are considered in a broader phylogenetic context, geographically, and over time, to provide an early warning system to reveal mutations that may confer selective advantages in transmission or resistance to interventions. Each one is evaluated for evidence of positive selection, and the implications of the mutation are explored through structural modeling. The mutation Spike D614G is of urgent concern; after beginning to spread in Europe in early February, when introduced to new regions it repeatedly and rapidly becomes the dominant form. Also, we present evidence of recombination between locally circulating strains, indicative of multiple strain infections. These findings have important implications for SARS-CoV-2 transmission, pathogenesis and immune interventions. ### Competing Interest Statement The authors have declared no competing interest.
132,079 downloads evolutionary biology
This paper has been withdrawn by its authors. They intend to revise it in response to comments received from the research community on their technical approach and their interpretation of the results. If you have any questions, please contact the corresponding author.
29,776 downloads evolutionary biology
Monitoring the mutation dynamics of SARS-CoV-2 is critical for the development of effective approaches to contain the pathogen. By analyzing 106 SARS-CoV-2 and 39 SARS genome sequences, we provide direct genetic evidence that SARS-CoV-2 has a much lower mutation rate than SARS. Minimum Evolution phylogeny analysis revealed the putative original status of SARS-CoV-2 and its early-stage spread history. The discrepant phylogenies for the spike protein and its receptor binding domain confirmed a previously reported structural rearrangement prior to the emergence of SARS-CoV-2. Although we found the spike glycoprotein of SARS-CoV-2 to be considerably more conserved, we identified a mutation that leads to weaker receptor binding capability in a SARS-CoV-2 sample collected on 27 January 2020 in India. This represents the first report of a significant SARS-CoV-2 mutant, and raises the alarm that ongoing vaccine development may become futile in future epidemics if more mutations are identified. ### Competing Interest Statement The authors have declared no competing interest.
10,924 downloads evolutionary biology
There are outstanding evolutionary questions on the recent emergence of coronavirus SARS-CoV-2/hCoV-19 in Hubei province that caused the COVID-19 pandemic, including (1) the relationship of the new virus to the SARS-related coronaviruses, (2) the role of bats as a reservoir species, (3) the potential role of other mammals in the emergence event, and (4) the role of recombination in viral emergence. Here, we address these questions and find that the sarbecoviruses -- the viral subgenus responsible for the emergence of SARS-CoV and SARS-CoV-2 -- exhibit frequent recombination, but the SARS-CoV-2 lineage itself is not a recombinant of any viruses detected to date. In order to employ phylogenetic methods to date the divergence events between SARS-CoV-2 and the bat sarbecovirus reservoir, recombinant regions of a 68-genome sarbecovirus alignment were removed with three independent methods. Bayesian evolutionary rate and divergence date estimates were consistent for all three recombination-free alignments and robust to two different prior specifications based on HCoV-OC43 and MERS-CoV evolutionary rates. Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% HPD: 1879-1999), 1969 (95% HPD: 1930-2000), and 1982 (95% HPD: 1948-2009). Despite intensified characterization of sarbecoviruses since SARS, the lineage giving rise to SARS-CoV-2 has been circulating unnoticed for decades in bats and has been transmitted to other hosts such as pangolins. The occurrence of a third significant coronavirus emergence in 17 years together with the high prevalence and virus diversity in bats implies that these viruses are likely to cross species boundaries again.
4,695 downloads evolutionary biology
The magnitude of the COVID-19 pandemic underscores the urgency for a safe and effective vaccine. Here we analyzed SARS-CoV-2 sequence diversity across 5,700 sequences sampled since December 2019. The Spike protein, which is the target immunogen of most vaccine candidates, showed 93 sites with shared polymorphisms; only one of these mutations was found in more than 1% of currently circulating sequences. The minimal diversity found among SARS-CoV-2 sequences can be explained by drift and bottleneck events as the virus spread away from its original epicenter in Wuhan, China. Importantly, there is little evidence that the virus has adapted to its human host since December 2019. Our findings suggest that a single vaccine should be efficacious against current global strains. ### Competing Interest Statement The authors have declared no competing interest.
2,563 downloads evolutionary biology
The recent outbreak of a new coronavirus (SARS-CoV-2) in Wuhan, China, underscores the need for understanding the evolutionary processes that drive the emergence and adaptation of zoonotic viruses in humans. Here, we show that recombination in betacoronaviruses, including human-infecting viruses like SARS-CoV and MERS-CoV, frequently encompasses the Receptor Binding Domain (RBD) in the Spike gene. We find that this common process likely led to a recombination event at least 11 years ago in an ancestor of the SARS-CoV-2 involving the RBD. As a result of this recombination event, SARS-CoV and SARS-CoV-2 share a similar genotype in RBD, including two insertions (positions 432-436 and 460-472), and alleles 427N and 436Y. Both 427N and 436Y belong to a helix that interacts with the human ACE2 receptor. Ancestral state analyses revealed that SARS-CoV-2 differentiated from its most recent common ancestor with RaTG13 by accumulating a significant number of amino acid changes in the RBD. In sum, we propose a two-hit scenario in the emergence of the SARS-CoV-2 virus whereby the SARS-CoV-2 ancestors in bats first acquired genetic characteristics of SARS-CoV by incorporation of a SARS-like RBD through recombination before 2009, and subsequently, the lineage that led to SARS-CoV-2 accumulated further, unique changes specifically in the RBD.
2,308 downloads evolutionary biology
Human severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is most closely related, by average genetic distance, to two coronaviruses isolated from bats, RaTG13 and RmYN02. However, there is a segment of high amino acid similarity between human SARS-CoV-2 and a pangolin isolated strain, GD410721, in the receptor binding domain (RBD) of the spike protein, a pattern that can be caused either by recombination or by convergent amino acid evolution driven by natural selection. We perform a detailed analysis of the synonymous divergence, which is less likely to be affected by selection than amino acid divergence, between human SARS-CoV-2 and related strains. We show that the synonymous divergence between the bat-derived viruses and SARS-CoV-2 is larger than between GD410721 and SARS-CoV-2 in the RBD, providing strong additional support for the recombination hypothesis. However, the synonymous divergence between the pangolin strain and SARS-CoV-2 is also relatively high, which is not consistent with a recent recombination between them; instead, it suggests a recombination into RaTG13. We also find a 14-fold increase in the dN/dS ratio from the lineage leading to SARS-CoV-2 to the strains of the current pandemic, suggesting that the vast majority of non-synonymous mutations currently segregating within the human strains have a negative impact on viral fitness. Finally, based on synonymous divergence, we estimate the time to the most recent common ancestor of SARS-CoV-2 with RaTG13 and with RmYN02 as 51.71 years (95% C.I., 28.11-75.31) and 37.02 years (95% C.I., 18.19-55.85), respectively. ### Competing Interest Statement The authors have declared no competing interest.
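Dating a split from synonymous divergence usually rests on a molecular clock. As a sketch only, assuming a strict clock with a constant synonymous substitution rate \(\mu_S\) per synonymous site per year (the abstract states neither the rate nor the exact procedure used), both descendant lineages accumulate substitutions after their common ancestor, which gives the relation below together with the standard definition of the dN/dS ratio \(\omega\):

```latex
d_S \approx 2\,\mu_S\,T
\quad\Longrightarrow\quad
T \approx \frac{d_S}{2\,\mu_S},
\qquad
\omega = \frac{d_N}{d_S}
```

Here \(d_S\) and \(d_N\) are substitutions per synonymous and per non-synonymous site, and \(T\) is the time to the most recent common ancestor; \(\omega < 1\) indicates purifying selection and \(\omega > 1\) positive selection.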
1,668 downloads evolutionary biology
In a side-by-side comparison of evolutionary dynamics between the 2019/2020 SARS-CoV-2 and the 2003 SARS-CoV, we were surprised to find that SARS-CoV-2 resembles SARS-CoV in the late phase of the 2003 epidemic after SARS-CoV had developed several advantageous adaptations for human transmission. Our observations suggest that by the time SARS-CoV-2 was first detected in late 2019, it was already pre-adapted to human transmission to an extent similar to late epidemic SARS-CoV. However, no precursors or parallel branches of evolution stemming from a less human-adapted SARS-CoV-2-like virus have been detected. The sudden appearance of a highly infectious SARS-CoV-2 presents a major cause for concern that should motivate stronger international efforts to identify the source and prevent near future re-emergence. Any existing pools of SARS-CoV-2 progenitors would be particularly dangerous if similarly well adapted for human transmission. To look for clues regarding intermediate hosts, we analyze recent key findings relating to how SARS-CoV-2 could have evolved and adapted for human transmission, and examine the environmental samples from the Wuhan Huanan seafood market. Importantly, the market samples are genetically identical to human SARS-CoV-2 isolates and were therefore most likely from human sources. We conclude by describing and advocating for measured and effective approaches implemented in the 2002-2004 SARS outbreaks to identify lingering population(s) of progenitor virus. ### Competing Interest Statement Shing Hei Zhan is a Co-founder and lead bioinformatics scientist at Fusion Genomics Corporation, which develops molecular diagnostic assays for infectious diseases.
1,580 downloads evolutionary biology
In this study, we analyzed full-length SARS-CoV-2 genomes from multiple countries to determine early trends in the evolutionary dynamics of the novel COVID-19 pandemic. Results indicated that SARS-CoV-2 evolved early into at least three phylogenetic groups, characterized by positive selection at specific residues of the accessory proteins ORF3a and ORF8a. We also report evidence of epistatic interactions among sites in the genome that may be important in the generation of variants adapted to humans. These observations may have implications for public health, and they suggest that more studies are needed to understand the genetic mechanisms that may affect the development of therapeutic and preventive tools, such as antivirals and vaccines. ### Competing Interest Statement The authors have declared no competing interest.
1,487 downloads evolutionary biology
The spread of SARS-CoV-2 since December 2019 has become a pandemic and impacted many aspects of human society. Here, we analyzed genetic variation of SARS-CoV-2 and its related coronaviruses and found evidence of intergenomic recombination. After correction for mutational bias, analysis of 137 SARS-CoV-2 genomes as of 2/23/2020 revealed an excess of low-frequency mutations at both synonymous and nonsynonymous sites, which is consistent with a recent origin of the virus. In contrast to the adaptive evolution previously reported for SARS-CoV in its brief epidemic in 2003, our analysis of SARS-CoV-2 genomes shows signs of relaxed selection. The sequence similarity of the spike receptor binding domain between SARS-CoV-2 and a sequence from pangolin is probably due to an ancient intergenomic introgression. Therefore, SARS-CoV-2 might have cryptically circulated within humans for years before being recently noticed. Data from the early outbreak and hospital archives are needed to trace its evolutionary path and reveal critical steps required for effective spreading. Two mutations, 84S in the orf8 protein and 251V in the orf3 protein, occurred coincidentally with human intervention. 84S first appeared on 1/5/2020 and reached a plateau around 1/23/2020, the date of the Wuhan lockdown. 251V emerged on 1/21/2020 and rapidly increased in frequency. The roles of these mutations in infectivity therefore need to be elucidated. Genetic diversity of SARS-CoV-2 collected from China was two times higher than that of genomes from the rest of the world. In addition, in a network analysis, haplotypes collected from Wuhan city were located at the interior of the network and had more mutational connections, both of which are consistent with the observation that the COVID-19 outbreak originated in China. ### Competing Interest Statement The authors have declared no competing interest.
1,472 downloads evolutionary biology
Identifying genomic regions with unusually high local haplotype homozygosity represents a powerful strategy to characterize candidate genes responding to natural or artificial positive selection. To that end, statistics measuring the extent of haplotype homozygosity within (e.g., EHH, iHS) and between (Rsb or XP-EHH) populations have been proposed in the literature. The rehh package for R was previously developed to facilitate genome-wide scans of selection, based on the analysis of long-range haplotypes. However, its performance was not sufficient to cope with the growing size of available data sets. Here we propose a major upgrade of the rehh package, which includes improved processing of the input files, a faster algorithm to enumerate haplotypes, and multi-threading. As illustrated with the analysis of large human haplotype data sets, these improvements decrease the computation time by more than an order of magnitude. This new version of rehh will thus allow iHS-, Rsb- or XP-EHH-based scans to be performed on large data sets. The package rehh 2.0 is available from the CRAN repository (http://cran.r-project.org/web/packages/rehh/index.html) together with help files and a detailed manual.
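The EHH statistic named above has a compact definition: among haplotypes carrying a given core allele, it is the probability that two randomly drawn carriers are identical over the whole interval from the core marker to a marker at some distance. A minimal Python sketch follows, assuming phased haplotypes encoded as strings of '0'/'1' alleles with markers in physical order; the function name `ehh` and the encoding are illustrative assumptions, and this is not the rehh implementation (which is an R package and adds the refinements a production tool needs).

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, allele, end):
    """Extended haplotype homozygosity (illustrative sketch).

    haplotypes: phased haplotypes as equal-length strings of allele codes
    core:       index of the core marker
    allele:     allele code at the core marker that defines the carriers
    end:        index of the marker up to which identity is required
                (may lie on either side of the core)
    """
    # Keep only haplotypes carrying the focal allele at the core marker.
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        # EHH is undefined when fewer than two carriers exist.
        return float("nan")
    lo, hi = sorted((core, end))
    # Group carriers by their exact sequence over the interval [lo, hi].
    counts = Counter(h[lo:hi + 1] for h in carriers)
    # Probability that two carriers drawn without replacement share
    # the same haplotype over the whole interval.
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)
```

For example, with `haps = ["0011", "0011", "0010", "1011"]`, the profile `[ehh(haps, 0, "0", x) for x in range(4)]` stays at 1.0 until the last marker, where it drops to 1/3 because one of the three carriers differs there. Statistics such as iHS are built by integrating this decay curve on each side of the core.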
1,447 downloads evolutionary biology
The current coronavirus disease 2019 (COVID-19) pandemic is caused by the SARS-CoV-2 virus and is still spreading rapidly worldwide. Full-genome computational analysis of SARS-CoV-2 will allow us to understand its recent evolutionary events and adaptability mechanisms more accurately, which is important because there is still no effective therapeutic or prophylactic strategy. In this study, we used population genetics analysis to infer the mutation rate and plausible recombination events that may have contributed to the evolution of the SARS-CoV-2 virus. Furthermore, we localized targets of recent and strong positive selection. The genomic regions that appear to be under positive selection are largely co-localized with regions in which recombination from non-human hosts appears to have taken place in the past. Our results suggest that the pangolin coronavirus genome may have contributed to the SARS-CoV-2 genome by recombination with the bat coronavirus genome. However, we find evidence for additional recombination events that involve coronavirus genomes from other hosts, namely hedgehog and sparrow. Even though recombination events within human hosts cannot be directly assessed because of the high similarity of SARS-CoV-2 genomes, a linkage disequilibrium analysis leads us to infer that recombination may have occurred recently within human hosts. In addition, we employed an Approximate Bayesian Computation approach to estimate the parameters of a demographic scenario involving exponential growth in the size of the SARS-CoV-2 populations that have infected European, Asian and Northern American cohorts, and we demonstrated that rapid exponential growth in population size can support the observed polymorphism patterns in SARS-CoV-2 genomes. ### Competing Interest Statement The authors have declared no competing interest.
1,355 downloads evolutionary biology
Understanding the molecular features that made SARS-CoV-2 a highly infectious virus is crucial for the development of targeted therapies and an effective vaccine. Following the recent report of a furin-like cleavage site unique to SARS-CoV-2 [1], we characterized several additional residues, positioned within the crown of the S glycoprotein, that are under positive/diversifying selection along the ancestral lineage of current pandemic strains. Residue V483 lies in close proximity to the ACE2 binding domain and may affect the enhanced ability of currently circulating strains to infect host cells. Molecular dynamics simulations revealed long-range covariant movements with furin cleavage, correlated with the pre-fusion conformation of the SARS-CoV-2 S glycoprotein monomer. Residue T333 is the hinge of this movement, which gives a further advantage in binding cell receptors, facilitating fusion and entry. Evolutionary analysis revealed that the current lineage is likely to have emerged from the reservoir after a recombination event between bat and pangolin ancestors. Recombination impacted residues located in the receptor-binding region of the S glycoprotein that are identical to those of the pangolin ancestor. Interestingly, several recombination hotspots were detected throughout the genome, with the exception of the S glycoprotein, which exhibits a coldspot in the N-terminal region, likely the result of negative selection against further changes that may reduce its optimal fitness.
1,200 downloads evolutionary biology
The SARS-CoV-2 pandemic has been growing exponentially, affecting nearly 900 thousand people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published, in scientific journals as well as through non-peer-reviewed channels, to investigate SARS-CoV-2 genetic heterogeneity and spatiotemporal dissemination. We examined the full genome sequences currently available to assess whether they contain sufficient information for reliable phylogenetic and phylogeographic studies in the countries with the highest toll of confirmed cases. Although the number of available full genomes is growing daily, and the full dataset contains sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 datasets still present severe limitations. Studies assessing within-country spread or transmission clusters should therefore be considered preliminary at best, or hypothesis-generating. Hence the need for continued, concerted efforts to increase the number and quality of the sequences required for robust tracing of the epidemic.
1,093 downloads evolutionary biology
Gene regulatory changes underlie much of phenotypic evolution. However, the evolutionary potential of regulatory evolution is unknown, because most evidence comes from either natural variation or limited experimental perturbations. Surveying an unbiased mutation library for a developmental enhancer in Drosophila melanogaster using an automated robotics pipeline, we found that most mutations alter gene expression. Our results suggest that regulatory information is distributed throughout most of a developmental enhancer and that the parameters of gene expression (levels, location, and state) are convolved. The widespread pleiotropic effects of most mutations and the codependency of outputs may constrain the evolvability of developmental enhancers. Consistent with these observations, comparisons of diverse drosophilids reveal mainly stasis and apparent biases in the phenotypes influenced by this enhancer. Developmental enhancers may encode a much higher density of regulatory information than has been appreciated previously, which may impose constraints on regulatory evolution. ### Competing Interest Statement The authors have declared no competing interest.
1,071 downloads evolutionary biology
This study explores the divergence pattern of SARS-CoV-2 using whole genome sequences of isolates from various COVID-19-affected countries. The phylogenomic analysis indicates the presence of at least four distinct groups of SARS-CoV-2 genomes. The emergent groups have been found to be associated with signature structural changes in specific proteins. This study also reveals differing levels of divergence among the protein-coding regions. Moreover, we have predicted the impact of structural changes on a couple of important viral proteins via structural modelling techniques. This study further advocates for more viral genetic studies with associated clinical outcomes and host responses, for a better understanding of SARS-CoV-2 pathogenesis that will enable better mitigation of this pandemic. ### Competing Interest Statement The authors have declared no competing interest.
1,068 downloads evolutionary biology
Simon Dellicour, Keith Durkin, Samuel L. Hong, Bert Vanmechelen, Joan Martí-Carreras, Mandev S. Gill, Cécile Meex, Sébastien Bontems, Emmanuel André, Marius Gilbert, Conor Walker, Nicola De Maio, James Hadfield, Marie-Pierre Hayette, Vincent Bours, Tony Wawina-Bokalanga, Maria Artesi, Guy Baele, Piet Maes
Since the start of the COVID-19 pandemic, an unprecedented number of genomic sequences of the causative virus (SARS-CoV-2) have been publicly released. The resulting volume of available genetic data presents a unique opportunity to gain real-time insights into the pandemic, but also a daunting computational hurdle if analysed with gold-standard phylogeographic methods. We here describe and apply an analytical pipeline that is a compromise between fast and rigorous analytical steps. As a proof of concept, we focus on Belgium, one of the countries with the highest spatial density of sequenced SARS-CoV-2 genomes. At the global scale, our analyses confirm the importance of external introduction events in establishing transmission chains in the country. At the country scale, our spatially-explicit phylogeographic analyses highlight an impact of the national lockdown of mid-March on the dispersal velocity of viral lineages. Our pipeline has the potential to be quickly applied to other countries or regions, with key benefits in complementing epidemiological analyses in assessing the impact of intervention measures or their progressive easement. ### Competing Interest Statement The authors have declared no competing interest.
976 downloads evolutionary biology
952 downloads evolutionary biology
A global cross-discipline effort is ongoing to characterize the evolution of the SARS-CoV-2 virus and generate reliable epidemiological models of its diffusion. To this end, phylogenomic approaches leverage accumulating genomic mutations as barcodes to track the evolutionary history of the virus and can benefit from the surge of sequences deposited in public databases. Yet, such methods typically rely on consensus sequences representing the dominant virus lineage, whereas a complex sublineage architecture is often observed within single hosts. Furthermore, most approaches do not account for variant accumulation processes and might produce inaccurate results under conditions of limited sampling, as is the case in most countries currently affected by the epidemic. We here introduce a new framework for the characterization of viral (sub)lineage evolution and transmission of SARS-CoV-2, which considers both clonal and intra-host minor variants and exploits the achievements of cancer evolution research to account for mutation accumulation and uncertainty in the data. The application of our approach to 18 SARS-CoV-2 samples for which raw sequencing data are available reveals a high-resolution phylogenomic model, which confirms and improves recent findings on viral types and highlights the existence of patterns of co-occurrence of minor variants, uncovering likely infection paths among hosts harboring the same viral lineage. Our findings confirm a significant increase in the genomic diversity of SARS-CoV-2 over time, which is reflected in minor variants, and show that standard methods may struggle when handling datasets with important sampling limitations. Importantly, our framework allows us to pinpoint minor variants that might be positively selected across distinct lineages, as well as regions of the viral genome under purifying selection, thus guiding the design of treatments and vaccines. 
In particular, minor variant g.29039A>U, detected in multiple viral lineages and validated on an independent dataset, shows that SARS-CoV-2 can lose its main Nucleocapsid immunogenic epitopes, raising concerns about the effectiveness of vaccines targeting the C-terminus of this protein. To conclude, we advocate the use of our framework in combination with data-driven epidemiological models, to deliver a high-precision platform for pathogen detection, surveillance and analysis. ### Competing Interest Statement The authors have declared no competing interest.
949 downloads evolutionary biology
Coronavirus disease 2019 (COVID-19) is a global health concern as it continues to spread within China and beyond. The causative agent of this disease, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), belongs to the genus Betacoronavirus, which also includes severe acute respiratory syndrome related coronavirus (SARSr-CoV) and Middle East respiratory syndrome related coronavirus (MERSr-CoV). Codon usage of viral genes is believed to be subject to different selection pressures in different host environments. Previous studies on codon usage of influenza A viruses have helped identify viral host origins and evolutionary trends; however, similar studies on coronaviruses are lacking. In this study, global correspondence analysis (CA), within-group correspondence analysis (WCA) and between-group correspondence analysis (BCA) were performed among different genes in coronavirus viral sequences. The amino acid usage pattern of SARS-CoV-2 was generally found to be similar to that of bat and human SARSr-CoVs. However, we found greater synonymous codon usage differences between SARS-CoV-2 and its phylogenetic relatives in the spike and membrane genes, suggesting that these two genes of SARS-CoV-2 are subject to different evolutionary pressures.
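The global correspondence analysis (CA) mentioned above operates on a contingency table, which for codon usage can be imagined as genes (rows) by codon counts (columns). A minimal NumPy sketch of plain CA via the singular value decomposition of the standardized residual matrix follows; the function name and the toy table in the usage note are hypothetical, the within-group and between-group variants (WCA, BCA) are not implemented, and every row and column is assumed to have a nonzero total.

```python
import numpy as np


def correspondence_analysis(counts):
    """Plain correspondence analysis of a two-way contingency table (sketch).

    Returns row principal coordinates, column principal coordinates,
    and the principal inertia of each axis.
    """
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses
    expected = np.outer(r, c)  # proportions expected under independence
    # Standardized residuals: (p_ij - r_i c_j) / sqrt(r_i c_j)
    S = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: rescale singular vectors by masses and singular values.
    row_coords = (U * s) / np.sqrt(r)[:, None]
    col_coords = (Vt.T * s) / np.sqrt(c)[:, None]
    inertia = s ** 2           # principal inertia per axis
    return row_coords, col_coords, inertia
```

Called on a small hypothetical table such as `[[10, 20, 30], [20, 10, 5], [5, 5, 40]]`, the summed inertia equals the table's chi-square statistic divided by its grand total, which is a convenient sanity check. Rows (genes) whose coordinates lie close together on the leading axes have similar codon usage profiles.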
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!