Patterns of within-host genetic diversity in SARS-CoV-2
By
Gerry Tonkin-Hill,
Inigo Martincorena,
Roberto Amato,
Andrew R J Lawson,
Moritz Gerstung,
Ian Johnston,
David K. Jackson,
Naomi R Park,
Stefanie V Lensing,
Michael A. Quail,
Sónia Gonçalves,
Cristina Ariani,
Michael Spencer Chapman,
William L Hamilton,
Luke W. Meredith,
Grant Hall,
Aminu Jahun,
Yasmin Chaudhry,
Myra Hosmillo,
Malte L Pinckert,
Iliana Georgana,
Anna Yakovleva,
Laura G Caller,
Sarah L. Caddy,
Theresa Feltwell,
Fahad A Khokhar,
Charlotte Jane Houldcroft,
Martin D Curran,
Surendra Parmar,
The COVID-19 Genomics UK (COG-UK) Consortium,
Alex Alderton,
Rachel Nelson,
Ewan Harrison,
John Sillitoe,
Stephen D. Bentley,
Jeffrey C Barrett,
M. Estée Török,
Ian G. Goodfellow,
Cordelia Langford,
Dominic P. Kwiatkowski,
Wellcome Sanger Institute COVID-19 Surveillance Team
Posted 25 Dec 2020
bioRxiv DOI: 10.1101/2020.12.23.424229
Monitoring the spread of SARS-CoV-2 and reconstructing transmission chains have become a major public health focus for many governments around the world. The modest mutation rate and rapid transmission of SARS-CoV-2 prevent the reconstruction of transmission chains from consensus genome sequences, but within-host genetic diversity could theoretically help identify close contacts. Here we describe the patterns of within-host diversity in 1,181 SARS-CoV-2 samples sequenced to high depth in duplicate. 95% of samples show within-host mutations at detectable allele frequencies. Analyses of the mutational spectra revealed strong strand asymmetries suggestive of damage or RNA editing of the plus strand, rather than replication errors, dominating the accumulation of mutations during the SARS-CoV-2 pandemic. Within- and between-host diversity show strong purifying selection, particularly against nonsense mutations. Recurrent within-host mutations, many of which coincide with known phylogenetic homoplasies, display a spectrum and patterns of purifying selection more suggestive of mutational hotspots than recombination or convergent evolution. While allele frequencies suggest that most samples result from infection by a single lineage, we identify multiple putative examples of co-infection. Integrating these results into an epidemiological inference framework, we find that while sharing of within-host variants between samples could help the reconstruction of transmission chains, mutational hotspots and rare cases of superinfection can confound these analyses.
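To make the idea of sharing within-host variants between samples concrete, here is a minimal sketch, not the authors' pipeline. It assumes each sample has two replicate dictionaries mapping a variant label to an allele frequency, keeps only variants seen in both replicates above a threshold, and lists the variants shared by each pair of samples. The function names, the 5% threshold, the variant labels, and all frequencies are hypothetical toy values chosen for illustration.

```python
from itertools import combinations


def call_variants(rep1, rep2, min_af=0.05):
    """Keep variants at or above min_af in BOTH replicates; value is the mean frequency.

    min_af=0.05 is an illustrative assumption, not a value taken from the preprint.
    """
    return {
        var: (rep1[var] + rep2[var]) / 2
        for var in rep1.keys() & rep2.keys()
        if rep1[var] >= min_af and rep2[var] >= min_af
    }


def shared_variants(calls):
    """Map each pair of sample names to the set of called variants they have in common."""
    return {
        (a, b): set(calls[a]) & set(calls[b])
        for a, b in combinations(sorted(calls), 2)
    }


# Made-up toy data: sample name -> (replicate 1, replicate 2) allele frequencies.
replicates = {
    "sample_A": (
        {"pos100:C>T": 0.12, "pos250:G>T": 0.30, "pos900:A>G": 0.02},
        {"pos100:C>T": 0.10, "pos250:G>T": 0.28},
    ),
    "sample_B": (
        {"pos250:G>T": 0.08, "pos400:C>T": 0.40},
        {"pos250:G>T": 0.06, "pos400:C>T": 0.42},
    ),
    "sample_C": (
        {"pos400:C>T": 0.03},  # below threshold in one replicate, so not called
        {"pos400:C>T": 0.50},
    ),
}

calls = {name: call_variants(r1, r2) for name, (r1, r2) in replicates.items()}
pairs = shared_variants(calls)

for pair, variants in pairs.items():
    print(pair, sorted(variants))
```

In this toy input, only sample_A and sample_B share a called variant. As the abstract cautions, a shared variant is only suggestive of a link: if the site is a mutational hotspot, or one sample reflects superinfection, the same variant can appear in unrelated samples.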
Download data
- Downloaded 1,573 times
- Download rankings, all-time:
  - Site-wide: 10,780
  - In genomics: 1,166
- Year to date:
  - Site-wide: 1,497
- Since beginning of last month:
  - Site-wide: 5,057