Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 83,609 bioRxiv papers from 360,279 authors.
Most downloaded bioRxiv papers, since beginning of last month
in category bioinformatics
7,862 results found.
14,124 downloads bioinformatics
Christoph Muus, Malte D Luecken, Gokcen Eraslan, Avinash Waghray, Graham Heimberg, Lisa Sikkema, Yoshihiko Kobayashi, Eeshit Dhaval Vaishnav, Ayshwarya Subramanian, Christopher Smilie, Karthik Jagadeesh, Elizabeth Thu Duong, Evgenij Fiskin, Elena Torlai Triglia, Meshal Ansari, Peiwen Cai, Brian Lin, Justin Buchanan, Sijia Chen, Jian Shu, Adam L. Haber, Hattie Chung, Daniel T Montoro, Taylor Adams, Hananeh Aliee, Samuel J. Allon, Zaneta Andrusivova, Ilias Angelidis, Orr Ashenberg, Kevin Bassler, Christophe Bécavin, Inbal Benhar, Joseph Bergenstråhle, Ludvig Bergenstråhle, Liam Bolt, Emelie Braun, Linh T Bui, Mark Chaffin, Evgeny Chichelnitskiy, Joshua Chiou, Thomas M Conlon, Michael S Cuoco, Marie Deprez, David S. Fischer, Astrid Gillich, Joshua Gould, Minzhe Guo, Austin J Gutierrez, Arun C Habermann, Tyler Harvey, Peng He, Xiaomeng Hou, Lijuan Hu, Alok Jaiswal, Peiyong Jiang, Theodoros Kapellos, Christin S Kuo, Ludvig Larsson, Michael A. Leney-Greene, Kyungtae Lim, Monika Litviňuková, Ji Lu, Leif S Ludwig, Wendy Luo, Henrike Maatz, Elo Madissoon, Lira Mamanova, Kasidet Manakongtreecheep, Charles-Hugo Marquette, Ian Mbano, Alexi Marie McAdams, Ross J Metzger, Ahmad N. Nabhan, Sarah K. Nyquist, Lolita Penland, Olivier B. Poirion, Sergio Poli, CanCan Qi, Rachel Queen, Daniel Reichart, Ivan Rosas, Jonas Schupp, Rahul Sinha, Rene V Sit, Kamil Slowikowski, Michal Slyper, Neal Smith, Alex Sountoulidis, Maximilian Strunz, Dawei Sun, Carlos Talavera-López, Peng Tan, Jessica Tantivit, Kyle J. Travaglini, Nathan R. Tucker, Katherine Vernon, Marc H Wadsworth, Julia Waldman, Xiuting Wang, Wenjun Yan, William Zhao, Carly G. K. Ziegler, The NHLBI LungMAP Consortium, The Human Cell Atlas Lung Biological Network
The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, creates an urgent need for identifying molecular mechanisms that mediate viral entry, propagation, and tissue pathology. Cell membrane bound angiotensin-converting enzyme 2 (ACE2) and associated proteases, transmembrane protease serine 2 (TMPRSS2) and Cathepsin L (CTSL), were previously identified as mediators of SARS-CoV-2 cellular entry. Here, we assess the cell type-specific RNA expression of ACE2, TMPRSS2, and CTSL through an integrated analysis of 107 single-cell and single-nucleus RNA-Seq studies, including 22 lung and airways datasets (16 unpublished), and 85 datasets from other diverse organs. Joint expression of ACE2 and the accessory proteases identifies specific subsets of respiratory epithelial cells as putative targets of viral infection in the nasal passages, airways, and alveoli. Cells that co-express ACE2 and proteases are also identified in other organs, some of which have been associated with COVID-19 transmission or pathology, including gut enterocytes, corneal epithelial cells, cardiomyocytes, heart pericytes, olfactory sustentacular cells, and renal epithelial cells. Performing the first meta-analyses of scRNA-seq studies, we analyzed 1,176,683 cells from 282 nasal, airway, and lung parenchyma samples from 164 donors spanning fetal, childhood, adult, and elderly age groups, and associated increased levels of ACE2, TMPRSS2, and CTSL in specific cell types with increasing age, male gender, and smoking, all of which are epidemiologically linked to COVID-19 susceptibility and outcomes. Notably, there was a particularly low expression of ACE2 in the few young pediatric samples in the analysis. Further analysis reveals a gene expression program shared by ACE2+TMPRSS2+ cells in nasal, lung and gut tissues, including genes that may mediate viral entry, subtend key immune functions, and mediate epithelial-macrophage cross-talk.
Amongst these are IL6, its receptor and co-receptor, IL1R, TNF response pathways, and complement genes. Cell type specificity in the lung and airways and smoking effects were conserved in mice. Our analyses suggest that differences in the cell type-specific expression of mediators of SARS-CoV-2 viral entry may be responsible for aspects of COVID-19 epidemiology and clinical course, and point to putative molecular pathways involved in disease susceptibility and pathogenesis. ### Competing Interest Statement N.K. was a consultant to Biogen Idec, Boehringer Ingelheim, Third Rock, Pliant, Samumed, NuMedii, Indaloo, Theravance, LifeMax, Three Lake Partners, Optikira and received non-financial support from MiRagen. All of these are outside the work reported. J.L. is a scientific consultant for 10X Genomics Inc. A.R. is a co-founder and equity holder of Celsius Therapeutics, an equity holder in Immunitas, and an SAB member of ThermoFisher Scientific, Syros Pharmaceuticals, Asimov, and Neogene Therapeutics. O.R.R. is a co-inventor on patent applications filed by the Broad Institute for inventions relating to single cell genomics applications, such as in PCT/US2018/060860 and US Provisional Application No. 62/745,259. A.K.S. reports compensation for consulting and SAB membership from Honeycomb Biotechnologies, Cellarity, Cogen Therapeutics, Orche Bio, and Dahlia Biosciences. S.A.T. was a consultant at Genentech, Biogen and Roche in the last three years. F.J.T. reports receiving consulting fees from Roche Diagnostics GmbH, and ownership interest in Cellarity Inc. L.V. is a founder of Definigen and Bilitech, two biotech companies using hPSCs and organoids for disease modelling and cell-based therapy.
11,606 downloads bioinformatics
The ongoing pandemic of coronavirus disease 2019 (COVID-19) is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We have performed an integrated sequence-based analysis of SARS-CoV-2 genomes from different geographical locations in order to identify features that are absent in SARS-CoV and other related coronavirus genomes and that may confer unique infection, transmission, virulence and immunogenic properties on the virus. The phylogeny of the genomes yields some interesting results. Systematic gene-level mutational analysis of the genomes has enabled us to identify several unique features of the SARS-CoV-2 genome, including a unique mutation in the spike surface glycoprotein (A930V (24351C>T)) in the Indian SARS-CoV-2, absent in the other strains studied here. We have also predicted the impact of the mutations on spike glycoprotein function and stability, using a computational approach. To gain further insights into host responses to viral infection, we predict that antiviral host miRNAs may be controlling the viral pathogenesis. Our analysis reveals nine host miRNAs which can potentially target SARS-CoV-2 genes. Interestingly, the nine miRNAs do not have targets in SARS and MERS genomes. Also, hsa-miR-27b is the only unique miRNA which has a target gene in the Indian SARS-CoV-2 genome. We also predicted immune epitopes in the genomes.
8,926 downloads bioinformatics
A novel coronavirus, SARS-CoV-2, was identified in Wuhan, Hubei Province, China in December 2019. According to a WHO report, this new coronavirus had resulted in 76,392 confirmed infections and 2,348 deaths in China by 22 February 2020, with a rapidly growing number of additional patients being identified internationally. SARS-CoV-2 was reported to share the same receptor, angiotensin-converting enzyme 2 (ACE2), with SARS-CoV. Here, based on a public database and state-of-the-art single-cell RNA-Seq techniques, we analyzed the ACE2 RNA expression profile in normal human lungs. The result indicates that ACE2 virus receptor expression is concentrated in a small population of type II alveolar cells (AT2). Surprisingly, we found that this population of ACE2-expressing AT2 cells also highly expressed many other genes that positively regulate viral entry, reproduction and transmission. This study provides a biological background for the epidemic investigation of COVID-19, and could be informative for future anti-ACE2 therapeutic strategy development. ### Competing Interest Statement The authors have declared no competing interest.
3,092 downloads bioinformatics
The World Health Organization characterized COVID-19 as a pandemic in March 2020, the second pandemic of the 21st century. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a positive-stranded RNA betacoronavirus of the family Coronaviridae. Expanding virus populations, such as that of SARS-CoV-2, accumulate a number of narrowly shared polymorphisms that confound traditional clustering methods. In this context, approaches that reduce the complexity of the sequence space occupied by the SARS-CoV-2 population are necessary for robust clustering. Here, we propose the subdivision of the global SARS-CoV-2 population into sixteen well-defined subtypes by focusing on the widely shared polymorphisms in nonstructural (nsp3, nsp4, nsp6, nsp12, nsp13 and nsp14) cistrons, structural (spike and nucleocapsid) and accessory (ORF8) genes. Six virus subtypes were predominant in the population, but all sixteen showed amino acid replacements which might have phenotypic implications. We hypothesize that the virus subtypes detected in this study are records of the early stages of the SARS-CoV-2 diversification that were randomly sampled to compose the virus populations around the world, a typical founder effect. The genetic structure determined for the SARS-CoV-2 population provides substantial guidelines for maximizing the effectiveness of trials for testing candidate vaccines or drugs. ### Competing Interest Statement The authors have declared no competing interest.
3,016 downloads bioinformatics
Background: COVID-19 is a global public health emergency that has shaken the world since its first detection in China in December 2019. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the pathogen responsible for this pandemic. The lethality of different viral strains is found to vary across geographical locations, but the molecular mechanism is not yet known. Methods: Available whole-genome sequencing data for viral strains published by different countries were retrieved and analysed using multiple sequence alignment and pairwise sequence alignment, leading to phylogenetic tree construction. Each location and the corresponding genetic variations were screened in depth. The variations were then analysed at the protein level, with special emphasis on non-synonymous amino acid substitutions. The fatality rates in different countries were matched against the mutation number, the rarity of the nucleotide alterations and the functional impact of the non-synonymous changes at the protein level, separately and in combination. Results: All the viral strains were found to have evolved from the viral strain of Taiwan (MT192759), which is 100% identical to the ancestral SARS-CoV-2 sequence of Wuhan (NC_045512.2; submitted on 5th Jan, 2020). Transition from C to T (C>T) is the most frequent mutation in this viral genome, and the mutations A>T, G>A and T>A are the rarest ones, found in countries with the maximum fatality rate, i.e. Italy, Spain and Sweden. Twenty non-synonymous mutations are located in the viral genome, spanning the Orf1ab polyprotein, surface glycoprotein, nucleocapsid protein, etc. The resulting changes in protein structure and function can affect the interaction with the host body favourably or unfavourably. Interpretation: The fatality outcome depends on three important factors: (a) the number of mutations, (b) the rarity of the allelic variation and (c) the functional consequence of the mutation at the protein level.
The molecular divergence from the ancestral strain (S) leads to extremely lethal (E), lethal (L) and non-lethal (N) strains, with the involvement of an intermediate strain (I).
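The abstract above groups single-nucleotide changes such as C>T by type. As a minimal sketch of how such labels can be sorted into transitions and transversions, the Python below uses the standard purine/pyrimidine definition. The function names and the position-prefixed label format are illustrative assumptions, not the authors' pipeline.

```python
import re

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def parse_substitution(label):
    """Split a label such as '24351C>T' into (position, ref, alt).

    Hypothetical helper: assumes a 1-based position followed by REF>ALT.
    """
    match = re.fullmatch(r"(\d+)([ACGT])>([ACGT])", label)
    if match is None:
        raise ValueError(f"not a single-nucleotide substitution: {label!r}")
    position, ref, alt = match.groups()
    return int(position), ref, alt


def classify(ref, alt):
    """Return 'transition' (purine<->purine or pyrimidine<->pyrimidine)
    or 'transversion' (purine<->pyrimidine)."""
    if ref == alt:
        raise ValueError("reference and alternate base are identical")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for change in ("C>T", "A>T", "G>A", "T>A"):
        ref, alt = change.split(">")
        print(change, classify(ref, alt))
```

Under this definition C>T and G>A are transitions, while A>T and T>A are transversions.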
2,830 downloads bioinformatics
Starting from December 2019, a novel coronavirus, later named 2019-nCoV, was found to cause a severe and rapidly spreading pandemic in China. Based on structural information, we have predicted a list of commercial medicines which may function as inhibitors of 2019-nCoV by targeting its main protease Mpro. These drugs may also be effective against other coronaviruses with similar Mpro binding sites and pocket structures.
2,329 downloads bioinformatics
To ultimately combat the emerging COVID-19 pandemic, an effective and safe vaccine is needed against this highly contagious disease caused by the SARS-CoV-2 coronavirus. Our literature and clinical trial survey showed that the whole virus, as well as the spike (S) protein, nucleocapsid (N) protein, and membrane (M) protein, have been tested for vaccine development against SARS and MERS. However, these vaccine candidates might not induce complete protection and might raise safety concerns. We then applied the Vaxign reverse vaccinology tool and the newly developed Vaxign-ML machine learning tool to predict COVID-19 vaccine candidates. By investigating the entire proteome of SARS-CoV-2, six proteins, including the S protein and five non-structural proteins (nsp3, 3CL-pro, and nsp8-10), were predicted to be adhesins, which are crucial to viral adherence and host invasion. The S, nsp3, and nsp8 proteins were also predicted by Vaxign-ML to induce high protective antigenicity. Unlike the commonly used S protein, the nsp3 protein has not been tested in any coronavirus vaccine studies and was selected for further investigation. The nsp3 was found to be more conserved among SARS-CoV-2, SARS-CoV, and MERS-CoV than among 15 coronaviruses infecting human and other animals. The protein was also predicted to contain promiscuous MHC-I and MHC-II T-cell epitopes, and linear B-cell epitopes localized in specific locations and functional domains of the protein. By applying reverse vaccinology and machine learning, we predicted potential vaccine targets for effective and safe COVID-19 vaccine development. We then propose that an “Sp/Nsp cocktail vaccine” containing a structural protein(s) (Sp) and a non-structural protein(s) (Nsp) would stimulate effective complementary immune responses.
2,302 downloads bioinformatics
Single-cell RNA-seq technologies have been successfully employed over the past decade to generate many high resolution cell atlases. These have proved invaluable in recent efforts aimed at understanding the cell type specificity of host genes involved in SARS-CoV-2 infections. While single-cell atlases are based on well-sampled highly-expressed genes, many of the genes of interest for understanding SARS-CoV-2 can be expressed at very low levels. Common assumptions underlying standard single-cell analyses don't hold when examining low-expressed genes, with the result that standard workflows can produce misleading results. ### Competing Interest Statement The authors have declared no competing interest.
1,993 downloads bioinformatics
The recently emerged coronavirus designated SARS-CoV-2 (also known as 2019 novel coronavirus (2019-nCoV) or Wuhan coronavirus) is the causative agent of coronavirus disease 2019 (COVID-19), which is now spreading rapidly throughout the world. More than 900,000 cases of SARS-CoV-2 infection and more than 47,000 COVID-19-associated mortalities have been reported worldwide at the time of writing, and these numbers are increasing with every passing hour. The World Health Organization (WHO) has declared the SARS-CoV-2 spread a global public health emergency and acknowledged that COVID-19 is now a pandemic. The multiple sequence alignment data correlated with the already published reports on SARS-CoV-2 evolution and indicated that this virus is closely related to the bat Severe Acute Respiratory Syndrome-like coronavirus (bat SARS-like CoV) and the well-studied human SARS coronavirus (SARS CoV). The disordered regions in viral proteins are associated with viral infectivity and pathogenicity. Therefore, in this study, we have exploited a set of complementary computational approaches to examine the dark proteomes of SARS-CoV-2, bat SARS-like, and human SARS CoVs by analysing the prevalence of intrinsic disorder in their proteins. According to our findings, the SARS-CoV-2 proteome contains very significant levels of structural order. In fact, except for Nucleocapsid, Nsp8, and ORF6, the vast majority of SARS-CoV-2 proteins are mostly ordered proteins containing fewer intrinsically disordered protein regions (IDPRs). However, the IDPRs found in SARS-CoV-2 proteins are functionally important. For example, cleavage sites in its replicase 1ab polyprotein are found to be highly disordered, and almost all SARS-CoV-2 proteins were shown to contain molecular recognition features (MoRFs), which are intrinsic disorder-based protein-protein interaction sites that are commonly utilized by proteins for interaction with specific partners.
The results of our extensive investigation of the dark side of the SARS-CoV-2 proteome will have important implications for the structural and non-structural biology of SARS or SARS-like coronaviruses. Significance: The infection caused by a novel coronavirus (SARS-CoV-2) that causes severe respiratory disease with pneumonia-like symptoms in humans is responsible for the current COVID-19 pandemic. No in-depth information on structures and functions of SARS-CoV-2 proteins is currently available in the public domain, and no effective anti-viral drugs and/or vaccines are designed for the treatment of this infection. Our study provides the first comparative analysis of the order- and disorder-based features of the SARS-CoV-2 proteome relative to human SARS and bat CoV that may be useful for structure-based drug discovery. Abbreviations: ACE2, angiotensin-converting enzyme 2; CDF, cumulative distribution function; CH, charge hydropathy; COVID-19, coronavirus disease 2019; CTD, C-terminal domain; DMVs, double-membrane vesicles; ICTV, International Committee on Taxonomy of Viruses; IDP, intrinsically disordered proteins; IDPRs, intrinsically disordered protein regions; IFN, interferon; MoRFs, molecular recognition features; MSA, multiple sequence alignment; Nsps, non-structural proteins; NTD, N-terminal domain; PONDR, Predictor of Natural Disordered Regions; PPID, predicted percentage of intrinsic disorder; Pprint, Prediction of Protein RNA-Interaction; RBD, receptor binding domain; SARS, severe acute respiratory syndrome; TRS, transcriptional regulatory sequences; VLPs, virus-like particles; WHO, World Health Organization.
1,889 downloads bioinformatics
The COVID-19 outbreak has become a global health risk, and understanding the response of the host to the SARS-CoV-2 virus will help to counter the disease. Editing by host deaminases is an innate restriction process to counter viruses, and it is not yet known whether it operates against Coronaviruses. Here we analyze RNA sequences from bronchoalveolar lavage fluids derived from infected patients. We identify nucleotide changes that may be signatures of RNA editing: Adenosine-to-Inosine changes from ADAR deaminases and Cytosine-to-Uracil changes from APOBEC ones. A mutational analysis of genomes from different strains of human-hosted Coronaviridae reveals mutational patterns compatible with those observed in the transcriptomic data. Our results thus suggest that both APOBECs and ADARs are involved in Coronavirus genome editing, a process that may shape the fate of both virus and patient. ### Competing Interest Statement The authors have declared no competing interest.
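The editing signatures named above can be tallied with a simple comparison of aligned sequences. The Python below is a minimal sketch under stated assumptions: inosine is read as guanosine during sequencing, so A-to-I editing appears as A>G, and C-to-U appears as C>T in cDNA. The function name and the toy sequences are hypothetical, and the sketch is not the authors' analysis. It assumes the two sequences are already aligned, of equal length, and on the same strand.

```python
from collections import Counter


def substitution_counts(reference, observed):
    """Count REF>ALT mismatches between two aligned, equal-length sequences.

    Positions holding anything other than A, C, G or T (gaps, N) are skipped.
    """
    if len(reference) != len(observed):
        raise ValueError("sequences must be aligned to the same length")
    counts = Counter()
    for ref, obs in zip(reference.upper(), observed.upper()):
        if ref != obs and ref in "ACGT" and obs in "ACGT":
            counts[f"{ref}>{obs}"] += 1
    return counts


if __name__ == "__main__":
    # Toy sequences for illustration only, not real viral data.
    counts = substitution_counts("ACGTACGA", "GCGTATGA")
    print("ADAR-like (A>G):", counts["A>G"])
    print("APOBEC-like (C>T):", counts["C>T"])
```

If reads map to the opposite strand, the same events would show up as T>C and G>A, so a real analysis would need to track strand as well.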
1,804 downloads bioinformatics
The authors have withdrawn their manuscript whilst they wish to perform additional experiments to validate their conclusions further. Therefore, the authors do not wish this work to be cited as reference for the project. If you have any questions, please contact the corresponding author for more details. ### Competing Interest Statement The authors have declared no competing interest.
1,646 downloads bioinformatics
Recently classified as a pandemic by WHO, novel Coronavirus 2019 has affected almost every corner of the globe, causing hundreds of thousands of deaths. The virus, which has its roots in Wuhan (China), has spread over the world through its ability to change itself. These changes relate to its transmission and pathogenicity, which brought the concept of social distancing into the picture. In this paper, a few findings from whole-genome sequence analysis of viral genomes submitted from India are presented. The data used for analysis comprise 440 genome sequences of the virus submitted to GenBank, GISAID, and SRA projects from around the world, as well as 28 viral sequences from India. Multiple sequence alignment of all genome sequences was performed and analysed. A novel non-synonymous mutation, 4809C>T (S1515F), in the NSP3 gene of SARS-CoV-2 Indian strains is reported along with other frequent and important changes from around the world: 3037C>T, 14408C>T, and 23403A>G. The novel change was observed in samples collected in March, whereas it was absent in samples collected in January from persons with a travel history to China. Phylogenetic analysis clustered the sequences with this change as one separate clade. The mutation was predicted to be a stabilising change by the in silico tool DynaMut. A patient infected with multiple strains (Wuhan and USA), the second in the world to our knowledge, was observed in this study. The infected person is among the two early infected patients with a travel history to China. Strains sequenced in Iran stood out as having different variants, as most of the reported frequent variants were not observed. The objective of this paper is to highlight the similarities and changes observed in the submitted Indian viral strains. This helps to keep track of how the virus is changing into a new subtype.
The major strains observed were European, with the novel change in India, and an emergent clade from Iran. It is important to observe the changes in the NSP3 gene, as this gene has been reported to be under extensive positive selection and to be a potential drug target (Extensive Positive Selection Drives the Evolution of Nonstructural Proteins). With the limited number of sequences, this was the only frequent novel non-synonymous change observed in Indian strains, making it a candidate for future investigation. This paper has a special focus on tracking Indian viral sequences submitted to the public domain. ### Competing Interest Statement The authors have declared no competing interest.
1,633 downloads bioinformatics
The past few weeks have witnessed a worldwide mobilization of the research community in response to the novel coronavirus (COVID-19). This global response has led to a burst of publications on the pathophysiology of the virus, yet without coordinated efforts to organize this knowledge, it can remain hidden away from individual research groups. By extracting and formalizing this knowledge in a structured and computable form, as in the form of a knowledge graph, researchers can readily reason and analyze this information on a much larger scale. Here, we present the COVID-19 Knowledge Graph, an expansive cause-and-effect network constructed from scientific literature on the new coronavirus that aims to provide a comprehensive view of its pathophysiology. To make this resource available to the research community and facilitate its exploration and analysis, we also implemented a web application and released the KG in multiple standard formats. ### Competing Interest Statement The authors have declared no competing interest.
1,509 downloads bioinformatics
Specific elements of viral genomes regulate interactions within host cells. Here, we calculated the secondary structure content of >2500 coronaviruses and computed >100000 human protein interactions with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). We found that the 3 and 5 prime ends are the most structured elements in the viral genome and the 5 prime end has the strongest propensity to associate with human proteins. The domain encompassing nucleotides 23000-24000 is highly conserved both at the sequence and structural level, while the region upstream varies significantly. These two sequences code for a domain of the viral protein Spike S that interacts with the human receptor angiotensin-converting enzyme 2 (ACE2) and has the potential to bind sialic acids. Our predictions indicate that the first 1000 nucleotides in the 5 prime end can interact with proteins involved in viral RNA processing such as double-stranded RNA specific editases and ATP-dependent RNA-helicases, in addition to other high-confidence candidate partners. These interactions, previously reported to be also implicated in HIV, reveal important information on host-virus interactions. The list of transcriptional and post-transcriptional elements recruited by SARS-CoV-2 genome provides clues on the biological pathways associated with gene expression changes in human cells. ### Competing Interest Statement The authors have declared no competing interest.
1,502 downloads bioinformatics
The outbreak of COVID-19 has now become a global pandemic and it continues to spread rapidly worldwide, severely threatening lives and economic stability. Making the problem worse, there is to date no specific antiviral drug that can be used to treat COVID-19. SARS-CoV-2 initiates its entry into human cells by binding to angiotensin-converting enzyme 2 (hACE2) via the receptor binding domain (RBD) of its spike protein. Therefore, molecules that can block SARS-CoV-2 from binding to hACE2 may potentially prevent the virus from entering human cells and serve as an effective antiviral drug. Based on this idea, we designed a series of peptides that can strongly bind to SARS-CoV-2 RBD in computational experiments. Specifically, we first constructed a 31-mer peptidic scaffold by linking two fragments grafted from hACE2 (a.a. 22-44 and 351-357) with a linker glycine, and then redesigned the peptide sequence to enhance its binding affinity to SARS-CoV-2 RBD. Compared with several computational studies that failed to identify that SARS-CoV-2 shows higher binding affinity for hACE2 than SARS-CoV, our protein design scoring function, EvoEF2, makes a correct identification, which is consistent with the recently reported experimental data, implying its high accuracy. The top designed peptide binders exhibited much stronger binding potency to hACE2 than the wild-type (-53.35 vs. -46.46 EvoEF2 energy unit for design and wild-type, respectively). The extensive and detailed computational analyses support the plausibility of the designed binders, which not only recapitulated the critical native binding interactions but also introduced new favorable interactions to enhance binding. Due to the urgent situation created by COVID-19, we share these computational data with the community, which should be helpful for developing potential antiviral peptide drugs to combat this pandemic.
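The numbers quoted in this abstract can be checked with a little arithmetic. The Python below is a small sketch, assuming the residue ranges are inclusive and that a single glycine joins the two fragments. The helper names are illustrative, and no actual hACE2 sequence is used.

```python
def inclusive_length(first, last):
    """Number of residues in a range given by inclusive 1-based positions."""
    return last - first + 1


# Fragment boundaries and linker as stated in the abstract.
HACE2_FRAGMENTS = [(22, 44), (351, 357)]
LINKER = "G"  # assumed: one glycine between the two fragments


def scaffold_length(fragments, linker):
    """Total scaffold length: all fragment residues plus the linker."""
    return sum(inclusive_length(a, b) for a, b in fragments) + len(linker)


def energy_gap(design, wild_type):
    """Difference in reported energy units; negative means the design scores lower."""
    return design - wild_type


if __name__ == "__main__":
    print(scaffold_length(HACE2_FRAGMENTS, LINKER))   # 23 + 7 + 1 residues
    print(round(energy_gap(-53.35, -46.46), 2))       # design minus wild-type
```

The fragment lengths (23 and 7 residues) plus one linker residue give the 31-mer stated in the abstract, and the reported scores differ by 6.89 EvoEF2 energy units in favour of the design.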
1,482 downloads bioinformatics
The global population is at present suffering from a pandemic of Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The goals of this study were to use artificial intelligence (AI) to predict blueprints for designing universal vaccines against SARS-CoV-2 that contain a sufficiently broad repertoire of T-cell epitopes capable of providing coverage and protection across the global population. To help achieve these aims, we profiled the entire SARS-CoV-2 proteome across the 100 most frequent HLA-A, HLA-B and HLA-DR alleles in the human population, using host-infected cell surface antigen presentation and immunogenicity predictors from the NEC Immune Profiler suite of tools, and generated comprehensive epitope maps. We then used these epitope maps as input for a Monte Carlo simulation designed to identify statistically significant epitope hotspot regions in the virus that are most likely to be immunogenic across a broad spectrum of HLA types. We then removed epitope hotspots that shared significant homology with proteins in the human proteome to reduce the chance of inducing off-target autoimmune responses. We also analyzed the antigen presentation and immunogenic landscape of all the nonsynonymous mutations across 3400 different sequences of the virus, to identify a trend whereby SARS-CoV-2 mutations are predicted to have reduced potential to be presented by host-infected cells, and consequently detected by the host immune system. A sequence conservation analysis then removed epitope hotspots that occurred in less-conserved regions of the viral proteome.
Finally, we used a database of the HLA genotypes of approximately 22 000 individuals to develop a digital twin type simulation to model how effectively different combinations of hotspots would work in a diverse human population, and used the approach to identify an optimal constellation of epitope hotspots that could provide maximum coverage in the global population. By combining the antigen presentation to the infected-host cell surface and immunogenicity predictions of the NEC Immune Profiler with a robust Monte Carlo and digital twin simulation, we have managed to profile the entire SARS-CoV-2 proteome and identify a subset of epitope hotspots that could be harnessed in a vaccine formulation to provide broad coverage across the global population. ### Competing Interest Statement BS, CM, MG, HF, IV, ST, JM, RS and TC are employees of NEC OncoImmunity, a subsidiary of NEC Corporation. BM and JC are employees of NEC Laboratories Europe.
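The abstract does not spell out how its Monte Carlo simulation works, so the following is only a generic illustration of the idea of testing for hotspots by simulation, not the NEC Immune Profiler method. It assumes one numeric epitope score per residue, sums scores in a sliding window, and compares each window with the maximum window sum seen after randomly shuffling the scores. All names, the window width and the toy scores are hypothetical.

```python
import random


def window_sums(scores, width):
    """Sum of scores in every contiguous window of the given width."""
    return [sum(scores[i:i + width]) for i in range(len(scores) - width + 1)]


def hotspot_p_values(scores, width, n_shuffles=1000, seed=0):
    """Empirical p-value per window from a shuffle-based Monte Carlo null.

    The null distribution is the maximum window sum over randomly shuffled
    scores, which keeps the score values but destroys their positions.
    """
    rng = random.Random(seed)
    observed = window_sums(scores, width)
    shuffled = list(scores)
    null_max = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_max.append(max(window_sums(shuffled, width)))
    # Add-one correction so that no p-value is exactly zero.
    return [
        (1 + sum(m >= obs for m in null_max)) / (1 + n_shuffles)
        for obs in observed
    ]


if __name__ == "__main__":
    # Toy scores: a cluster of five scoring residues in a 45-residue stretch.
    toy_scores = [0] * 20 + [1] * 5 + [0] * 20
    p_values = hotspot_p_values(toy_scores, width=5, n_shuffles=200)
    best = min(range(len(p_values)), key=p_values.__getitem__)
    print("most significant window starts at index", best)
```

Comparing against the maximum over all windows, rather than a single window, is one simple way to account for testing many positions at once.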
1,461 downloads bioinformatics
The COVID-19 pandemic has so far resulted in 1,439,516 confirmed cases and 85,711 deaths across 212 countries or territories. Due to multifaceted issues and challenges in implementing safety and preventive measures, inconsistent coordination between societies and governments and, most importantly, the lack of a specific vaccine against SARS-CoV-2, the spread of the Wuhan-originated virus is still rising after taking a heavy toll on human life. In the present study, we mapped several immunogenic epitopes (B-cell, T-cell, and IFN-gamma) over the entire structural proteins of SARS-CoV-2 and, by applying various computational and immunoinformatics approaches, designed a multi-epitope peptide-based vaccine predicted to elicit a high immunogenic response in the largest proportion of the world's human population. To ensure high expression of the recombinant vaccine in E. coli, codon optimization and in silico cloning were also carried out. The designed vaccine, with high molecular affinity to TLR3 and TLR4, was found capable of initiating effective innate and adaptive immune responses. The immune simulation also suggested rising levels of both B-cell and T-cell mediated immunity, which on subsequent exposure cleared antigen from the system. The proposed vaccine was found promising, yielding the desired results, and hence should be tested experimentally for its function and efficacy in neutralizing SARS-CoV-2. ### Competing Interest Statement The authors have declared no competing interest.
1,330 downloads bioinformatics
The SARS-CoV-2 virus has infected more than one million people worldwide to date. Knowing its genome and gene expression is essential to understanding the virus's mechanism. Here, we propose a computational tool, CovProfile, to detect viral genomic variations as well as viral gene expression from sequences obtained from Nanopore devices. We applied CovProfile to 11 samples, each from a terminally ill patient, and discovered that all the patients were infected by multiple viral strains, which might affect the reliability of phylogenetic analysis. Moreover, expression of the viral ORF1ab, S, M, and N genes is high in most of the samples. While performing the tests, we noticed a consistent abundance of transcript segments of MUC5B, presumably from the host, across all the samples. ### Competing Interest Statement The authors have declared no competing interest.
1,318 downloads bioinformatics
The spread of COVID-19, caused by SARS-CoV-2, has been growing since its first identification in December 2019. The publication of the first SARS-CoV-2 genome provided a valuable source of data for studying its phylogeny, evolution, and interaction with the host. Protein-protein binding assays have confirmed that angiotensin-converting enzyme 2 (ACE2) is the most likely cell receptor through which the virus invades the host cell. In the present work, we provide insight into the interaction of the viral spike Receptor Binding Domain (RBD) from different coronavirus isolates with the host ACE2 protein. By calculating the binding energy between RBD and ACE2, we highlight a putative jump in affinity from a progenitor form of SARS-CoV-2 to the current virus responsible for the COVID-19 outbreak. Our result is consistent with the phylogenetic analysis and corroborates the opinion that the interface segment of the spike protein RBD might have been acquired by SARS-CoV-2 through a complex evolutionary process rather than through mutation accumulation. We also highlight the relevance of the Q493 and P499 amino acid residues of the SARS-CoV-2 RBD for binding to hACE2 and maintaining the stability of the interface. Moreover, we show from the structural analysis that the interface residues are unlikely to be the result of human engineering. Finally, we studied the impact of eight different variants located at the interaction surface of ACE2 on complex formation with the SARS-CoV-2 RBD. We found that none of them is likely to disrupt the interaction with the viral RBD. ### Competing Interest Statement The authors have declared no competing interest.
1,310 downloads bioinformatics
Lucas von Chamier, Johanna Jukkala, Christoph Spahn, Martina Lerche, Sara Hernández-Pérez, Pieta K. Mattila, Eleni Karinou, Seamus Holden, Ahmet Can Solak, Alexander Krull, Tim-Oliver Buchholz, Florian Jug, Loic Royer, Mike Heilemann, Romain F. Laine, Guillaume Jacquemet, Ricardo Henriques
Deep Learning (DL) methods are increasingly recognised as powerful analytical tools for microscopy. Their potential to outperform conventional image processing pipelines is now well established. Despite the enthusiasm and innovations fuelled by DL technology, the need to access powerful and compatible resources, install multiple computational tools and modify code instructions to train neural networks all lead to an accessibility barrier that novice users often find difficult to cross. Here, we present ZeroCostDL4Mic, an entry-level teaching and deployment DL platform which considerably simplifies access and use of DL for microscopy. It is based on Google Colab which provides the free, cloud-based computational resources needed. ZeroCostDL4Mic allows researchers with little or no coding expertise to quickly test, train and use popular DL networks. In parallel, it guides researchers to acquire more knowledge, to experiment with optimising DL parameters and network architectures. We also highlight the limitations and requirements to use Google Colab. Altogether, ZeroCostDL4Mic accelerates the uptake of DL for new users and promotes their capacity to use increasingly complex DL networks.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!