Predicting hosts based on early SARS-CoV-2 samples and analyzing later world-wide pandemic in 2020

By Qian Guo, Mo Li, Chunhui Wang, Jinyuan Guo, Xiaoqing Jiang, Jie Tan, Shufang Wu, Peihong Wang, Tingting Xiao, Man Zhou, Zhencheng Fang, Yonghong Xiao, Huaiqiu Zhu

Posted 22 Mar 2021
bioRxiv DOI: 10.1101/2021.03.21.436312

Identifying the hosts of SARS-CoV-2 has been a concern since the early stage of the outbreak. To address this problem, we propose DeepHoF, a deep learning method that automatically extracts viral genomic features to predict, for a novel virus, host likelihood scores on five host types: plant, germ, invertebrate, non-human vertebrate and human. DeepHoF fills the need for an accurate tool applicable to any novel virus and overcomes the limitations of sequence similarity-based methods, reaching an AUC of 0.987 on the five-class classification task. Additionally, because existing tools cannot efficiently infer host species for SARS-CoV-2, we conducted an in-depth analysis of the host likelihood profile calculated by DeepHoF. Using isolates sequenced in the earliest stage of COVID-19, we inferred that minks, bats, dogs and cats are potential hosts of SARS-CoV-2, with minks possibly among the most noteworthy. Several SARS-CoV-2 genes proved significant in determining the host range. Furthermore, a large-scale genome analysis based on DeepHoF's computation for the later worldwide pandemic in 2020 revealed the uniformity of host range among SARS-CoV-2 samples and a strong association of SARS-CoV-2 between humans and minks.

Download data

  • Downloaded 416 times
  • Download rankings, all-time:
    • Site-wide: 120,631
    • In bioinformatics: 9,782
  • Year to date:
    • Site-wide: 62,001
  • Since beginning of last month:
    • Site-wide: 177,032
