Rxivist logo

Identification of viruses with the potential to infect human

By Zheng Zhang, Zena Cai, Zhiying Tan, Congyu Lu, Gaihua Zhang, Yousong Peng

Posted 05 Apr 2019
bioRxiv DOI: 10.1101/597963

The virus has caused much mortality and morbidity to humans, and still posed a serious threat to the global public health. The virome with the human-infection potential is far from complete. Novel viruses have been discovered at an unprecedented pace as the rapid development of viral metagenomics. However, there is still a lack of a method for rapidly identifying the virus with the human-infection potential. This study built several machine learning models for discriminating the human-infecting viruses from other viruses based on the frequency of k-mers in the viral genomic sequences. The k-nearest neighbor (KNN) model could predict the human-infecting virus with an accuracy of over 90%. Even for the KNN models built on the contigs as short as 1kb, they performed comparably to those built on the viral genomes, suggesting that the models could be used to identify the human-infecting virus from the viral metagenomic sequences. This work could help for discovery of novel human-infecting virus in metagenomics studies.

Download data

  • Downloaded 268 times
  • Download rankings, all-time:
    • Site-wide: 63,421 out of 100,488
    • In bioinformatics: 6,837 out of 9,232
  • Year to date:
    • Site-wide: 43,409 out of 100,488
  • Since beginning of last month:
    • Site-wide: None out of 100,488

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)


  • 20 Oct 2020: Support for sorting preprints using Twitter activity has been removed, at least temporarily, until a new source of social media activity data becomes available.
  • 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
  • 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
  • 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
  • 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
  • 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
  • 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
  • 22 Jan 2019: Nature just published an article about Rxivist and our data.
  • 13 Jan 2019: The Rxivist preprint is live!