Defining the Characteristics of Type I Interferon Stimulated Genes: Insight from Expression Data and Machine Learning

By Haiting Chai, Joseph Hughes, Quan Gu, David L Robertson

Posted 09 Oct 2021
bioRxiv DOI: 10.1101/2021.10.08.463622

A virus-infected cell triggers a signalling cascade resulting in the secretion of interferons (IFNs), which in turn induce the up-regulation of IFN-stimulated genes (ISGs) that play an important role in the inhibition of the viral infection and the return to cellular homeostasis. Here, we conduct detailed analyses on 7443 features relating to evolutionary conservation, nucleotide composition, gene expression, amino acid composition, and network properties to elucidate factors associated with the stimulation of genes in response to type I IFNs. Our results show that ISGs are less evolutionarily conserved than genes that are not significantly stimulated in IFN experiments (non-ISGs). ISGs show significant depletion of GC-content in the coding region of their canonical transcripts, which leads to under-representation in their nucleotide compositions. Differences between ISGs and non-ISGs are also reflected in the amino acid composition of the sequences they encode. Network analyses show that ISG products tend to be involved in key paths but lie away from hubs or bottlenecks of the human protein-protein interaction (PPI) network. Our analyses also show that interferon-repressed human genes (IRGs), which are down-regulated in the presence of IFNs, can have similar properties to ISGs, thus leading to false positives in ISG predictions. Based on these analyses, we design a machine learning framework that integrates a support vector machine (SVM) with feature selection algorithms. The ISG prediction achieves an area under the receiver operating characteristic curve (AUC) of 0.7455 and demonstrates the similarity between ISGs triggered by type I and III IFNs. Our machine learning model predicts a number of genes as potential ISGs that so far have shown no significant differential expression when stimulated with IFN in the cell types and tissue types compiled in the available IFN-related databases. A webserver implementing our method is accessible at http://isgpre.cvr.gla.ac.uk/.
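
To make two of the quantities mentioned in the abstract concrete, the following minimal sketch computes the GC-content of a coding sequence and a rank-based AUC. It is not the authors' SVM pipeline: the function names `gc_content` and `roc_auc`, the toy sequences, and their labels are all hypothetical and chosen only for illustration. Because the abstract reports that ISGs are GC-depleted, the sketch assumes `1 - gc_content` as a crude "ISG-likeness" score.

```python
from typing import Sequence


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / counted


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half (the rank-based definition)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: (coding sequence, label), where 1 = ISG and 0 = non-ISG.
# These sequences are invented for illustration and are not from the paper.
toy_genes = [
    ("ATGAAATTTGCA", 1),
    ("ATGGCCGCGGGC", 0),
    ("ATGATTAAAGAT", 1),
    ("ATGCCGGAGCTG", 0),
]

labels = [label for _, label in toy_genes]
# Assumption: lower GC-content means more ISG-like, so score = 1 - GC-content.
scores = [1.0 - gc_content(seq) for seq, _ in toy_genes]

if __name__ == "__main__":
    print(f"toy AUC: {roc_auc(labels, scores):.4f}")
```

The toy data are contrived so that the two classes separate cleanly; on real genes a single feature such as GC-content would be expected to discriminate far less sharply, which is consistent with the abstract combining thousands of features and still reporting an AUC of 0.7455.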

Download data

  • Downloaded 174 times
  • Download rankings, all-time:
    • Site-wide: 152,608
    • In immunology: 4,664
  • Year to date:
    • Site-wide: None
  • Since beginning of last month:
    • Site-wide: 47,986
