Scalable and Accurate Drug–target Prediction Based on Heterogeneous Bio-linked Network Mining

By Nansu Zong, Rachael Sze Nga Wong, Victoria Ngo, Yue Yu, Ning Li

Posted 03 Feb 2019
bioRxiv DOI: 10.1101/539643

Motivation: Existing classification-based and inference-based machine learning methods show promising results in drug-target prediction, but they have unavoidable limitations: 1) results from classification-based methods are often biased because negative samples are lacking, and 2) inference-based methods cannot explore novel drug-target associations that involve new (or isolated) drugs or targets. As big data continues to grow, there is a need for a scalable, robust, and accurate solution that can process large heterogeneous datasets and yield valuable predictions.

Results: We introduce a drug-target prediction method that improves on our previously proposed method in three respects: 1) we constructed a heterogeneous network that incorporates 12 repositories and includes 7 types of biomedical entities (20,119 entities and 194,296 associations), 2) we enhanced the feature learning step with Node2Vec, a scalable, state-of-the-art feature learning method, and 3) we integrated the originally proposed inference-based model with a classification model, which is further fine-tuned by a negative sample selection algorithm. The proposed method outperforms existing methods in drug-target association prediction, achieving a 95.3% AUC ROC score in 10-fold cross-validation tests. We studied biased learning and testing in network-based pairwise prediction and identified the best training strategy. Finally, we conducted a disease-specific prediction task based on 20 diseases. New drug-target associations were successfully predicted with an average AUC ROC of 97.2% (validated against DrugBank 5.1.0). The experiments showed the reliability of the proposed method in predicting novel drug-target associations for disease treatment.
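The two evaluation ideas named in the abstract, negative samples and the AUC ROC score, can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform random sampling here is a simple stand-in for their negative sample selection algorithm, which the abstract does not describe, and the function names `sample_negative_pairs` and `auc_roc` are hypothetical.

```python
import random
from typing import Sequence

Pair = tuple[str, str]


def sample_negative_pairs(
    drugs: Sequence[str],
    targets: Sequence[str],
    known: Sequence[Pair],
    k: int,
    seed: int = 0,
) -> list[Pair]:
    """Draw k drug-target pairs that are absent from the known associations.

    Assumption: uniform random sampling, used only as a placeholder for a
    real negative sample selection algorithm.
    """
    known_set = set(known)
    candidates = [
        (d, t) for d in drugs for t in targets if (d, t) not in known_set
    ]
    if k > len(candidates):
        raise ValueError("not enough unlabeled pairs to sample from")
    return random.Random(seed).sample(candidates, k)


def auc_roc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC ROC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

In a cross-validation setting, each fold would score its held-out known pairs and its sampled negative pairs with the trained model, pass the two score lists to `auc_roc`, and average the result over folds.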

Download data

  • Downloaded 545 times
  • Download rankings, all-time:
    • Site-wide: 42,827
    • In bioinformatics: 4,617
  • Year to date:
    • Site-wide: 104,674
  • Since beginning of last month:
    • Site-wide: 66,175
