
Predicting protein subcellular location using learned distributed representations from a protein-protein network

By Xiaoyong Pan, Lei Chen, Min Liu, Tao Huang, Yu-Dong Cai

Posted 15 Sep 2019
bioRxiv DOI: 10.1101/768739

Protein functions are generally related to subcellular locations; to identify the functions of a protein, we first need to know where it is located. Interacting proteins tend to reside in the same subcellular location, so it is imperative to take protein-protein interactions into account when computationally identifying protein subcellular locations.

In this study, we present a deep learning-based method, node2loc, to predict protein subcellular location. node2loc first learns distributed representations of proteins in a protein-protein network using node2vec, which acquires representations from unlabeled data for downstream tasks. The learned representations are then fed into a recurrent neural network (RNN) to predict subcellular locations. To address the severe class imbalance across subcellular locations, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to artificially boost locations with few proteins.

We construct a benchmark dataset with 16 subcellular locations and evaluate node2loc on it. node2loc yields a Matthews correlation coefficient (MCC) of 0.812, outperforming the baseline methods. These results demonstrate that the representations learned from a protein-protein network have strong discriminative ability for classifying protein subcellular locations, and that the RNN is a more powerful classifier than traditional machine learning models.
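To make the two preprocessing steps concrete, the sketch below shows (1) the second-order biased random walk that node2vec uses to sample node sequences from the protein-protein network, and (2) the interpolation step at the heart of SMOTE. This is a minimal stdlib-only illustration, not the authors' implementation: the graph, parameter values, and helper names are hypothetical, and a real pipeline would feed the walks into an embedding model (e.g. skip-gram) and run SMOTE in the learned representation space.

```python
import random
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency sets from an edge list (hypothetical PPI network)."""
    g = defaultdict(set)
    for u, v in edges:
        g[u].add(v)
        g[v].add(u)
    return g

def node2vec_walk(g, start, length, p=1.0, q=1.0, rng=random):
    """One second-order biased random walk, node2vec's sampling step.
    p penalizes returning to the previous node; q < 1 favors moving
    farther away (DFS-like), q > 1 keeps the walk local (BFS-like)."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(g[cur])
        if not nbrs:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is unbiased
            continue
        prev = walk[-2]
        weights = [(1.0 / p) if x == prev else      # return to previous node
                   1.0 if x in g[prev] else          # stays at distance 1 from prev
                   (1.0 / q)                         # moves to distance 2 from prev
                   for x in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

def smote_sample(a, b, rng=random):
    """One SMOTE-style synthetic point: a random interpolation between a
    minority-class sample and one of its minority-class neighbors."""
    t = rng.random()
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]
```

A usage sketch: generate many walks per protein, train an embedding on them, then oversample the embedded minority locations with `smote_sample` before fitting the classifier.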

Download data

  • Downloaded 329 times
  • Download rankings, all-time:
    • Site-wide: 79,678
    • In bioinformatics: 7,323
  • Year to date:
    • Site-wide: 89,026
  • Since beginning of last month:
    • Site-wide: 66,157

