Machine learning modeling of protein-intrinsic features predicts tractability of targeted protein degradation

By Wubing Zhang, Shourya S. Roy Burman, Jiaye Chen, Katherine A Donovan, Yang Cao, Boning Zhang, Zexian Zeng, Yi Zhang, Dian Li, Eric S Fischer, Collin Tokheim, Xiaole Shirley Liu

Posted 29 Sep 2021
bioRxiv DOI: 10.1101/2021.09.27.462040

Targeted protein degradation (TPD) has rapidly emerged as a therapeutic modality to eliminate previously undruggable proteins by repurposing the cell's endogenous protein degradation machinery. However, the susceptibility of proteins to targeting by TPD approaches, termed "degradability", is largely unknown. Recent systematic studies mapping the degradable kinome have shown differences in degradation between kinases with similar drug-target engagement, suggesting that as-yet-unknown factors influence degradability. We therefore developed a machine learning model, MAPD (Model-based Analysis of Protein Degradability), to predict degradability from protein features that encompass post-translational modifications, protein stability, protein expression and protein-protein interactions. MAPD accurately predicts which kinases are degradable by TPD compounds (area under the precision-recall curve, auPRC = 0.759) and is likely generalizable to independent non-kinase proteins. We found that five statistically significant features achieve optimal prediction, with ubiquitination potential being the most predictive. By structural modeling, we found that E2-accessible ubiquitination sites, but not lysine residues in general, are particularly associated with kinase degradability. Finally, we extended MAPD predictions to the entire proteome and identified 964 disease-causing proteins, including 278 cancer genes, that may be tractable to TPD drug development.
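
For readers unfamiliar with the auPRC metric quoted above, the following is a minimal sketch of one common estimator, average precision, computed on made-up toy data. The function name `average_precision`, the labels, and the scores are all illustrative assumptions; the abstract does not state how the authors computed their value, and nothing here reproduces MAPD itself.

```python
def average_precision(labels, scores):
    """Estimate the area under the precision-recall curve as average precision.

    labels: 1 for a (hypothetically) degradable protein, 0 otherwise.
    scores: predicted degradability; higher means more likely degradable.
    Tied scores are not treated specially in this sketch.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank proteins from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at the rank of each true positive
    return ap / total_pos


# Toy example (invented values, not data from the study).
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
print(f"average precision on toy data: {toy_ap:.3f}")
```

Unlike the area under the ROC curve, this quantity depends on the fraction of positives, so a random ranking scores roughly the positive rate rather than 0.5.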

Download data

  • Downloaded 1,276 times
  • Download rankings, all-time:
    • Site-wide: 19,758
    • In bioinformatics: 2,236
  • Year to date:
    • Site-wide: 3,745
  • Since beginning of last month:
    • Site-wide: 526
