Network-based prediction of protein interactions
István A. Kovács,
Michael A. Calderwood,
Posted 02 Mar 2018
bioRxiv DOI: 10.1101/275529 (published DOI: 10.1038/s41467-019-09177-y)
As biological function emerges through interactions between a cell's molecular constituents, understanding cellular mechanisms requires us to catalogue all physical interactions between proteins. Despite spectacular advances in high-throughput mapping, the number of missing human protein-protein interactions (PPIs) continues to exceed the number of experimentally documented interactions. Computational tools that exploit structural, sequence or network topology information are increasingly used to fill in the gap, using patterns in the already known interactome to predict undetected, yet biologically relevant interactions. Such network-based link prediction tools rely on the Triadic Closure Principle (TCP), which states that two proteins likely interact if they share multiple interaction partners. TCP is rooted in social network analysis, namely the observation that the more common friends two individuals have, the more likely they are to know each other. Here, we offer direct empirical evidence across multiple datasets and organisms that, despite its dominant use in biological link prediction, TCP is not valid for most protein pairs. We show that this failure is fundamental: TCP violates both structural constraints and evolutionary processes. This understanding allows us to propose a link prediction principle, consistent with both structural and evolutionary arguments, that predicts yet uncovered protein interactions based on paths of length three (L3). A systematic computational cross-validation shows that the L3 principle significantly outperforms existing link prediction methods. To experimentally test the L3 predictions, we perform both large-scale high-throughput and pairwise tests, finding that the predicted links test positive at the same rate as previously known interactions, suggesting that most (if not all) predicted interactions are real.
Combining L3 predictions with experimental tests provided new interaction partners of FAM161A, a protein linked to retinitis pigmentosa, offering novel insights into the molecular mechanisms that lead to the disease. Because L3 is rooted in a fundamental biological principle, we expect it to have broad applicability, enabling us to better understand the emergence of biological function under both healthy and pathological conditions.
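The abstract contrasts two scoring rules: TCP scores a candidate pair by its shared partners (paths of length two), while L3 scores it by paths of length three. The Python sketch below is our own illustration, not the authors' code. The toy interactome, the function names (`build_adjacency`, `tcp_score`, `l3_score`, `rank_candidates`) and the per-path weight 1/sqrt(k_U k_V), where k is a protein's number of partners, are assumptions; the weighting follows the published version of the method as we understand it.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_adjacency(edges):
    """Turn an undirected edge list into a dict mapping each protein to its set of partners."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def tcp_score(adj, x, y):
    """Triadic closure: number of shared partners, i.e. paths of length two x-u-y."""
    return len(adj[x] & adj[y])


def l3_score(adj, x, y):
    """Sum over paths of length three x-u-v-y, each weighted by 1/sqrt(k_u * k_v).

    The degree weighting is an assumption of this sketch. For a pair (x, y)
    that is not yet linked, every path found here has u != y and v != x.
    """
    score = 0.0
    for u in adj[x]:
        for v in adj[u]:
            if y in adj[v]:
                score += 1.0 / sqrt(len(adj[u]) * len(adj[v]))
    return score


def rank_candidates(adj, scorer):
    """Score every non-adjacent pair and return those with a positive score, best first."""
    scored = []
    for x, y in combinations(sorted(adj), 2):
        if y in adj[x]:
            continue  # already a known interaction, nothing to predict
        s = scorer(adj, x, y)
        if s > 0:
            scored.append(((x, y), s))
    # sorted() is stable, so tied pairs keep their alphabetical order
    return sorted(scored, key=lambda item: -item[1])


# Hypothetical toy interactome: A and B bind the same partners P and Q,
# and B additionally binds R.
EDGES = [("A", "P"), ("A", "Q"), ("B", "P"), ("B", "Q"), ("B", "R")]

if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    print("TCP:", rank_candidates(adj, tcp_score))
    print("L3: ", rank_candidates(adj, l3_score))
```

On this toy graph, TCP ranks (A, B) and (P, Q) highest, two pairs whose only connection is that they share partners. L3 gives a positive score only to (A, R): R binds B, and B has the same partners as A. This is a miniature illustration of the intuition, not a reproduction of the paper's analysis.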