PhyteByte: Identification of foods containing compounds with specific pharmacological properties
Background Phytochemicals and other molecules in foods elicit positive health benefits, often by poorly established or unknown mechanisms. While there is a wealth of data on the biological and biophysical properties of drugs and therapeutic compounds, there is a notable lack of similar data for compounds commonly present in food. Computational methods for high-throughput identification of food compounds with specific biological effects, especially when accompanied by relevant food composition data, could enable more effective and more personalized dietary planning. We have created a machine learning-based tool (PhyteByte) to leverage existing pharmacological data to predict bioactivity across a comprehensive molecular database of foods and food compounds. Results PhyteByte uses a cheminformatic approach to structure-based activity prediction and applies it to uncover the putative bioactivity of food compounds. The tool takes an input protein target and develops a random forest classifier to predict the effect of an input molecule based on its molecular fingerprint, using structure and activity data available from the ChEMBL database. It then predicts the relevant bioactivity of a library of food compounds with known molecular structures from the FooDB database. The output is a list of food compounds with high confidence of eliciting relevant biological effects, along with their source foods and associated quantities in those foods, where available. Applying PhyteByte to the PPARG gene, we identified irigenin, sesamin, fargesin, and delta-sanshool as putative agonists of PPARG, along with previously identified agonists of this important metabolic regulator. Conclusions PhyteByte identifies food-based compounds that are predicted to interact with specific protein targets. The identified relationships can be used to prioritize food compounds for experimental or epidemiological follow-up and can contribute to the rapid development of precision approaches to new nutraceuticals as well as personalized dietary planning. * EC50 : effective concentration IC50 : inhibitory concentration PPARG : peroxisome proliferator activated receptor gamma QSAR : quantitative structure activity relationship SMILES : simplified molecular-input line-entry system TZD : thiazolidinedione USDA : United States Department of Agriculture
- Downloaded 325 times
- Download rankings, all-time:
- Site-wide: 90,667
- In bioinformatics: 8,014
- Year to date:
- Site-wide: 117,320
- Since beginning of last month:
- Site-wide: 124,335
Downloads over time
Distribution of downloads per paper, site-wide
- 27 Nov 2020: The website and API now include results pulled from medRxiv as well as bioRxiv.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!