PhyteByte: Identification of foods containing compounds with specific pharmacological properties

By Kenneth E Westerman, Sean Harrington, José M. Ordovás, Laurence D. Parnell

Posted 11 Jan 2020
bioRxiv DOI: 10.1101/2020.01.10.902197 (published DOI: 10.1186/s12859-020-03582-7)

Background
Phytochemicals and other molecules in foods confer health benefits, often through poorly established or unknown mechanisms. While there is a wealth of data on the biological and biophysical properties of drugs and therapeutic compounds, comparable data for compounds commonly present in food are notably lacking. Computational methods for high-throughput identification of food compounds with specific biological effects, especially when accompanied by relevant food composition data, could enable more effective and more personalized dietary planning. We have created a machine learning-based tool (PhyteByte) that leverages existing pharmacological data to predict bioactivity across a comprehensive molecular database of foods and food compounds.

Results
PhyteByte takes a cheminformatic approach to structure-based activity prediction and applies it to uncover the putative bioactivity of food compounds. Given an input protein target, the tool trains a random forest classifier on structure and activity data from the ChEMBL database to predict a molecule's effect on that target from its molecular fingerprint. It then applies the classifier to a library of food compounds with known molecular structures from the FooDB database. The output is a list of food compounds predicted with high confidence to elicit the relevant biological effect, along with their source foods and, where available, their quantities in those foods. Applying PhyteByte to PPARG, we identified irigenin, sesamin, fargesin, and delta-sanshool as putative agonists of this important metabolic regulator, along with previously identified agonists.

Conclusions
PhyteByte identifies food-based compounds that are predicted to interact with specific protein targets.
The identified relationships can be used to prioritize food compounds for experimental or epidemiological follow-up and can contribute to the rapid development of precision approaches to new nutraceuticals and to personalized dietary planning.

Abbreviations
  • EC50: half-maximal effective concentration
  • IC50: half-maximal inhibitory concentration
  • PPARG: peroxisome proliferator-activated receptor gamma
  • QSAR: quantitative structure-activity relationship
  • SMILES: simplified molecular-input line-entry system
  • TZD: thiazolidinedione
  • USDA: United States Department of Agriculture
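
The Results paragraph describes a pipeline of three steps: use ChEMBL structure and activity data to train a random forest on molecular fingerprints for one target, score the FooDB structures with that model, and report high-confidence hits together with their source foods. The Python sketch below mirrors that flow using only the standard library. It is not PhyteByte's code: the abstract does not state the fingerprint type, forest hyperparameters, activity labeling rule, or score cutoff, so the substring-hashing toy_fingerprint, the tree settings, the 0.5 threshold, and every training and food record here are hypothetical placeholders.

```python
import hashlib
import random

N_BITS = 256  # hypothetical fingerprint length


def toy_fingerprint(smiles, n_bits=N_BITS, max_k=3):
    """Hash every SMILES substring of length 1..max_k into a fixed-length bit vector.
    A crude, hypothetical stand-in for a real structural fingerprint."""
    bits = [0] * n_bits
    for k in range(1, max_k + 1):
        for i in range(len(smiles) - k + 1):
            digest = hashlib.sha256(smiles[i:i + k].encode()).digest()
            bits[int.from_bytes(digest[:4], "little") % n_bits] = 1
    return bits


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a decision tree; each leaf stores the fraction of active training compounds."""
    if depth == max_depth or len(set(y)) == 1:
        return sum(y) / len(y)
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):  # random feature subset per split
        left = [i for i, x in enumerate(X) if x[f] == 0]
        right = [i for i, x in enumerate(X) if x[f] == 1]
        if not left or not right:
            continue
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:  # no candidate feature separates these compounds
        return sum(y) / len(y)
    _, f, left, right = best

    def grow(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng)

    return (f, grow(left), grow(right))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, left, right)
        f, left, right = node
        node = right if x[f] else left
    return node


def fit_forest(X, y, n_trees=50, max_depth=6, seed=0):
    """Random forest: bootstrap-resampled trees, sqrt(n_features) candidates per split."""
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest


def predict_proba(forest, x):
    """Average the per-tree active fractions into a score in [0, 1]."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)


def screen(forest, library, threshold=0.5):
    """Score each food compound; keep those at or above threshold, best first."""
    hits = []
    for rec in library:
        score = predict_proba(forest, toy_fingerprint(rec["smiles"]))
        if score >= threshold:
            hits.append({**rec, "score": score})
    return sorted(hits, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    # Toy labels (1 = active, 0 = inactive); NOT real ChEMBL records.
    train = [("CC1SC(=O)NC1=O", 1), ("COc1ccc(CC2SC(=O)NC2=O)cc1", 1),
             ("c1ccccc1CC1SC(=O)NC1=O", 1), ("CCO", 0), ("CCCCCC", 0), ("OC(=O)CC", 0)]
    forest = fit_forest([toy_fingerprint(s) for s, _ in train], [lab for _, lab in train])
    # Hypothetical FooDB-style records: compound, structure, source food, amount.
    food_library = [
        {"compound": "compound_A", "smiles": "Cc1ccc(CC2SC(=O)NC2=O)cc1", "food": "food_X", "amount": "unknown"},
        {"compound": "compound_B", "smiles": "CCCCO", "food": "food_Y", "amount": "unknown"},
    ]
    for hit in screen(forest, food_library):
        print(hit["compound"], hit["food"], hit["amount"], round(hit["score"], 2))
```

A practical version would swap toy_fingerprint for a real cheminformatics fingerprint (for example, RDKit's Morgan fingerprints) and the hand-rolled forest for a library implementation such as scikit-learn's RandomForestClassifier; the train, score, filter, and join-with-food-data structure stays the same.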

Download data

  • Downloaded 339 times
  • Download rankings, all-time:
    • Site-wide: 98,397
    • In bioinformatics: 8,503
  • Year to date:
    • Site-wide: 135,796
  • Since beginning of last month:
    • Site-wide: 151,820