
Decoding microbiome and protein family linkage to improve protein structure prediction

By Pengshuo Yang, Wei Zheng, Kang Ning, Yang Zhang

Posted 16 Apr 2021
bioRxiv DOI: 10.1101/2021.04.15.440088

Information extracted from microbiome sequences through deep-learning techniques can significantly improve protein structure and function modeling. However, model training and metagenome searches have so far been performed largely blindly and with low efficiency. Building on 4.25 billion microbiome sequences from four major biomes (Gut, Lake, Soil and Fermentor), we propose MetaSource, a model that decodes the inherent link between microbial niches and protein homologous families. Large-scale protein family folding experiments showed that a targeted approach using predicted biomes significantly outperforms combined metagenome datasets, both in the speed of multiple sequence alignment (MSA) collection and in the accuracy of deep-learning structure assembly. These results reveal an important link between biomes and protein families and provide a useful guide for the future development of microbiome sequence databases and models for protein structure and function prediction.

Download data

  • Downloaded 339 times
  • Download rankings, all-time:
    • Site-wide: 142,823
    • In bioinformatics: 11,156
  • Year to date:
    • Site-wide: 99,502
  • Since beginning of last month:
    • Site-wide: 167,081
