Using Natural Language Processing to Learn the Grammar of Glycans

By Daniel Bojar, Diogo M. Camacho, James J. Collins

Posted 11 Jan 2020
bioRxiv DOI: 10.1101/2020.01.10.902114

While nucleic acids and proteins receive ample attention, progress on understanding the structural and functional roles of carbohydrates has lagged behind. Here, we develop a language model for glycans, SweetTalk, taking into account glycan connectivity and composition. We use this model to investigate motifs in glycan substructures, classify them according to their O-/N-linkage, and predict their immunogenicity with an accuracy of ~92%, opening up the potential for rational glycoengineering.

Download data

  • Downloaded 1,257 times
  • Download rankings, all-time:
    • Site-wide: 7,876 out of 89,328
    • In bioinformatics: 1,359 out of 8,426
  • Year to date:
    • Site-wide: 1,035 out of 89,328
  • Since beginning of last month:
    • Site-wide: 9,741 out of 89,328

