OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs

By Zachary Sethna, Yuval Elhanati, Curtis G Callan, Aleksandra M Walczak, Thierry Mora

Posted 12 Jul 2018
bioRxiv DOI: 10.1101/367904 (published DOI: 10.1093/bioinformatics/btz035)

Motivation: High-throughput sequencing of large immune repertoires has enabled the development of methods that predict the probability that V(D)J recombination generates any specific T- or B-cell receptor nucleotide sequence. These generation probabilities are highly non-homogeneous, spanning 20 orders of magnitude in real repertoires. Since the function of a receptor depends on its protein sequence, it is important to be able to predict this generation probability at the amino acid level. However, brute-force summation over all the nucleotide sequences with the correct amino acid translation is computationally intractable. The purpose of this paper is to present a solution to this problem.

Results: We use dynamic programming to construct an efficient and flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), for calculating the probability of generating a given CDR3 amino acid sequence or motif, with or without V/J restriction, as a result of V(D)J recombination in B or T cells. We apply it to databases of epitope-specific T-cell receptors to evaluate the probability that a typical human subject will possess T cells responsive to specific disease-associated epitopes. The model predictions agree well with published data. We suggest that OLGA may be a useful tool to guide vaccine design.

Availability: Source code is available at https://github.com/zsethna/OLGA
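To illustrate why dynamic programming removes the need for brute-force summation over synonymous nucleotide sequences, here is a minimal sketch under strong simplifying assumptions. It is not OLGA's model or code: instead of V(D)J gene choice, deletions and insertions, it assumes nucleotides are emitted by a toy first-order Markov chain with made-up parameters (`INIT`, `TRANS`, and all function names are hypothetical). A "motif" is given as a list of strings, each listing the amino acids allowed at that position. The forward recursion carries one number per possible last nucleotide, so its cost grows linearly with sequence length, whereas the brute-force sum enumerates every codon combination.

```python
import itertools
import math

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(itertools.product(BASES, repeat=3), AMINO)}

# Hypothetical toy Markov-chain parameters (NOT fitted to any repertoire).
INIT = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
TRANS = {
    "A": {"A": 0.4, "C": 0.2, "G": 0.2, "T": 0.2},
    "C": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "G": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "T": {"A": 0.2, "C": 0.3, "G": 0.1, "T": 0.4},
}


def codons_for(allowed):
    """All codons translating to any amino acid in the string `allowed`."""
    return [c for c, aa in CODON_TABLE.items() if aa in allowed]


def pgen_dp(motif):
    """Forward DP: probability that the first 3*len(motif) nucleotides match the motif."""
    forward = None  # forward[n] = prob. of all valid prefixes ending in nucleotide n
    for allowed in motif:
        new = dict.fromkeys(BASES, 0.0)
        for x, y, z in codons_for(allowed):
            if forward is None:
                entry = INIT[x]  # first codon: start of the chain
            else:
                entry = sum(forward[p] * TRANS[p][x] for p in BASES)
            new[z] += entry * TRANS[x][y] * TRANS[y][z]
        forward = new
    return 1.0 if forward is None else sum(forward.values())


def seq_prob(nt):
    """Probability of one nucleotide string under the toy Markov chain."""
    p = INIT[nt[0]]
    for a, b in zip(nt, nt[1:]):
        p *= TRANS[a][b]
    return p


def pgen_brute(motif):
    """Brute force: enumerate every codon combination (exponential in length)."""
    return sum(
        seq_prob("".join(codons))
        for codons in itertools.product(*(codons_for(a) for a in motif))
    )


if __name__ == "__main__":
    motif = ["C", "AS", "S", "L"]  # position 2 may be Ala or Ser
    print(pgen_dp(motif), pgen_brute(motif))
    print(math.isclose(pgen_dp(motif), pgen_brute(motif)))
```

Both functions return the same value; only the DP version remains tractable as the motif grows, since brute force multiplies the number of allowed codons at every position.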

Download data

  • Downloaded 375 times
  • Download rankings, all-time:
    • Site-wide: 25,571 out of 60,239
    • In bioinformatics: 3,522 out of 6,078
  • Year to date:
    • Site-wide: 25,178 out of 60,239
  • Since beginning of last month:
    • Site-wide: 37,170 out of 60,239
