Rxivist logo

Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 73,081 bioRxiv papers from 318,277 authors.

OLGA: fast computation of generation probabilities of B- and T-cell receptor amino acid sequences and motifs

By Zachary Sethna, Yuval Elhanati, Curtis G Callan, Aleksandra M Walczak, Thierry Mora

Posted 12 Jul 2018
bioRxiv DOI: 10.1101/367904 (published DOI: 10.1093/bioinformatics/btz035)

Motivation: High-throughput sequencing of large immune repertoires has enabled the development of methods to predict the probability of generation by V(D)J recombination of T- and B-cell receptors of any specific nucleotide sequence. These generation probabilities are very non-homogeneous, ranging over 20 orders of magnitude in real repertoires. Since the function of a receptor really depends on its protein sequence, it is important to be able to predict this probability of generation at the amino acid level. However, brute-force summation over all the nucleotide sequences with the correct amino acid translation is computationally intractable. The purpose of this paper is to present a solution to this problem. Results: We use dynamic programming to construct an efficient and flexible algorithm, called OLGA (Optimized Likelihood estimate of immunoGlobulin Amino-acid sequences), for calculating the probability of generating a given CDR3 amino acid sequence or motif, with or without V/J restriction, as a result of V(D)J recombination in B or T cells. We apply it to databases of epitope-specific T-cell receptors to evaluate the probability that a typical human subject will possess T cells responsive to specific disease-associated epitopes. The model prediction shows an excellent agreement with published data. We suggest that OLGA may be a useful tool to guide vaccine design. Availability: Source code is available at https://github.com/zsethna/OLGA

Download data

  • Downloaded 431 times
  • Download rankings, all-time:
    • Site-wide: 28,428 out of 73,086
    • In bioinformatics: 3,768 out of 7,136
  • Year to date:
    • Site-wide: 30,820 out of 73,086
  • Since beginning of last month:
    • Site-wide: 30,820 out of 73,086

Altmetric data


Downloads over time

Distribution of downloads per paper, site-wide


PanLingua

Sign up for the Rxivist weekly newsletter! (Click here for more details.)


News