
Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 59,974 bioRxiv papers from 266,630 authors.

High-throughput RNA sequencing technologies have provided invaluable research opportunities across distinct scientific domains by producing quantitative readouts of the transcriptional activity of both entire cellular populations and single cells. Most RNA-Seq analyses begin by mapping each experimentally produced sequence (i.e., read) to a set of annotated reference sequences for the organism of interest. For both biological and technical reasons, a significant fraction of reads remains unmapped. In this work, we develop the Read Origin Protocol (ROP) to discover the source of all reads, including those originating from complex RNA molecules, recombinant T and B cell receptors, and microbial communities. We applied ROP to 8,641 samples from 630 individuals across 54 tissues. Of these, 86 samples were sequenced in-house; the remaining data were obtained from the Genotype-Tissue Expression (GTEx v6) project. To test whether the reported fraction of accounted reads generalizes, we also performed ROP analysis on thousands of randomly selected, publicly available RNA-Seq samples from the Sequence Read Archive (SRA). Our approach accounts for 99.9% of 1 trillion reads of various read lengths across the merged dataset (n = 10,641). Using in-house RNA-Seq data, we show that the immune profiles of asthmatic individuals differ significantly from those of control individuals, with lower average per-sample T and B cell receptor diversity. We also show that immune diversity is inversely correlated with microbial load. Our results demonstrate the potential of ROP to exploit unmapped reads to better understand the functional mechanisms underlying the connections between the immune system, the microbiome, human gene expression, and disease etiology. ROP is freely available at https://github.com/smangul1/rop and currently supports human and mouse RNA-Seq reads.
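The abstract describes accounting for reads by assigning each one to a source (reference transcriptome, immune receptors, microbes, and so on) and then reporting the overall accounted fraction, such as the 99.9% figure. The Python below is a minimal sketch of that bookkeeping, assuming a simple cascade in which the first classifier that claims a read assigns its category. It is not ROP's implementation. The category names, toy sequences, and 4-mer matching are hypothetical stand-ins for the real aligners and databases a tool like ROP would use.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable, Iterable

# Hypothetical: a classifier returns True if it "explains" (accounts for) a read.
Classifier = Callable[[str], bool]


def account_reads(reads: Iterable[str],
                  classifiers: list[tuple[str, Classifier]]) -> Counter:
    """Assign each read to the first category whose classifier accepts it.

    Reads that no classifier accepts are counted as 'unaccounted'.
    """
    counts: Counter = Counter()
    for read in reads:
        for category, accepts in classifiers:
            if accepts(read):
                counts[category] += 1
                break
        else:  # no classifier claimed this read
            counts["unaccounted"] += 1
    return counts


def accounted_fraction(counts: Counter) -> float:
    """Fraction of all reads assigned to some category."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - counts["unaccounted"] / total


# Toy usage: hypothetical reference reads and a hypothetical "microbial" 4-mer.
reference = {"ACGTACGT", "TTGACCAA"}
microbial_kmers = {"GGCC"}
classifiers = [
    ("mapped", lambda r: r in reference),
    ("low_complexity", lambda r: len(set(r)) == 1),
    ("microbial", lambda r: any(r[i:i + 4] in microbial_kmers
                                for i in range(len(r) - 3))),
]
reads = ["ACGTACGT", "AAAAAAAA", "TAGGCCTA", "CATCATCA", "TTGACCAA"]
counts = account_reads(reads, classifiers)
print(counts, accounted_fraction(counts))
```

In this toy run, four of the five reads are claimed by some category, so the accounted fraction is 0.8. Ordering the classifiers matters: a read is counted once, under the earliest category that accepts it, which keeps the per-category totals summing to the number of input reads.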


  • Downloaded 2,515 times
  • Download rankings, all-time:
    • Site-wide: 1,410 out of 59,974
    • In genomics: 330 out of 4,168
  • Year to date:
    • Site-wide: 34,653 out of 59,974
  • Since beginning of last month:
    • Site-wide: 29,416 out of 59,974

