A Common Class of Transcripts with 5'-Intron Depletion, Distinct Early Coding Sequence Features, and N1-Methyladenosine Modification
Hon Nian Chua,
Michael P. Snyder,
Alexander F Palazzo,
Melissa J Moore,
Frederick P Roth
Posted 07 Jun 2016
bioRxiv DOI: 10.1101/057455 (published DOI: 10.1261/rna.059105.116)
Posted 07 Jun 2016
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the Exon Junction Complex (EJC) at non-canonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ~20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for non-canonical binding by the Exon Junction Complex.
- Downloaded 563 times
- Download rankings, all-time:
- Site-wide: 26,726 out of 88,646
- In bioinformatics: 3,623 out of 8,385
- Year to date:
- Site-wide: 74,309 out of 88,646
- Since beginning of last month:
- Site-wide: 56,298 out of 88,646
Downloads over time
Distribution of downloads per paper, site-wide
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!