A high resolution single molecule sequencing-based Arabidopsis transcriptome using novel methods of Iso-seq analysis
Cristiane P.G. Calixto,
Juan Carlos Entizne,
Asa ben Hur,
Michael F. Jantsch,
Qingshun Quinn Li,
Anireddy S.N. Reddy,
John WS Brown
Posted 03 Sep 2021
bioRxiv DOI: 10.1101/2021.09.02.458763
Posted 03 Sep 2021
BackgroundAccurate and comprehensive annotation of transcript sequences is essential for transcript quantification and differential gene and transcript expression analysis. Single molecule long read sequencing technologies provide improved integrity of transcript structures including alternative splicing, and transcription start and polyadenylation sites. However, accuracy is significantly affected by sequencing errors, mRNA degradation or incomplete cDNA synthesis. ResultsWe present a new and comprehensive Arabidopsis thaliana Reference Transcript Dataset 3 (AtRTD3). AtRTD3 contains over 160k transcripts - twice that of the best current Arabidopsis transcriptome and including over 1,500 novel genes. 79% of transcripts are from Iso-seq with accurately defined splice junctions and transcription start and end sites. We developed novel methods to determine splice junctions and transcription start and end sites accurately. Mis-match profiles around splice junctions provided a powerful feature to distinguish correct splice junctions and remove false splice junctions. Stratified approaches identified high confidence transcription start/end sites and removed fragmentary transcripts due to degradation. AtRTD3 is a major improvement over existing transcriptomes as demonstrated by analysis of an Arabidopsis cold response RNA-seq time-series. AtRTD3 provided higher resolution of transcript expression profiling and identified cold- and light-induced differential transcription start and polyadenylation site usage. ConclusionsAtRTD3 is the most comprehensive Arabidopsis transcriptome currently available. It improves the precision of differential gene and transcript expression, differential alternative splicing, and transcription start/end site usage from RNA-seq data. The novel methods for identifying accurate splice junctions and transcription start/end sites are widely applicable and will improve single molecule sequencing analysis from any species.
- Downloaded 498 times
- Download rankings, all-time:
- Site-wide: 62,201
- In plant biology: 1,513
- Year to date:
- Site-wide: 9,040
- Since beginning of last month:
- Site-wide: 243
Distribution of downloads per paper, site-wide
- 27 Nov 2020: The website and API now include results pulled from medRxiv as well as bioRxiv.
- 18 Dec 2019: We're pleased to announce PanLingua, a new tool that enables you to search for machine-translated bioRxiv preprints using more than 100 different languages.
- 21 May 2019: PLOS Biology has published a community page about Rxivist.org and its design.
- 10 May 2019: The paper analyzing the Rxivist dataset has been published at eLife.
- 1 Mar 2019: We now have summary statistics about bioRxiv downloads and submissions.
- 8 Feb 2019: Data from Altmetric is now available on the Rxivist details page for every preprint. Look for the "donut" under the download metrics.
- 30 Jan 2019: preLights has featured the Rxivist preprint and written about our findings.
- 22 Jan 2019: Nature just published an article about Rxivist and our data.
- 13 Jan 2019: The Rxivist preprint is live!