High-throughput annotation of full-length long noncoding RNAs with Capture Long-Read Sequencing

By Julien Lagarde, Barbara Uszczynska-Ratajczak, Silvia Carbonell, Sílvia Pérez-Lluch, Amaya Abad, Carrie Davis, Thomas Gingeras, Adam Frankish, Jennifer Harrow, Roderic Guigó, Rory Johnson

Posted 01 Feb 2017
bioRxiv DOI: 10.1101/105064 (published DOI: 10.1038/ng.3988)

Accurate annotation of genes and their transcripts is a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, while thousands more remain uncatalogued, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), combining targeted RNA capture with third-generation long-read sequencing. We present an experimental re-annotation of the GENCODE intergenic lncRNA population in matched human and mouse tissues, resulting in novel transcript models for 3574 and 561 gene loci, respectively. CLS approximately doubles the annotated complexity of targeted loci, outperforming existing short-read techniques. Full-length transcript models produced by CLS enable us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure, and protein-coding potential. Thus CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales.

Download data

  • Downloaded 3,862 times
  • Download rankings, all-time:
    • Site-wide: 2,975
    • In genomics: 332
  • Year to date:
    • Site-wide: 42,729
  • Since beginning of last month:
    • Site-wide: 48,614
