Rxivist logo

Rxivist combines preprints from bioRxiv with data from Twitter to help you find the papers being discussed in your field. Currently indexing 70,610 bioRxiv papers from 308,227 authors.

Thousands of primer-free, high-quality, full-length SSU rRNA sequences from all domains of life

By S√łeren M. Karst, Morten Simonsen Dueholm, Simon Jon McIlroy, Rasmus H. Kirkegaard, Per H Nielsen, Mads Albertsen

Posted 22 Aug 2016
bioRxiv DOI: 10.1101/070771 (published DOI: 10.1038/nbt.4045)

Ribosomal RNA (rRNA) genes are the consensus marker for determination of microbial diversity on the planet, invaluable in studies of evolution and, for the past decade, high-throughput sequencing of variable regions of ribosomal RNA genes has become the backbone of most microbial ecology studies. However, the underlying reference databases of full-length rRNA gene sequences are underpopulated, ecosystem skewed, and subject to primer bias, which hamper our ability to study the true diversity of ecosystems. Here we present an approach that combines reverse transcription of full-length small subunit (SSU) rRNA genes and synthetic long read sequencing by molecular tagging, to generate primer-free, full-length SSU rRNA gene sequences from all domains of life, with a median raw error rate of 0.17%. We generated thousands of full-length SSU rRNA sequences from five well-studied ecosystems (soil, human gut, fresh water, anaerobic digestion, and activated sludge) and obtained sequences covering all domains of life and the majority of all described phyla. Interestingly, 30% of all bacterial operational taxonomic units were novel, compared to the SILVA database (less than 97% similarity). For the Eukaryotes, the novelty was even larger with 63% of all OTUs representing novel taxa. In addition, 15% of the 18S rRNA OTUs were highly novel sequences with less than 80% similarity to the databases. The generation of primer-free full-length SSU rRNA sequences enabled eco-system specific estimation of primer-bias and, especially for eukaryotes, showed a dramatic discrepancy between the in-silico evaluation and primer-free data generated in this study. The large amount of novel sequences obtained here reaffirms that there is still vast, untapped microbial diversity lacking representatives in the SSU rRNA databases and that there might be more than millions after all. With our new approach, it is possible to readily expand the rRNA databases by orders of magnitude within a short timeframe. This will, for the first time, enable a broad census of the tree of life.

Download data

  • Downloaded 3,374 times
  • Download rankings, all-time:
    • Site-wide: 990 out of 70,616
    • In ecology: 12 out of 3,034
  • Year to date:
    • Site-wide: 42,501 out of 70,616
  • Since beginning of last month:
    • Site-wide: 30,230 out of 70,616

Altmetric data

Downloads over time

Distribution of downloads per paper, site-wide


Sign up for the Rxivist weekly newsletter! (Click here for more details.)