
Minerva: An Alignment and Reference Free Approach to Deconvolve Linked-Reads for Metagenomics

By David C Danko, Dmitry Meleshko, Daniela Bezdan, Christopher Mason, Iman Hajirasouliha

Posted 10 Nov 2017
bioRxiv DOI: 10.1101/217869 (published DOI: 10.1101/gr.235499.118)

Emerging linked-read technologies (also known as read-cloud or barcoded short-reads) have revived interest in standard short-read technology as a viable way to understand large-scale structure in genomes and metagenomes. Linked-read technologies, such as the 10X Chromium system, use a microfluidic system and a set of specially designed 3' barcodes (also known as UIDs) to tag short DNA reads that were originally sourced from the same long fragment of DNA; these barcoded reads are then sequenced on standard short-read platforms. This approach involves several compromises. Each long fragment of DNA is covered only sparsely by short reads, no information about the relative ordering of reads from the same fragment is preserved, and each 3' barcode typically matches reads from 5-20 long fragments of DNA. However, the cost per base is far lower than that of single-molecule long-read sequencing systems, far less input DNA is required, and the error rate is that of standard short reads. Linked-reads present a new set of algorithmic challenges. In this paper we formally describe one particular issue common to all applications of linked-read technology: the deconvolution of reads with a single barcode into clusters that each correspond to a single long fragment of DNA. We introduce Minerva, a graph-based algorithm that approximately solves the barcode deconvolution problem for metagenomic data (where reference genomes may be incomplete or unavailable). Additionally, we demonstrate that deconvolved barcoded reads significantly improve downstream results by improving the specificity of taxonomic assignments and the ability of topic models to identify clusters of related sequences.
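
To make the barcode deconvolution problem concrete, the following is a minimal toy sketch in Python. It is not Minerva's implementation and makes no claim about the paper's actual algorithm. It assumes, purely for illustration, that two reads under the same barcode probably come from the same fragment if both share a k-mer with reads from some common other barcode. The function names (`kmers`, `deconvolve`), the value of `k`, and the sample reads are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def deconvolve(barcode_to_reads, target, k=4):
    """Partition the reads of `target` into putative fragment clusters.

    Assumption (illustrative only): two reads belong together if each
    shares at least one k-mer with reads from a common *other* barcode.
    """
    # Index every k-mer by the set of other barcodes in which it occurs.
    kmer_to_barcodes = defaultdict(set)
    for barcode, reads in barcode_to_reads.items():
        if barcode == target:
            continue
        for read in reads:
            for km in kmers(read, k):
                kmer_to_barcodes[km].add(barcode)

    reads = barcode_to_reads[target]
    # For each target read, collect the other barcodes it shares a k-mer with.
    neighbours = []
    for read in reads:
        linked = set()
        for km in kmers(read, k):
            linked |= kmer_to_barcodes.get(km, set())
        neighbours.append(linked)

    # Union-find over read indices; reads with a common neighbour are merged.
    parent = list(range(len(reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(reads)):
        for j in range(i + 1, len(reads)):
            if neighbours[i] & neighbours[j]:
                parent[find(i)] = find(j)

    # Group reads by the root of their component.
    clusters = defaultdict(list)
    for i, read in enumerate(reads):
        clusters[find(i)].append(read)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical data: barcode "bc1" mixes reads from two fragments.
example = {
    "bc1": ["AAAACC", "GGGGTT", "TCTCTC"],
    "bc2": ["AAACGT", "CAGGGT"],  # overlaps the first two bc1 reads
    "bc3": ["ATCTCT"],            # overlaps only the third bc1 read
}
print(deconvolve(example, "bc1"))
```

In this toy input the first two reads of `bc1` do not overlap each other, which mirrors the sparse coverage described above; they are grouped only because both overlap reads tagged `bc2`. A read with no overlaps elsewhere ends up in a cluster of its own.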

